Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Mon, Nov 14, 2016 at 11:31 AM, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote:

> PGconn->target_server_type is not freed in freePGconn().

Thanks, will fix in new patch.

> Could you add PGTARGETSERVERTYPE environment variable? Like other variables, it will
> ease testing, since users can change the behavior without altering the connection
> string here and there.

Okay, will add one.

> I think it would be better to expose the server state via ParameterStatus protocol
> message like standard_conforming_strings, instead of running "SELECT pg_is_in_recovery()".
> We shouldn't want to add one round trip to check the server type (master, standby).
> postmaster can return the server type based on its state (pmState); PM_RUN is master,
> and PM_HOT_STANDBY is standby. In addition, as an impractical concern, DBA can revoke
> EXECUTE privilege on pg_is_in_recovery() from non-superusers.

If you are suggesting that I change the protocol messages, I think that would not be backward compatible with older servers. I also think such a level of protocol change will not be allowed. With connection status CONNECTION_SETENV, used for protocol version 2.0 setup, we sent a query like "select pg_catalog.pg_client_encoding()" for the same purpose. So I think using "SELECT pg_is_in_recovery()" should be fine.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
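For readers following the thread, the behavior under discussion can be sketched in a few lines. This is a hypothetical simulation, not code from the patch: the host names and the `run_query` callback are made up. The idea is that libpq tries each host in turn and, when the caller asks for a primary, keeps only a host for which "SELECT pg_is_in_recovery()" returns false.

```python
def choose_primary(hosts, run_query):
    """Return the first host whose server reports it is not in recovery.

    run_query(host, sql) stands in for an established libpq connection;
    it returns the single-value result of the query, or raises OSError
    if the host is unreachable.
    """
    for host in hosts:
        try:
            in_recovery = run_query(host, "SELECT pg_is_in_recovery()")
        except OSError:
            continue  # host down: try the next one in the list
        if not in_recovery:
            return host  # primary found
    return None  # no reachable primary among the candidates

# Simulated cluster: node1 is a standby, node2 is down, node3 is primary.
state = {"node1": True, "node3": False}  # host -> pg_is_in_recovery()

def fake_query(host, sql):
    if host not in state:
        raise OSError("connection refused")
    return state[host]

print(choose_primary(["node1", "node2", "node3"], fake_query))  # node3
```

This also makes the round-trip objection concrete: each candidate host costs an extra query after connecting, which is what exposing the state via ParameterStatus would avoid.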
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Fri, Nov 11, 2016 at 7:33 PM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> We tend to use the term "primary" instead of "master".

Thanks, I will use "primary" instead of "master" in my next patch.

> Will this work with logical replication?

I have not tested it with logical replication. Currently we identify the primary to connect to based on the result of "SELECT pg_is_in_recovery()", so I think it works. Do you want me to test a particular setup?

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On 14 November 2016 at 15:01, Craig Ringer wrote:

> These three are related enough, and all only touch the TAP framework,
> so I've bundled them in a series.

CF entry: https://commitfest.postgresql.org/12/883/

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
Every test I write with the TAP framework for recovery seems to need to wait for one node to catch up to another or examine replication slot state. So: attached is a patch to add helper methods for these tasks.

Also add a new pg_lsn.pm helper class to deal with PostgreSQL's awkward representation of LSNs in perl. Made extra fun by our use of Perl 5.8.8 with no new modules, so we can't rely on having 64-bit integers. Provides sensible LSN comparisons. I'll be using this in coming patches that have to make assertions about differences between LSNs, and I think it's sensible to have anyway. Incorporates tests for the class.

Finally, add some coverage of physical replication slots to recovery tests.

Backpatch to 9.6 desirable, since we seem to be doing that for TAP infrastructure.

These three are related enough, and all only touch the TAP framework, so I've bundled them in a series.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From f01790bb3f47f7fabf64328f8c9a01e8f3cf837c Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Wed, 9 Nov 2016 13:44:04 +0800
Subject: [PATCH 3/3] Expand streaming replication tests to cover hot standby
 feedback and physical replication slots

---
 src/test/recovery/t/001_stream_rep.pl | 105 +-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 5ce69bb..ef29892 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 22;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -58,3 +58,106 @@
 is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 1');
 is($node_standby_2->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 2');
+
+diag "switching to physical replication slot";
+# Switch to using a physical replication slot. We can do this without a new
+# backup since physical slots can go backwards if needed. Do so on both
+# standbys. Since we're going to be testing things that affect the slot state,
+# also increase the standby feedback interval to ensure timely updates.
+my ($slotname_1, $slotname_2) = ('standby_1', 'standby_2');
+$node_master->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_master->restart;
+is($node_master->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_1');]), 0, 'physical slot created on master');
+$node_standby_1->append_conf('recovery.conf', "primary_slot_name = $slotname_1\n");
+$node_standby_1->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_standby_1->restart;
+is($node_standby_1->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_2');]), 0, 'physical slot created on intermediate replica');
+$node_standby_2->append_conf('recovery.conf', "primary_slot_name = $slotname_2\n");
+$node_standby_2->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_2->restart;
+
+sub get_slot_xmins
+{
+	my ($node, $slotname) = @_;
+	my $slotinfo = $node->slot($slotname);
+	return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+# There's no hot standby feedback and there are no logical slots on either peer
+# so xmin and catalog_xmin should be null on both slots.
+my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');
+
+# Replication still works?
+$node_master->safe_psql('postgres', 'CREATE TABLE replayed(val integer);');
+
+sub replay_check
+{
+	my $newval = $node_master->safe_psql('postgres', 'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val');
+	$node_master->wait_for_catchup($node_standby_1);
+	$node_standby_1->wait_for_catchup($node_standby_2);
+	$node_standby_1->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_1 didn't replay master value $newval";
+	$node_standby_2->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_2 didn't replay standby_1 value $newval";
+}
+
+replay_check();
+
+diag "enabling hot_standby_feedback";
+# Enable hs_feedback. The slot should gain an xmin. We set the status interval
+# so we'll see the results promptly.
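As a rough illustration of what pg_lsn.pm has to do (a Python sketch with invented names, not the Perl module from the patch): an LSN such as '1/16B3740' is two 32-bit hex halves, and comparing the halves as a pair avoids ever materializing the full 64-bit value, which is the constraint a Perl 5.8.8 build without 64-bit integers imposes.

```python
def parse_lsn(lsn):
    """Split a textual LSN like '1/16B3740' into its (high, low) hex halves.

    Comparing the halves as a tuple never forms the combined 64-bit
    value, mirroring the no-64-bit-integer constraint described above.
    """
    high, low = lsn.split('/')
    return (int(high, 16), int(low, 16))

def lsn_cmp(a, b):
    """Return -1/0/1 ordering for two textual LSNs."""
    pa, pb = parse_lsn(a), parse_lsn(b)
    return (pa > pb) - (pa < pb)

print(lsn_cmp('0/16B3740', '1/0'))        # -1: the high word decides
print(lsn_cmp('1/16B3740', '1/16B3740'))  # 0: equal LSNs
```

Tuple comparison in the host language does the lexicographic high-then-low ordering for free, which is exactly the ordering WAL positions need.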
Re: [HACKERS] 9.6 TAP tests and extensions
On 27 October 2016 at 00:42, Robert Haas wrote:
> On Wed, Oct 26, 2016 at 7:17 AM, Andres Freund wrote:
>> On 2016-09-23 16:04:32 -0400, Tom Lane wrote:
>>> Looking back over the thread, I see that you also proposed installing
>>> isolationtester and pg_isolation_regress for the benefit of extensions.
>>> I'm very much less excited about that idea.  It'd be substantially more
>>> dead weight in typical installations, and I'm not sure that it'd be useful
>>> to common extensions, and I'm not eager to treat isolationtester's API
>>> and behavior as something we need to hold stable for extension use.
>>
>> FWIW, I'd be quite happy if it were installed. Running isolationtester
>> when compiling extensions against distribution postgres packages would
>> be quite useful.
>
> +1.

Patch attached.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From 40735041753c48d282f3af58482992b8da00c96d Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Mon, 14 Nov 2016 14:51:09 +0800
Subject: [PATCH] Install isolationtester

Per discussion in
---
 src/Makefile                | 1 +
 src/test/isolation/Makefile | 9 +
 2 files changed, 10 insertions(+)

diff --git a/src/Makefile b/src/Makefile
index 977f80b..d4aa06b 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 	pl \
 	makefiles \
 	test/regress \
+	test/isolation \
 	test/perl
 
 # There are too many interdependencies between the subdirectories, so
diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index 3d272d5..613309e 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -66,3 +66,12 @@ installcheck-prepared-txns: all temp-install
 
 check-prepared-txns: all temp-install
 	./pg_isolation_regress --temp-instance=./tmp_check $(TEMP_CONF) $(EXTRA_REGRESS_OPTS) --inputdir=$(srcdir) --schedule=$(srcdir)/isolation_schedule prepared-transactions
+
+install: all installdirs
+	$(INSTALL_PROGRAM) isolationtester$(X) '$(DESTDIR)$(pgxsdir)/$(subdir)/isolationtester$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(pgxsdir)/$(subdir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/isolationtester$(X)'
-- 
2.5.5
Re: [HACKERS] [PATCH] Allow TAP tests to be run individually
On 11 November 2016 at 18:13, Michael Paquier wrote:
> On Fri, Nov 11, 2016 at 6:10 PM, Craig Ringer wrote:
>> Please backpatch to at least 9.6 since it's trivial and we seem to be
>> doing that for TAP. 9.5 and 9.4 would be nice too :)
>
> Yes please!

No immediate takers, so adding to the CF. I've taken the liberty of adding you as a reviewer based on your response and the simplicity of the patch. If you get the chance to test and verify, please set it to ready for committer.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/04 0:49, Robert Haas wrote:
> On Thu, Nov 3, 2016 at 7:46 AM, wrote:
>> El 2016-10-28 07:53, Amit Langote wrote:
>>> @@ -6267,6 +6416,12 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
>>>  	 * Validity checks (permission checks wait till we have the column
>>>  	 * numbers)
>>>  	 */
>>> +	if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>>> +		ereport(ERROR,
>>> +				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
>>> +				 errmsg("cannot reference relation \"%s\"", RelationGetRelationName(pkrel)),
>>> +				 errdetail("Referencing partitioned tables in foreign key constraints is not supported.")));
>>
>> Is there a plan for fixing this particular limitation? It's a pretty
>> serious problem for users, and the suggested workaround (to create a
>> separate non-partitioned table which carries only the PK columns, which
>> is updated by triggers, and direct the FKs to it instead of to the
>> partitioned table) is not only a very ugly one, but also very slow.
>
> If you have two compatibly partitioned tables, and the foreign key
> matches the partitioning keys, you could implement a foreign key
> between the two tables as a foreign key between each pair of matching
> partitions.  Otherwise, isn't the only way to handle this a global
> index?

I am assuming you don't mean a global index (on partitioned tables) as in some new kind of monolithic physical structure that implements the constraint across tables (partitions), right? I'm thinking you mean a collection of btree indexes on individual partitions, with the key of each index matching the partition key of the parent, created internally as part of the creation of the same index on the parent. In fact, the said indexes would be created and maintained much like inherited attributes and constraints are. That would require quite a bit of new infrastructure. We did discuss the possibility of such a feature being implemented on top of declarative partitioning, but not in version 1 [1].

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoZZMfcf16YaHuhP1Vk%3Dj8PDFeHCvfj%2BFJQd%2BeFhs%2B7P8A%40mail.gmail.com
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
On Sat, Nov 12, 2016 at 9:01 PM, Andres Freund wrote:
> On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:
>
>> + * This takes also
>> + * advantage to avoid 8-byte torn reads on some platforms by using the
>> + * fact that each insert lock is located on the same cache line.
>
> Something residing on the same cache line doesn't provide that guarantee
> on all platforms.

OK. Let's remove it then.

>> + * XXX: There is still room for more improvements here, particularly
>> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
>> + * update the progress LSN as those relations are reset during crash
>> + * recovery so enforcing buffers of such relations to be flushed for
>> + * example in the case of a load only on unlogged relations is a waste
>> + * of disk write.
>
> Commit records still have to be written, everything else doesn't write
> WAL. So I'm doubtful this matters much?

Hm, okay. In most cases this may not matter... Let's rip that out.

>> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
>>  		inserted = true;
>>  	}
>>
>> +	/*
>> +	 * Update the LSN progress positions. At least one WAL insertion lock
>> +	 * is already taken appropriately before doing that, and it is simpler
>> +	 * to do that here when the WAL record data and type are at hand.
>
> But we don't use the "WAL record data and type"?

Yes, at some point this patch did so...

>> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
>> + * other words any activity not referring to standby logging or segment
>> + * switches. Finding the last activity position is done by scanning each
>> + * WAL insertion lock by taking directly the light-weight lock associated
>> + * to it.
>> + */
>
> I'd prefer not to list the specific records here - that's just
> guaranteed to get out of date. Why not say something like "any activity
> not requiring a checkpoint to be triggered" or such?

OK. Makes sense to minimize maintenance.

>> +	 * If this isn't a shutdown or forced checkpoint, and if there has been no
>> +	 * WAL activity, skip the checkpoint. The idea here is to avoid inserting
>> +	 * duplicate checkpoints when the system is idle. That wastes log space,
>> +	 * and more importantly it exposes us to possible loss of both current and
>> +	 * previous checkpoint records if the machine crashes just as we're writing
>> +	 * the update.
>
> Shouldn't this mention archiving and also that we also ignore some forms
> of WAL activity?

I have reworded that as: "If this isn't a shutdown or forced checkpoint, and if there has been no WAL activity requiring a checkpoint, skip it."

>> -/* Should the in-progress insertion log the origin? */
>> -static bool include_origin = false;
>> +/* status flags of the in-progress insertion */
>> +static uint8 status_flags = 0;
>
> That seems a bit too generic a name. 'curinsert_flags'?

OK.

>>  		/*
>> -		 * only log if enough time has passed and some xlog record has
>> -		 * been inserted.
>> +		 * Only log if enough time has passed, that some WAL activity
>> +		 * has happened since last checkpoint, and that some new WAL
>> +		 * records have been inserted since the last time we came here.
>
> I think that sentence needs some polish.

Let's do this better:

 		/*
-		 * only log if enough time has passed and some xlog record has
-		 * been inserted.
+		 * Only log if one of the following conditions is satisfied since
+		 * the last time we came here:
+		 * - timeout has been reached.
+		 * - WAL activity has happened since last checkpoint.
+		 * - New WAL records have been inserted.
 		 */

>>  		 */
>>  		if (now >= timeout &&
>> -			last_snapshot_lsn != GetXLogInsertRecPtr())
>> +			GetLastCheckpointRecPtr() < current_progress_lsn &&
>> +			last_progress_lsn < current_progress_lsn)
>>  		{
>
> Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
> Don't we need to do the comparisons here (and when doing the checkpoint
> itself) with the REDO pointer of the last checkpoint?

Hm? The progress pointer is pointing to the lastly inserted LSN, which is not the position of the REDO pointer, but the one of the checkpoint record. Doing a comparison with the REDO pointer would be a moot operation, because as the checkpoint completes, the progress LSN will be updated as well. Or do you mean that the progress LSN should *not* be updated for a checkpoint record? It seems to me that it should, but...

>> diff --git a/src/backend/postmaster/checkpointer.c
>> b/src/backend/postmaster/checkpointer.c
>>
Re: [HACKERS] Patch: Implement failover on libpq connect level.
Hi, Mithun

Before going deeper into the patch, let me give you some findings.

(1) PGconn->target_server_type is not freed in freePGconn().

(2) Could you add a PGTARGETSERVERTYPE environment variable? Like other variables, it will ease testing, since users can change the behavior without altering the connection string here and there.

(3) I think it would be better to expose the server state via a ParameterStatus protocol message, like standard_conforming_strings, instead of running "SELECT pg_is_in_recovery()". We shouldn't want to add one round trip to check the server type (master, standby). postmaster can return the server type based on its state (pmState); PM_RUN is master, and PM_HOT_STANDBY is standby. In addition, as an impractical concern, a DBA can revoke EXECUTE privilege on pg_is_in_recovery() from non-superusers.

Regards
Takayuki Tsunakawa
Re: [HACKERS] Declarative partitioning - another take
I forgot to quote your comments in the email I sent on Friday [1], with new patches that do take care of the following comments.

On 2016/11/11 4:04, Robert Haas wrote:
> On Thu, Nov 10, 2016 at 7:40 AM, Amit Langote wrote:
>>
>> OK, "partition key" and "partitioning method" it is then. Source code
>> comments, error messages, variables call the latter (partitioning)
>> "strategy" though, which hopefully is fine.
>
> Oh, I like "partitioning strategy". Can we standardize on that?

Done.

>> OK, I have removed the syntactic ability to specify INCLUSIVE/EXCLUSIVE
>> with each of the range bounds.
>>
>> I haven't changed any code (such as comparison functions) that manipulates
>> instances of PartitionRangeBound, which has a flag called inclusive. I
>> didn't remove the flag; it is instead just set to (is_lower ? true :
>> false) when initializing from the parse node. Perhaps there is some scope
>> for further simplifying that code, which you probably alluded to when you
>> proposed that we do this.
>
> Yes, you need to rip out all of the logic that supports it. Having
> the logic to support it but not the syntax is bad because then that
> code can't be properly tested.

Agreed and done. Now the only thing that dictates the inclusivity of individual range bounds during comparisons (with other range bounds or the partition key of tuples) is whether the bound is a lower bound or not: inclusive if it is, exclusive if not.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/8d7c35e3-1c85-33d0-4440-0a75bf9d31cd%40lab.ntt.co.jp
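The resulting rule — a range bound is inclusive exactly when it is a lower bound — can be sketched as follows. This is illustrative Python, not the patch's C comparison functions, and the single-column case only:

```python
def in_range_partition(key, lower, upper):
    """Check a single-column key against one range partition's bounds.

    Per the simplified rule: the lower bound is always inclusive and
    the upper bound always exclusive, so adjacent partitions such as
    [10, 20) and [20, 30) neither overlap nor leave gaps.
    """
    return lower <= key < upper

# Two adjacent partitions: every key in [10, 30) lands in exactly one.
p1 = lambda k: in_range_partition(k, 10, 20)
p2 = lambda k: in_range_partition(k, 20, 30)
print(p1(20), p2(20))  # False True: 20 belongs only to the second partition
```

This is why dropping the INCLUSIVE/EXCLUSIVE syntax loses no expressiveness for non-overlapping partitions: the bound's role (lower vs. upper) already determines its inclusivity.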
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
On Mon, Nov 14, 2016 at 12:49 PM, Kyotaro HORIGUCHI wrote:
> At Sat, 12 Nov 2016 10:28:56 +0530, Amit Kapila wrote:
>> I think it is good to check the performance impact of this patch on
>> many core m/c. Is it possible for you to once check with Alexander
>> Korotkov to see if he can provide you access to his powerful m/c which
>> has 70 cores (if I remember correctly)?

I heard about a number like that, and there is no reason not to do tests to be sure. With that many cores we are more likely to see the limitation of the number of XLOG insert slots popping up as a bottleneck, but that's just an assumption without any numbers. Alexander (added in CC), could it be possible to get access to this machine?

>> @@ -1035,6 +1038,7 @@ LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks)
>>  	XLogBeginInsert();
>>  	XLogRegisterData((char *) , offsetof(xl_standby_locks, locks));
>>  	XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
>> +	XLogSetFlags(XLOG_NO_PROGRESS);
>>
>> Is it right to set the XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
>> This function is called not only in LogStandbySnapshot(), but during
>> DDL operations as well, where lockmode >= AccessExclusiveLock.
>
> This does not remove any record from WAL. So theoretically any
> kind of record can be NO_PROGRESS, but practically as long as
> checkpoints are not unreasonably suppressed. Any explicit
> database operation must be accompanied by at least a commit
> record that triggers a checkpoint. NO_PROGRESSing there doesn't
> seem to me to harm database durability for this reason.
>
> The objective of this patch is skipping WAL on a completely idle
> system, and the NO_PROGRESSing is necessary for it to do its work. Of
> course we could distinguish exclusive locks with PROGRESS and
> without PROGRESS, but that is unnecessary complexity.

The point that applies here is that logging the exclusive lock information is necessary for *standby* recovery conflicts, not the primary, which is why it should not influence the checkpoint activity happening on the primary. So marking this record with NO_PROGRESS is actually fine to me.

> But rethinking the above, the naming of "XLOG_NO_PROGRESS"
> might be inappropriate. "XLOG_NO_CKPT_TRIGGER" or some saner name
> might be needed.

I have grown fond of NO_PROGRESS with time, to be honest, even if I don't much like the negative meaning it has... Perhaps something like XLOG_SKIP_PROGRESS would hold more meaning.

-- 
Michael
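To summarize the mechanism being reviewed in this sub-thread (a condensed, hypothetical model, not the patch's C code): records flagged NO_PROGRESS never advance the progress LSN, so a timed checkpoint can compare the progress LSN against the position of the last checkpoint and skip itself when the system has been idle.

```python
def should_checkpoint(forced, last_checkpoint_lsn, progress_lsn):
    """Decide whether a periodic checkpoint should actually run.

    Hypothetical condensation of the patch's idea: a timed checkpoint
    is skipped when no WAL counted as "progress" has appeared since the
    last checkpoint. Records flagged NO_PROGRESS (standby snapshots,
    standby lock logging, etc.) never advance progress_lsn, so a
    completely idle system stops generating checkpoints.
    """
    return forced or progress_lsn > last_checkpoint_lsn

last_ckpt = 0x16B3740
print(should_checkpoint(False, last_ckpt, 0x16B3740))  # False: idle, skip
print(should_checkpoint(False, last_ckpt, 0x16B3800))  # True: WAL activity
print(should_checkpoint(True, last_ckpt, 0x16B3740))   # True: shutdown/forced
```

The open question in the thread maps onto the second argument: whether `last_checkpoint_lsn` should be the checkpoint record's own position or the checkpoint's REDO pointer.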
Re: [HACKERS] Minor improvement to delete.sgml
On 2016/10/19 2:51, Robert Haas wrote:
> On Fri, Oct 14, 2016 at 12:05 AM, Etsuro Fujita wrote:
>> I think it's better to mention that an alias is needed for the target
>> table specified in the USING clause of a DELETE statement, to set up a
>> self-join, as the documentation on the from_list parameter of UPDATE
>> does. Please find attached a patch.
>
> The statement you are proposing to add to the documentation isn't true.

Consider a counterexample of DELETE doing a self-join of a target table:

postgres=# create table t1 (c1 int);
CREATE TABLE
postgres=# insert into t1 values (1);
INSERT 0 1
postgres=# delete from t1 using t1 where t1.c1 = t1.c1;
ERROR:  table name "t1" specified more than once

Giving an alias to the target table t1 in the USING clause:

postgres=# delete from t1 using t1 r1 where t1.c1 = r1.c1;
DELETE 1

Am I missing something? Sorry for the delay.

Best regards,
Etsuro Fujita
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
Hello, thank you for the comment.

At Sat, 12 Nov 2016 10:28:56 +0530, Amit Kapila wrote:
> On Tue, Nov 8, 2016 at 5:18 PM, Kyotaro HORIGUCHI wrote:
>> Hello,
>>
>>> on something else than LW_EXCLUSIVE compared to now. To keep things
>>> more simple I would still favor using WALInsertLocks for this patch,
>>> that looks more consistent, and also because there is no noticeable
>>> difference.
>>
>> Ok, the patch looks fine. So there's nothing for me to do but accept
>> the current shape, since the discussion about performance seems
>> not to be settled without performance measurements on machines
>> with many cores.
>
> I think it is good to check the performance impact of this patch on
> many core m/c. Is it possible for you to once check with Alexander
> Korotkov to see if he can provide you access to his powerful m/c which
> has 70 cores (if I remember correctly)?
>
> @@ -1035,6 +1038,7 @@ LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks)
>  	XLogBeginInsert();
>  	XLogRegisterData((char *) , offsetof(xl_standby_locks, locks));
>  	XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
> +	XLogSetFlags(XLOG_NO_PROGRESS);
>
> Is it right to set the XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
> This function is called not only in LogStandbySnapshot(), but during
> DDL operations as well, where lockmode >= AccessExclusiveLock.

This does not remove any record from WAL. So theoretically any kind of record can be NO_PROGRESS, but practically as long as checkpoints are not unreasonably suppressed. Any explicit database operation must be accompanied by at least a commit record that triggers a checkpoint. NO_PROGRESSing there doesn't seem to me to harm database durability for this reason.

The objective of this patch is skipping WAL on a completely idle system, and the NO_PROGRESSing is necessary for it to do its work. Of course we could distinguish exclusive locks with PROGRESS and without PROGRESS, but that is unnecessary complexity.

But rethinking the above, the naming of "XLOG_NO_PROGRESS" might be inappropriate. "XLOG_NO_CKPT_TRIGGER" or some saner name might be needed.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] sequences and pg_upgrade
On 11/2/16 2:34 AM, Michael Paquier wrote:
> I had a look at those fresh patches, and 0001 looks like a good thing.
> This makes the separation between sequences and table data dump
> cleaner. I ran some tests with pg_upgrade and 0002, and things are
> clear. And +1 for the way done in the patch, aka no options of pg_dump
> exposed to user, still keep the option tracking as a separate value.

Committed, thanks.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Patch: Implement failover on libpq connect level.
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Great, committed. There's still potentially more work to be done here,
> because my patch omits some features that were present in Victor's original
> submission, like setting the failover timeout, optionally randomizing the
> order of the hosts, and distinguishing between master and standby servers;
> Victor, or anyone, please feel free to submit separate patches for those
> things.

The attached patch fixes some bugs and makes a clarification in the doc. Could you check and test the authentication stuff, as I don't have an environment at hand?

(1) fe-connect.c
There was a memory leak.

(2) fe-secure_openssl.c, fe-auth.c
GSSAPI/SSPI/SSL authentication requires the target host name, but the code uses conn->pghost, which contains a comma-separated list of host names.

(3) libpq.sgml
Added sentences to clarify connect_timeout when it is used with multiple hosts.

BTW, let me ask two questions.

Q1: Is there any reason why hostaddr doesn't accept multiple IP addresses?

Q2: pg_isready (and hence the PQping() family) reports success when one host is alive and other hosts are down. Is this intended? I think this behavior is correct. e.g. host1 is down and host2 is alive:

$ pg_isready -h host1,host2
host1,host2:5450 - accepting connections
$ echo $?
0

Regards
Takayuki Tsunakawa

Attachment: libpq-failover-smallbugs.patch
Re: [HACKERS] Physical append-only tables
Magnus Hagander wrote:
> But then consider the same table. Except rows are typically updated once or
> twice when they are new, and *then* go read only. And we also have a
> process that at some point deletes *some* old rows (but not all - in fact,
> only a small portion).
>
> In this case, the next INSERT once VACUUM has run is likely to stick a
> "new" row somewhere very "far back" in the table, since there is now free
> space there. This more or less completely ruins the BRIN index usability,
> as the "old" blocks will now contain a single row from a "new" series.

Yeah. When we initially discussed BRIN, there was a mention of allowing a BRIN index to guide new tuple location -- something like auto-clustering, if you will. We haven't discussed the exact details, but I think something along those lines is worth considering.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 11/12/2016 08:12 PM, Andres Freund wrote: Hi, Subject: [PATCH 1/2] slab allocator diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 6ad7e7d..520f295 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c I'd rather have that in a separate followup commit... + * IDENTIFICATION + * src/backend/utils/mmgr/slab.c + * + * + * The constant allocation size allows significant simplification and various + * optimizations that are not possible in AllocSet. Firstly, we can get rid + * of the doubling and carve the blocks into chunks of exactly the right size + * (plus alignment), not wasting memory. References to AllocSet aren't necessarily a good idea, they'll quite possibly get out of date. The argument can be quite easily be made without referring to a concrete reference to behaviour elsewhere. Yeah, that's probably true. + * + * At the context level, we use 'freelist' array to track blocks grouped by + * number of free chunks. For example freelist[0] is a list of completely full + * blocks, freelist[1] is a block with a single free chunk, etc. Hm. Those arrays are going to be quite large for small allocations w/ big blocks (an imo sensible combination). Maybe it'd make more sense to model it as a linked list of blocks? Full blocks are at one end, empty ones at the other? So there'd be one huge list of blocks, sorted by the number of empty chunks? Hmm, that might work I guess. I don't think the combination of large blocks with small allocations is particularly sensible, though - what exactly would be the benefit of such combination? I would even consider enforcing some upper limit on the number of chunks per block - say, 256, for example. + * About MEMORY_CONTEXT_CHECKING: + * + * Since we usually round request sizes up to the next power of 2, there + * is often some unused space immediately after a requested data area. 
I assume the "round up" stuff is copy-paste? Yeah, sorry about that. + * Thus, if someone makes the common error of writing past what they've + * requested, the problem is likely to go unnoticed ... until the day when + * there *isn't* any wasted space, perhaps because of different memory + * ... + * + * See also the cooperating Valgrind client requests in mcxt.c. I think we need a preliminary patch moving a lot of this into something like mcxt_internal.h. Copying this comment, and a lot of the utility functions, into every memory context implementation is a bad pattern. Yes, I planned to do that for the next version of patch. Laziness. +typedef struct SlabBlockData *SlabBlock; /* forward reference */ +typedef struct SlabChunkData *SlabChunk; Can we please not continue hiding pointers behind typedefs? It's a bad pattern, and that it's fairly widely used isn't a good excuse to introduce further usages of it. Why is it a bad pattern? +/* + * SlabContext is a specialized implementation of MemoryContext. + */ +typedef struct SlabContext +{ + MemoryContextData header; /* Standard memory-context fields */ + /* Allocation parameters for this context: */ + SizechunkSize; /* chunk size */ + SizefullChunkSize; /* chunk size including header and alignment */ + SizeblockSize; /* block size */ + int chunksPerBlock; /* number of chunks per block */ + int minFreeChunks; /* min number of free chunks in any block */ + int nblocks;/* number of blocks allocated */ + /* Info about storage allocated in this context: */ + SlabBlock freelist[1];/* free lists (block-level) */ I assume this is a variable-length array? If so, that a) should be documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so actually will cause compiler warnings and potential misoptimizations. Will fix, thanks. +/* + * SlabBlockData + * Structure of a single block in SLAB allocator. + * + * slab: context owning this block What do we need this for? 
You're right that the pointer to the owning context is unnecessary - there's nothing like "standard block header" and we already have the pointer in the standard chunk header. But maybe keeping the pointer at least with MEMORY_CONTEXT_CHECKING would be a good idea? + * prev, next: used for doubly-linked list of blocks in global freelist I'd prefer using an embedded list here (cf. ilist.h). Makes sense. +/* + * SlabChunk + * The prefix of each piece of memory in a SlabBlock + * + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h. + * However it's possible to have fields in front of the StandardChunkHeader fields, + * which is used here to add a pointer to the block. + */
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ashutosh Bapat > I have changed some comments around this block. See attached patch. > Let me know if that looks good. Thanks, it looks good. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ExplainOneQuery_hook ignored for EXPLAIN EXECUTE
Andres Freund writes: > On 2016-11-13 17:20:05 -0500, Tom Lane wrote: >> Why do you care? It's a pretty specialized hook. > Citus currently uses it to output additional information for distributed > queries. I suspect we'll instead, for now, have to intercept EXPLAIN as > a whole then :( (moving to custom plans is still a bit off, unfortunately). Seems like you might want a hook further down then, perhaps at ExplainPrintPlan. regards, tom lane
Re: [HACKERS] ExplainOneQuery_hook ignored for EXPLAIN EXECUTE
On 2016-11-13 17:20:05 -0500, Tom Lane wrote: > Andres Freund writes: > > I don't quite know what the hook in $subject was originally designed > > for, but right now it has the problem that it's not invoked for EXPLAIN > > EXECUTE. That's because ExplainExecuteQuery directly calls > > ExplainOnePlan instead of ExplainOneQuery_hook. > > > Unfortunately that's not entirely trivial to fix, because the hook > > accepts a Query, not a PlannedStmt. > > AFAIR, the purpose of that hook was to let index advisor plugins have > control of the planning step in an EXPLAIN, so that they could do things > like injecting hypothetical indexes. There isn't any easy way to do > something similar in EXPLAIN EXECUTE because the plan comes out of the > plancache. > Why do you care? It's a pretty specialized hook. Citus currently uses it to output additional information for distributed queries. I suspect we'll instead, for now, have to intercept EXPLAIN as a whole then :( (moving to custom plans is still a bit off, unfortunately). Andres
Re: [HACKERS] ExplainOneQuery_hook ignored for EXPLAIN EXECUTE
Andres Freund writes: > I don't quite know what the hook in $subject was originally designed > for, but right now it has the problem that it's not invoked for EXPLAIN > EXECUTE. That's because ExplainExecuteQuery directly calls > ExplainOnePlan instead of ExplainOneQuery_hook. > Unfortunately that's not entirely trivial to fix, because the hook > accepts a Query, not a PlannedStmt. AFAIR, the purpose of that hook was to let index advisor plugins have control of the planning step in an EXPLAIN, so that they could do things like injecting hypothetical indexes. There isn't any easy way to do something similar in EXPLAIN EXECUTE because the plan comes out of the plancache. Why do you care? It's a pretty specialized hook. regards, tom lane
Re: [HACKERS] Logical Replication WIP
On 10/31/2016 06:38 AM, Petr Jelinek wrote: On 31/10/16 00:52, Steve Singer wrote: There are some fundamental issues with initial sync that need to be discussed on list but this one is not known. I'll try to convert this to test case (seems like useful one) and fix it, thanks for the report. In meantime I realized I broke the last patch in the series during rebase so attached is the fixed version. It also contains the type info in the protocol. Attached are some proposed documentation updates (to be applied ontop of your 20161031 patch set) Also Publication + +The tables are matched using fully qualified table name. Renaming of +tables or schemas is not supported. + Is renaming of tables any less supported than other DDL operations For example alter table nokey2 rename to nokey3 select * FROM pg_publication_tables ; pubname | schemaname | tablename -++--- tpub| public | nokey3 (1 row) If I then kill the postmaster on my subscriber and restart it, I get 2016-11-13 16:17:11.341 EST [29488] FATAL: the logical replication target public.nokey3 not found 2016-11-13 16:17:11.342 EST [29272] LOG: worker process: logical replication worker 41076 (PID 29488) exited with exit code 1 2016-11-13 16:17:16.350 EST [29496] LOG: logical replication apply for subscription nokeysub started 2016-11-13 16:17:16.358 EST [29498] LOG: logical replication sync for subscription nokeysub, table nokey2 started 2016-11-13 16:17:16.515 EST [29498] ERROR: table public.nokey2 not found on publisher 2016-11-13 16:17:16.517 EST [29272] LOG: worker process: logical replication worker 41076 sync 24688 (PID 29498) exited with exit code 1 but if I then rename the table on the subscriber everything seems to work. 
(I suspect the need to kill+restart is a bug, I've seen other instances where a hard restart of the subscriber following changes to is required) I am also having issues adding a table to a publication ,it doesn't seem work P: create publication tpub for table a; S: create subscription mysub connection 'host=localhost dbname=test port=5440' publication tpub; P: insert into a(b) values ('1'); P: alter publication tpub add table b; P: insert into b(b) values ('1'); P: insert into a(b) values ('2'); select * FROM pg_publication_tables ; pubname | schemaname | tablename -++--- tpub| public | a tpub| public | b but S: select * FROM b; a | b ---+--- (0 rows) S: select * FROM a; a | b ---+--- 5 | 1 6 | 2 (2 rows) diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml new file mode 100644 index 63af1a5..89723c3 *** a/doc/src/sgml/logical-replication.sgml --- b/doc/src/sgml/logical-replication.sgml *** *** 95,110 how the table is accessed. Each table can be added to multiple publications if needed. Publications may currently only contain tables. Objects must be added explicitly, except when a publication ! is created for ALL TABLES. There is no default name for ! a publication which specifies all tables. Publications can choose to limit the changes they produce to show any combination of INSERT, UPDATE and DELETE in a similar way to the way triggers are fired by ! particular event types. Only tables with a REPLICA IDENTITY ! index can be added to a publication which replicates UPDATE ! and DELETE operation. The definition of a publication object will be included within --- 95,110 how the table is accessed. Each table can be added to multiple publications if needed. Publications may currently only contain tables. Objects must be added explicitly, except when a publication ! is created for ALL TABLES. 
Publications can choose to limit the changes they produce to show any combination of INSERT, UPDATE and DELETE in a similar way to the way triggers are fired by ! particular event types. If a table with without a REPLICA IDENTITY ! index is added to a publication which replicates UPDATE or ! DELETE operations then subsequent UPDATE ! or DELETE operations will fail on the publisher. The definition of a publication object will be included within *** *** 195,214 A conflict will produce an error and will stop the replication; it ! must be resolved manually by the user. ! The resolution can be done either by changing data on the subscriber ! so that it does not conflict with incoming change or by skipping the ! transaction that conflicts with the existing data. The transaction ! can be skipped by calling the ! ! pg_replication_origin_advance() function ! with a node_name corresponding to the subscription name. The ! current position of origins can be seen in the ! !
Re: [HACKERS] Improving RLS planning
Stephen Frost writes: > * Dean Rasheed (dean.a.rash...@gmail.com) wrote: >> +1 for that terminology and no renaming of fields. > Agreed. Here's an updated version of the RLS planning patch that gets rid of the incorrect interaction with Query.hasRowSecurity and adjusts terminology as agreed. regards, tom lane Attachment: new-rls-planning-implementation-2.patch.gz
Re: [HACKERS] Why PostgreSQL doesn't implement a semi sync replication?
On 2016/11/12 11:27 AM, Craig Ringer wrote: On 12 November 2016 at 02:12, Petr Jelinek wrote: On 11/11/16 16:03, Francisco Olarte wrote: On Fri, Nov 11, 2016 at 4:40 AM, 余森彬 wrote: As we know, a synchronous commit blocks while waiting for the acknowledgement from the standby in PostgreSQL. This is good for data consistency between master and standby, and applications can read important data from the standby. But when the standby crashes or the network goes wrong, the master can hang. Is there a feature plan for a semi-sync mode like MySQL InnoDB's (set a timer, and become asynchronous on timeout)? JMO, but it seems this basically means any process should be designed to cope with the possibility of not having replicated data after commit, so, why bother with synchronous replication in the first place? It's often more acceptable to say "we lose data when 2 servers die (or are in problems)" than "we lose data when 1 server dies", and it's also more acceptable to say "we stop answering when we lose 2 servers" but not "we stop answering when we lose 1 server", and semisync replication works for the combination of these two. Yep. Also, monitoring. Sync with a short timeout means you can usually rely on sync rep, and if it times out and falls back to async your monitoring system can start screaming at you. I think k= replication will help quite a bit with this though. In my opinion, the point of the synchronous commit design is to handle sync replication and provide high data consistency and availability. Meanwhile, a time limit is also valuable within the synchronous commit process, since it may protect things from getting worse when problems occur between the master and its sync standbys. Like the 'idle_in_transaction_session_timeout' option in PostgreSQL 9.6, a timeout makes the transaction commit fail under bad conditions rather than hang until things get better. This doesn't break the synchronous commit semantics while making the system less unpredictable.
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> We can remove the fuzz factor altogether but I think we also > should provide a means usable to do similar things. At least "is > a point on a line" might be useless for most cases without any > fuzzing feature. (Nevertheless, it is a problem only when it is > being used to do that:) If we don't find reasonable policy on > fuzzing operations, it would be the proof that we shouldn't > change the behavior. It was my initial idea to keep the fuzzy comparison behaviour on some places, but the more I get into I realised that it is almost impossible to get this right. Instead, I re-implemented some operators to keep precision as much as possible. The previous "is a point on a line" operator would *never* give the true result without the fuzzy comparison. The new implementation would return true, when precision is not lost. I think this is a behaviour people, who are working with floating points, are prepared to deal with. By the way, "is a point on a line" operator is quite wrong with the fuzzy comparison at the moment [1]. > The 0001 patch adds many FP comparison functions individually > considering NaN. As the result the sort order logic involving NaN > is scattered around into the functions, then, you implement > generic comparison function using them. It seems inside-out to > me. Defining ordering at one place, then comparison using it > seems to be reasonable. I agree that it would be simpler to use the comparison function for implementing other operators. I have done it other way around to make them more optimised. They are called very often. I don't think checking exit code of the comparison function would be optimised the same way. I could leave the comparison functions as they are, but re-implemented them using the others to keep documentation of NaN comparison in the single place. > If the center somehow goes extremely near to the origin, it could > result in a false error. 
> >> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)'; >> ERROR: value out of range: underflow > > I don't think this underflow is an error, and actually it is a > change of the current behavior without a reasonable reason. More > significant (and maybe unacceptable) side-effect is that it > changes the behavior of ordinary operators. I don't think this is > acceptable. More consideration is needed. > >> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0; >> ERROR: value out of range: underflow This is the current behaviour of the float datatype. My patch doesn't change that. This problem would probably also apply to multiplying very small values. I agree that this is not the ideal behaviour, though I am not sure if we should go in a different direction than the float datatypes. I think there is value in making the geometric types compatible with float. Users are going to mix them, anyway. For example, users can calculate the center of a box manually, and get confused when the built-in operator behaves differently. > In regard to fuzzy operations, libgeos seems to have several > types of this kind of feature. (I haven't looked closer into > them). Other than reducing precision seems overkill or > unappliable for PostgreSQL bulitins. As Jim said, can we replace > the fixed scale fuzz factor by precision reduction? Maybe, with a > GUC variable (I hear someone's roaring..) to specify the amount > defaults to fit the current assumption. I am disinclined to try to implement something complicated for the geometric types. I think they are mostly useful for two purposes: uses simple enough that better solutions are not worth looking for, and demonstrating our indexing capabilities. The inconsistencies harm both of those.
[1] https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com
Re: [HACKERS] Improvements in psql hooks for variables
>ECHO_HIDDEN differs from the generic boolean case because it also >accepts "noexec" and pset.echo_hidden is an enum, not a boolean. When >considering refactoring echo_hidden_hook() to call >generic_boolean_hook() instead of ParseVariableBool() after >having established that the value is not "noexec", I don't see >any advantage in clarity or code size, so I'm not in favor of that change I agree that if generic_boolean_hook() is used in its current form instead of ParseVariableBool() inside echo_hidden_hook or on_error_rollback_hook, the code will not change much. I was suggesting a change along the lines of extending generic_boolean_hook to include enum parsing (the part in the enum hooks which calls ParseVariableBool) and integer parsing as well. >Besides, I don't think it would go well with hooks. If we wanted one >big function that knows all about parsing all built-in variables, we >could just as well dispense with hooks, since their current purpose in >psql is to achieve this parsing, but in a decentralized way. >Or if we keep them, our various built-in variables would be >essentially tied to the same one-big-hook-that-does-all, but isn't >that an antipattern for hooks? I was suggesting something along the lines of having common parsing logic for each implementation of the hook. This is similar to what ParseVariableBool does in the existing code. ParseVariableBool is called from different hooks to parse the boolean value of the variable, and thus represents code common to the various hook implementations. >"hook" is changed by the patch from [pointer to function returning >void], to [pointer to function returning bool]. The cast to void is >not essential, it just indicates that we deliberately want to >ignore the return value here. I expect some compilers might >complain under a high level of warnings without this cast, although >TBH if you ask me, I wouldn't know which compiler with which flags >exactly. Thank you for the explanation.
Thank you, Rahila Syed On Fri, Nov 11, 2016 at 2:57 AM, Daniel Veritewrote: > Rahila Syed wrote: > > > I have applied this patch on latest HEAD and have done basic testing > which > > works fine. > > Thanks for reviewing this patch! > > > >if (current->assign_hook) > > >- (*current->assign_hook) (current->value); > > >- return true; > > >+ { > > >+ confirmed = (*current->assign_hook) (value); > > >+ } > > >+ if (confirmed) > > > > Spurious brackets > > OK. > > > >static bool > > >+generic_boolean_hook(const char *newval, const char* varname, bool > *flag) > > >+{ > > > > Contrary to what name suggests this function does not seem to have other > > implementations as in a hook. > > Also this takes care of rejecting a syntactically wrong value only for > > boolean variable hooks like autocommit_hook, > > on_error_stop_hook. However, there are other variable hooks which call > > ParseVariableBool. > > For instance, echo_hidden_hook which is handled separately in the patch. > > Thus there is some duplication of code between generic_boolean_hook and > > echo_hidden_hook. > > Similarly for on_error_rollback_hook. > > The purpose of generic_boolean_hook() is to handle the case of a > boolean variable that only accepts ON or OFF, and has its pset.varname > declared as bool. I thought of this case as "generic" because that's > the base case and several variables need no more than that. > > ECHO_HIDDEN differs from the generic boolean case because it also > accepts "noexec" and pset.echo_hidden is an enum, not a boolean. When > considering refactoring echo_hidden_hook() to call > generic_boolean_hook() instead of ParseVariableBool() after > having established that the value is not "noexec", I don't see > any advantage in clarity or code size, so I'm not in favor of that change. > > The same applies to on_error_rollback_hook(), which has to deal > with a specific enum as well. 
> > > >-static void > > >+static bool > > > fetch_count_hook(const char *newval) > > > { > > >pset.fetch_count = ParseVariableNum(newval, -1, -1, false); > > >+ return true; > > > } > > > > Shouldn't invalid numeric string assignment for numeric variables be > > handled too? > > Agreed. Assignments like "\set FETCH_COUNT bogus" don't provoke any > user feedback currently, which is not ideal. I'll add this in a > v3 of the patch tomorrow. > > > Instead of generic_boolean_hook cant we have something like follows which > > like generic_boolean_hook can be called from specific variable assignment > > hooks, > > > > static bool > > ParseVariable(newval, VariableName, ) > > { > > if (VariableName == ‘AUTOCOMMIT’ || ECHO_HIDDEN || other variable > with > > hooks which call ParseVariableBool ) > > > > ON_ERROR_ROLLBACK> > > else if (VariableName == ‘FETCH_COUNT’) > > ParseVariableNum(); > > } > > It's not possible because pset.var corresponds to different fields from > struct _psqlSettings
[HACKERS] ExplainOneQuery_hook ignored for EXPLAIN EXECUTE
Hi, I don't quite know what the hook in $subject was originally designed for, but right now it has the problem that it's not invoked for EXPLAIN EXECUTE. That's because ExplainExecuteQuery directly calls ExplainOnePlan instead of ExplainOneQuery_hook. Unfortunately that's not entirely trivial to fix, because the hook accepts a Query, not a PlannedStmt. Does anybody have a good way to fix this? Greetings, Andres Freund
Re: [HACKERS] Physical append-only tables
On Mon, Nov 14, 2016 at 4:45 AM, Magnus Hagander wrote: > For a scenario like this, would it make sense to have an option that could > be set on an individual table making it physical append only? Basically > VACUUM would run as normal and clean up the old space when rows are deleted > back in history, but when new space is needed for a row the system would > never look at the old blocks, and only append to the end. One thing I have idly wondered about: if you had an append-only mode that guarantees that we never write new tuples anywhere but the end of the heap, you could create a bsearch access method that uses zero storage and simply checks that every key inserted is >= the previous high key before allowing the insertion to proceed. Then it could make use of the guaranteed correlation with physical order to do scans using a binary search of the heap. Maybe that'd be useful for some kinds of write-only time-series data that needs to be searched by time range. On the other hand, BRIN indexes are tiny, should be nearly as good and are much less restrictive, so I haven't followed this thought up. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Tackling JsonPath support
Hi 2016-11-13 15:14 GMT+01:00 Christian Convey: > Hi Pavel, > > Can I check a few assumptions about what you're suggesting for this task? > > * Our ultimate goal is to give Postgres an implementation of the functions > "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the > SQL standards. > > * The best representation of those standards is found here: [1]. > > * When [1] mentions a "JSON path expression" or "JSON path language", it's > referring to the query language described here: [2]. > > * Even if other popular DBMS's deviate from [1], or other popular JSONPath > implementations deviate from [2], we remain committed to a faithful > implementation of [1]. > > * It's okay for my first commit to implement just two things: (a) a > PG-internal implementation of JsonPath, and (b) a user-visible > implementation of "JSON_QUERY" based on (a). Later commits could add > implementations of "JSON_VALUE", "JSON_EXISTS", etc. in terms of (a). My goal is the implementation of the JSON_TABLE function - this function can be used instead of any other mentioned function (and it is really useful - transforming JSON to a table is a usual task). SQL/JSON is pretty new and too big to implement in one step. Nobody in the PostgreSQL world knows it. Our SQL/XML support needed more than 10 years and still is not fully complete - and we used the power and features of libxml2 (we have nothing similar for JSON). But we have almost everything that is needed daily from SQL/XML. For JSON_TABLE we need only the basic features of JSONPath - the predicates are not necessary in the first step. http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/json/src/tpc/db2z_bif_jsontable.html The vendors use the name "SQL/JSON path expressions" for this query language - so the important source is SQL/JSON (this can be different from the original JSONPath (your second source)).
Regards Pavel Regards Pavel > Thanks, > Christian > > [1] http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3- > Tutorial-Opening-Plenary.pdf > > [2] http://goessner.net/articles/JsonPath > > > On Fri, Sep 16, 2016 at 2:28 AM, Pavel Stehule > wrote: > >> Hi >> >> 2016-09-15 18:05 GMT+02:00 Christian Convey : >> >>> On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule >>> wrote: >>> ... >>> >>> > I wrote XMLTABLE function, and I am thinking about JSON_TABLE >>> function. But >>> > there is one blocker - missing JsonPath support in our JSON >>> implementation. >>> > >>> > So one idea - implement JsonPath support and related JSON query >>> functions. >>> > This can help with better standard conformance. >>> >>> Hi Pavel, >>> >>> Are you still looking for someone to add the JsonPath support to the >>> JSON implementation? And if so, how urgently are people waiting for >>> it? >>> >> >> yes - JsonPath support should be great. I hope so this or next commitfest >> the XMLTABLE patch will be committed, and with JsonPath I can start to work >> on JSON_TABLE function. >> >> But the JsonPath can be merged separately without dependency to >> JSON_TABLE. There are more JSON searching functions, and these functions >> should to support JsonPath be ANSI SQL compliant. >> >> >>> >>> I'd be happy to start working on a patch, but since I'm new to PG >>> development, I'm probably not the fastest person to get it done. >>> >> >> It is not problem. Probably you should to do this work without deep >> knowledges about PostgreSQL internals. The work with data types (and >> functions for data types) is well isolated from PostgreSQL engine. >> >> You can learn from current searching on JSON - >> postgresql/src/backend/utils/adt/json.c >> >> And it is good start to be PostgreSQL's hacker - I started with >> implementation of own data type and related functions. >> >> Regards >> >> Pavel >> >> >>> Kind regards, >>> Christian >>> >> >> >
Re: [HACKERS] proposal: psql \setfileref
Hi I am sending an initial implementation of psql content commands. This design is a reaction to Tom's objections against psql file ref variables. This design is cleaner, more explicit and more practical - an import can be done in one step. Now supported commands are: r - read file without any modification rq - read file, escape as literal and use outer quotes rb - read binary file - transform to hex code string rbq - read binary file, transform to hex code string and use outer quotes Usage: create table testt(a xml); insert into testt values( {rq /home/pavel/.local/share/rhythmbox/rhythmdb.xml} ); \set xxx {r /home/pavel/.local/share/rhythmbox/rhythmdb.xml} This patch is a demo of this design - one part is redundant - I'll clean it up in the next iteration. Regards Pavel diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index a7789df..dbf3ffa 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -12,6 +12,7 @@ #include #include #include +#include #ifndef WIN32 #include /* for write() */ #else @@ -168,6 +169,142 @@ psql_get_variable(const char *varname, bool escape, bool as_ident) return result; } +/* + * file-content-fetching callback for flex lexer.
+ */ +char * +psql_get_file_content(const char *filename, bool escape, bool binary, bool quote) +{ + FILE *fd; + char *fname = pg_strdup(filename); + char *result = NULL; + + expand_tilde(); + canonicalize_path(fname); + + fd = fopen(fname, PG_BINARY_R); + if (fd) + { + struct stat fst; + + if (fstat(fileno(fd), ) != -1) + { + if (S_ISREG(fst.st_mode)) + { +if (fst.st_size <= ((int64) 1024) * 1024 * 1024) +{ + size_tsize; + PQExpBufferData raw_data; + charbuf[512]; + + initPQExpBuffer(_data); + + if (!escape && quote) + appendPQExpBufferChar(_data, '\''); + + while ((size = fread(buf, 1, sizeof(buf), fd)) > 0) + appendBinaryPQExpBuffer(_data, buf, size); + + if (!ferror(fd) && !(PQExpBufferDataBroken(raw_data))) + { + if (escape) + { + if (binary) + { +unsigned char *escaped_value; +size_t escaped_size; + +escaped_value = PQescapeByteaConn(pset.db, + (const unsigned char *) raw_data.data, raw_data.len, + _size); + +if (escaped_value != NULL) +{ + if (quote) + { + PQExpBufferData finalbuf; + + initPQExpBuffer(); + appendPQExpBufferChar(, '\''); + appendBinaryPQExpBuffer(, +(const char *) escaped_value, escaped_size - 1); + appendPQExpBufferChar(, '\''); + PQfreemem(escaped_value); + + result = finalbuf.data; + } + else + result = (char *) escaped_value; +} +else + psql_error("%s\n", PQerrorMessage(pset.db)); + } + else + { +/* escape text */ +if (quote) +{ + result = PQescapeLiteral(pset.db, +raw_data.data, raw_data.len); + if (result == NULL) + psql_error("%s\n", PQerrorMessage(pset.db)); +} +else +{ + int error; + + result = pg_malloc(raw_data.len * 2 + 1); + PQescapeStringConn(pset.db, result, raw_data.data, raw_data.len, ); + if (error) + { + psql_error("%s\n", PQerrorMessage(pset.db)); + PQfreemem(result); + result = NULL; + } +} + } + } + else + { + /* returns raw data, without any transformations */ + if (quote) +appendPQExpBufferChar(_data, '\''); + + if (PQExpBufferDataBroken(raw_data)) +psql_error("out of memory\n"); + else +result = raw_data.data; + } + } 
+ else + { + if (PQExpBufferDataBroken(raw_data)) + psql_error("out of memory\n"); + else + psql_error("%s: %s\n", fname, strerror(errno)); + } + + if (result != raw_data.data) + termPQExpBuffer(_data); +} +else + psql_error("%s is too big (greather than 1GB)\n", fname); + } + else +psql_error("%s is not regular file\n", fname); + } + else + psql_error("%s: %s\n", fname, strerror(errno)); + + fclose(fd); + } + else + psql_error("%s: %s\n", fname, strerror(errno)); + + PQfreemem(fname); + + return result; +} /* * Error reporting for scripts. Errors should look like diff --git a/src/bin/psql/common.h b/src/bin/psql/common.h index bdcb58f..6b8ae67 100644 --- a/src/bin/psql/common.h +++ b/src/bin/psql/common.h @@ -19,6 +19,7 @@ extern bool openQueryOutputFile(const char *fname, FILE **fout, bool *is_pipe); extern bool setQFout(const char *fname); extern char *psql_get_variable(const char *varname, bool escape, bool as_ident); +extern char *psql_get_file_content(const char *filename, bool escape, bool binary, bool quote); extern void psql_error(const char *fmt,...) pg_attribute_printf(1, 2); diff --git
Re: [HACKERS] Contains and is contained by operators of inet datatypes
On 11/13/2016 01:21 PM, Emre Hasegeli wrote: Thank you for the review. New version is attached. Nice, I am fine with this version of the patch. Setting it to ready for committer! Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tackling JsonPath support
2016-11-13 18:13 GMT+01:00 Tom Lane: > Christian Convey writes: > > * Our ultimate goal is to give Postgres an implementation of the > functions > > "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the > > SQL standards. > > * The best representation of those standards is found here: [1]. > > [1] > > http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial- > Opening-Plenary.pdf > > You're going to need to find a draft standard somewhere, as that > presentation is too thin on details to support writing an actual > implementation. In particular, it's far from clear that this is > true at all: > > > * When [1] mentions a "JSON path expression" or "JSON path language", > it's > > referring to the query language described here: [2]. > > [2] http://goessner.net/articles/JsonPath > > The one slide they have on the path language mentions a lax/strict syntax > that I don't see either in the document you mention or in the Wikipedia > XPath article it links to. This does not give me a warm feeling. The SQL > committee is *fully* capable of inventing their own random path notation, > especially when there's no ISO-blessed precedent to bind them. > > In general, the stuff I see in these WG3 slides strikes me as pretty > horribly designed. The committee is evidently still stuck on the idea > that every feature they invent should have a bunch of new bespoke syntax > for function calls, which is a direction we really don't want to go in > because of the parser overhead and need for more fully-reserved keywords. > For instance: > WHERE JSON_EXISTS (T.J, 'strict $.where' FALSE ON ERROR) > Really? Who thought that was a better idea than a simple bool parameter? > > I have no objection to providing some functions that implement XPath-like > tests for JSON, but I'm not sure that you ought to try to tie it to > whatever the SQL committee is going to do, especially when they've not > published a finished standard yet. You may be chasing a moving target. 
> As for whether JSONPath is the right spec to follow, I'm not sure.
> The article you mention is from 2007 and I don't see all that many
> other references in a Google search. I found this Wikipedia page:
> https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
> which mentions half a dozen competitors, including "JSON Pointer"
> which has at least gotten as far as being an RFC standard:
> https://tools.ietf.org/html/rfc6901
> I'm not up enough on the JSON ecosystem to know which of these has the
> most traction, but I'm unconvinced that it's JSONPath.

We can use other databases' implementations as references. I have to agree that the people on the SQL committee are not very consistent - and sometimes create rather COBOL-ish syntax - but it is the standard, and it is implemented by major vendors. We don't need to implement the full API - not in the first step - the important point is not to close the door on possible ANSI conformance. In the first step we can take the best and most important parts of the standard. It can be similar to our SQL/XML implementation - we implemented maybe 75%, and only XPath instead of XQuery, but I don't feel that is a weakness. I see "JSON_TABLE" as a very useful function, and a good place to start.

Regards

Pavel

> regards, tom lane
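For readers unfamiliar with the notation under discussion: a JSON path expression walks a document by member access. A minimal Python sketch covering only the dotted `$.a.b` form (the real proposals also include filters, wildcards, array subscripts, and the lax/strict modes mentioned above):

```python
import json

def json_path_get(doc, path):
    """Resolve a minimal dotted JSON path like '$.store.name'.

    Only plain '$.a.b' member access is supported; this is a sketch,
    not the full goessner.net JsonPath grammar.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = doc
    for step in path[1:].lstrip(".").split("."):
        if step == "":
            continue                      # bare "$" selects the whole document
        if not isinstance(node, dict) or step not in node:
            return None                   # missing member: no result ("lax" style)
        node = node[step]
    return node

doc = json.loads('{"store": {"name": "north", "open": true}}')
print(json_path_get(doc, "$.store.name"))   # -> north
```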
Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?
Andres Freund writes: > On 2016-11-13 11:23:09 -0500, Tom Lane wrote: >> We can't use CREATE FUNCTION as the representation in the .bki file, >> because of the circularities involved (you can't fill pg_proc before >> pg_type nor vice versa). But I think Peter was suggesting that the >> input to the bki-generator script could look like CREATE commands. >> That's true, but I fear it would greatly increase the complexity >> of the script for not much benefit. > It'd also be very pg_proc specific, which isn't where I think this > should go.. The presumption is that we have a CREATE command for every type of object that we need to put into the system catalogs. But yes, the other problem with this approach is that you need to do a lot more work per-catalog to build the converter script. I'm not sure how much of that could be imported from gram.y, but I'm afraid the answer would be "not enough". regards, tom lane
Re: [HACKERS] Tackling JsonPath support
Christian Convey writes: > * Our ultimate goal is to give Postgres an implementation of the functions > "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the > SQL standards. > * The best representation of those standards is found here: [1]. > [1] > http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf You're going to need to find a draft standard somewhere, as that presentation is too thin on details to support writing an actual implementation. In particular, it's far from clear that this is true at all: > * When [1] mentions a "JSON path expression" or "JSON path language", it's > referring to the query language described here: [2]. > [2] http://goessner.net/articles/JsonPath The one slide they have on the path language mentions a lax/strict syntax that I don't see either in the document you mention or in the Wikipedia XPath article it links to. This does not give me a warm feeling. The SQL committee is *fully* capable of inventing their own random path notation, especially when there's no ISO-blessed precedent to bind them. In general, the stuff I see in these WG3 slides strikes me as pretty horribly designed. The committee is evidently still stuck on the idea that every feature they invent should have a bunch of new bespoke syntax for function calls, which is a direction we really don't want to go in because of the parser overhead and need for more fully-reserved keywords. For instance: WHERE JSON_EXISTS (T.J, 'strict $.where' FALSE ON ERROR) Really? Who thought that was a better idea than a simple bool parameter? I have no objection to providing some functions that implement XPath-like tests for JSON, but I'm not sure that you ought to try to tie it to whatever the SQL committee is going to do, especially when they've not published a finished standard yet. You may be chasing a moving target. As for whether JSONPath is the right spec to follow, I'm not sure.
The article you mention is from 2007 and I don't see all that many other references in a Google search. I found this Wikipedia page: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats which mentions half a dozen competitors, including "JSON Pointer" which has at least gotten as far as being an RFC standard: https://tools.ietf.org/html/rfc6901 I'm not up enough on the JSON ecosystem to know which of these has the most traction, but I'm unconvinced that it's JSONPath. regards, tom lane
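For comparison, JSON Pointer (RFC 6901) is far simpler than JSONPath: a pointer is a `/`-separated list of reference tokens, with `~1` escaping `/` and `~0` escaping `~`. A Python sketch of the whole resolution algorithm:

```python
def json_pointer_get(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer such as '/foo/0'."""
    if pointer == "":
        return doc                         # "" points at the whole document
    if not pointer.startswith("/"):
        raise ValueError("non-empty pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 escapes
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

doc = {"foo": ["bar", "baz"], "a/b": 1}
print(json_pointer_get(doc, "/foo/1"))     # -> baz
print(json_pointer_get(doc, "/a~1b"))      # -> 1
```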
Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?
On 2016-11-13 11:23:09 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2016-11-13 00:20:22 -0500, Peter Eisentraut wrote: > >> Then we're not very far away from just using CREATE FUNCTION SQL commands. > > > Well, those do a lot of syscache lookups, which in turn do lookups for > > functions... > > We can't use CREATE FUNCTION as the representation in the .bki file, > because of the circularities involved (you can't fill pg_proc before > pg_type nor vice versa). But I think Peter was suggesting that the > input to the bki-generator script could look like CREATE commands. > That's true, but I fear it would greatly increase the complexity > of the script for not much benefit. It'd also be very pg_proc specific, which isn't where I think this should go.. Greetings, Andres Freund
Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?
On 2016-11-13 11:11:37 -0500, Tom Lane wrote: > 1. Are we going to try to keep these things in the .h files, or split > them out? I'd like to get them out, as that eliminates both the need > to keep the things looking like macro calls, and the need for the data > within the macro call to be at least minimally parsable as C. I vote for splitting them out. > 2. Andrew's example above implies some sort of mapping between the > keywords and the actual column names (or at least column positions). > Where and how is that specified? I don't know what Andrew was planning, but before I stopped I had a 1:1 mapping between column names and keywords. Catalog.pm parses the pg_*.h headers and thus knows the table definition via the CATALOG() stuff. > 3. Also where are we going to provide the per-column default values? That's a good question, I suspect we should move that knowledge to the headers as well. Possibly using something like BKI_DEFAULT(...)? > How does the converter script know which columns to convert to type oids, > proc oids, etc? I simply had that based on the underlying reg* type. I.e. if a column was regtype the script would map it to type oids and so on. That required some type changes, which does have some compatibility concerns. > Is it going to do any data validation beyond that, and if so on what basis? Hm, not sure if we really need something. > 4. What will we do about the #define's that some of the .h files provide > for (some of) their object OIDs? I assume that we want to move in the > direction of autogenerating those macros a la fmgroids.h, but this needs > a concrete spec as well. I suspect at least type oids we'll continue to have to maintain manually. A good number of things rely on the builtin type oids being essentially stable. > > If we can generalize this to other catalogs, then that will be good, but > > my inclination is to handle the elephant in the room (pg_proc.h) and > > worry about the gnats later. > I think we want to do them all.
+1 Greetings, Andres Freund
Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?
Andres Freund writes: > On 2016-11-13 00:20:22 -0500, Peter Eisentraut wrote: >> Then we're not very far away from just using CREATE FUNCTION SQL commands. > Well, those do a lot of syscache lookups, which in turn do lookups for > functions... We can't use CREATE FUNCTION as the representation in the .bki file, because of the circularities involved (you can't fill pg_proc before pg_type nor vice versa). But I think Peter was suggesting that the input to the bki-generator script could look like CREATE commands. That's true, but I fear it would greatly increase the complexity of the script for not much benefit. It does little for the question of "how do you update the data when adding a new pg_proc column", for instance. And you'd still need some non-SQL warts, like how to specify manually-assigned OIDs for types and functions. (I'm not sure whether we could get away with dropping fixed assignments of function OIDs, but we absolutely can't do so for types. Lots of client code knows that text is oid 25, for example.) regards, tom lane
Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?
Andrew Dunstan writes: > I'm not convinced the line prefix part is necessary, though. What I'm > thinking of is something like this: > PROCDATA( oid=1242 name=boolin isstrict=t volatile=i parallel=s nargs=1 > rettype=bool argtypes="cstring" src=boolin ); We could go in that direction too, but the apparent flexibility to split entries into multiple lines is an illusion, at least if you try to go beyond a few lines; you'd end up with duplicated line sequences in different entries and thus ambiguity for patch(1). I don't have any big objection to the above, but it's not obviously better either. Some things we should try to resolve before settling definitively on a data representation: 1. Are we going to try to keep these things in the .h files, or split them out? I'd like to get them out, as that eliminates both the need to keep the things looking like macro calls, and the need for the data within the macro call to be at least minimally parsable as C. 2. Andrew's example above implies some sort of mapping between the keywords and the actual column names (or at least column positions). Where and how is that specified? 3. Also where are we going to provide the per-column default values? How does the converter script know which columns to convert to type oids, proc oids, etc? Is it going to do any data validation beyond that, and if so on what basis? 4. What will we do about the #define's that some of the .h files provide for (some of) their object OIDs? I assume that we want to move in the direction of autogenerating those macros a la fmgroids.h, but this needs a concrete spec as well. If we don't want this change to result in a big hit to the source code, we're probably going to need to be able to specify the macro names to generate in the data files. 5. One of the requirements that was mentioned in previous discussions was to make it easier to add new columns to catalogs.
This format does that only to the extent that you don't have to touch entries that can use the default value for such a column. Is that good enough, and if not, what might we be able to do to make it better? > I'd actually like to roll up the DESCR lines in pg_proc.h into this too, > they strike me as a bit of a wart. But I'm flexible on that. +1, if we can come up with a better syntax. This together with the OID-macro issue suggests that there will be items in each data entry that correspond to something other than columns of the target catalog. But that seems fine. > If we can generalize this to other catalogs, then that will be good, but > my inclination is to handle the elephant in the room (pg_proc.h) and > worry about the gnats later. I think we want to do them all. pg_proc.h is actually one of the easier catalogs to work on presently, IMO, because the only kind of cross-references it has are type OIDs. Things like pg_amop are a mess. And I really don't want to be dealing with multiple notations for catalog data. Also I think this will be subject to Polya's paradox: designing a general solution will be easier and cleaner than a hack that works only for one catalog. regards, tom lane
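To make the representation questions concrete, here is a hedged Python sketch of what parsing such keyword=value entries with per-column defaults might look like. The default values and the exact `PROCDATA(...)` shape are hypothetical - Andrew's example syntax plus Tom's point 3 about defaults - not the real patch:

```python
import re
import shlex

# Hypothetical per-column defaults, in the spirit of the thread's proposal;
# a real converter would derive these from the catalog headers (point 3).
DEFAULTS = {"isstrict": "f", "volatile": "v", "parallel": "u", "nargs": "0"}

def parse_procdata(line):
    """Parse a line like
         PROCDATA( oid=1242 name=boolin isstrict=t argtypes="cstring" );
       into a dict, filling unspecified columns from DEFAULTS."""
    m = re.match(r"\s*PROCDATA\(\s*(.*?)\s*\);?\s*$", line)
    if not m:
        raise ValueError("not a PROCDATA line")
    entry = dict(DEFAULTS)
    for field in shlex.split(m.group(1)):   # shlex honors the quoted values
        key, _, value = field.partition("=")
        entry[key] = value
    return entry

row = parse_procdata('PROCDATA( oid=1242 name=boolin isstrict=t volatile=i '
                     'parallel=s nargs=1 rettype=bool argtypes="cstring" src=boolin );')
print(row["name"], row["volatile"], row["argtypes"])  # -> boolin i cstring
```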
[HACKERS] Physical append-only tables
Scenario: Large table, *mostly* insert+select only, either a sequence or a timestamp column set to current time. This is an ideal use case for a BRIN index. Awesome. But then consider the same table. Except rows are typically updated once or twice when they are new, and *then* go read only. And we also have a process that at some point deletes *some* old rows (but not all - in fact, only a small portion). In this case, the next INSERT once VACUUM has run is likely to stick a "new" row somewhere very "far back" in the table, since there is now free space there. This more or less completely ruins the BRIN index usability, as the "old" blocks will now contain a single row from a "new" series. For a scenario like this, would it make sense to have an option that could be set on an individual table making it physically append-only? Basically VACUUM would run as normal and clean up the old space when rows are deleted back in history, but when new space is needed for a row the system would never look at the old blocks, and only append to the end. Yes, this wastes space in the database. But not as much as trying to track the deleted rows somewhere else and do an anti-join or something like that. And it would obviously not be on by default. Eventually the lost space might grow to a point where re-CLUSTERing the table might be worthwhile. But given the cost of that, I can see many scenarios where people are just willing to pay that extra overhead on large tables that are more or less never deleted from. I've run into a number of cases recently where this would've made the BRIN indexes on huge tables *much* more efficient. At least, it seems to me they would :) Or am I missing something that would make this not work? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
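To see why one misplaced row hurts so much, recall that a BRIN index stores only a (min, max) summary per block range, and a scan can skip a range only if its bounds exclude the searched value. A toy Python sketch of the effect described above (block contents and range size are made-up numbers, not real page layouts):

```python
def brin_summaries(values_per_block, range_size=2):
    """Summarize consecutive 'blocks' of values as (min, max) pairs,
    one summary per block range, the way a BRIN index does."""
    summaries = []
    for i in range(0, len(values_per_block), range_size):
        chunk = [v for block in values_per_block[i:i + range_size] for v in block]
        summaries.append((min(chunk), max(chunk)))
    return summaries

# Monotonically inserted timestamps: ranges are narrow and selective.
blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(brin_summaries(blocks))   # -> [(1, 4), (5, 8)]

# After VACUUM frees a slot in an old block and a "new" row (value 9)
# lands there, that old range now spans almost the whole value domain,
# so queries for recent values can no longer skip it:
blocks[0][1] = 9
print(brin_summaries(blocks))   # -> [(1, 9), (5, 8)]
```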
Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?
On 11/13/2016 04:54 AM, Andres Freund wrote: On 2016-11-12 12:30:45 -0500, Andrew Dunstan wrote: On 11/11/2016 11:10 AM, Tom Lane wrote: boolin: OID=1242 proname=boolin proargtypes="cstring" prorettype=bool boolin: prosrc=boolin provolatile=i proparallel=s I have written a little perl script to turn the pg_proc DATA lines into something like the format suggested. In order to keep the space used as small as possible, I used a prefix based on the OID. See attached result. Still plenty of work to go, e.g. grabbing the DESCR lines, and turning this all back into DATA/DESCR lines, but I wanted to get this out there before going much further. The defaults I used are below (commented out keys are not defaulted, they are just there for completeness). In the referenced thread I'd started to work on something like this, until other people also said they'd be working on it. I chose a different output format (plain Data::Dumper), but I'd added the parsing of DATA/DESCR and such to genbki. Note that I found that initdb performance is greatly increased *and* legibility is improved, if types and such in the data files are expanded, and converted to their oids when creating postgres.bki. Yeah, I have the type name piece, it was close to trivial. I just read in pg_type.h and stored the names/oids in a hash. Data::Dumper is too wasteful of space. The thing I like about Tom's format is that it's nice and concise. I'm not convinced the line prefix part is necessary, though. What I'm thinking of is something like this: PROCDATA( oid=1242 name=boolin isstrict=t volatile=i parallel=s nargs=1 rettype=bool argtypes="cstring" src=boolin ); Teaching Catalog.pm how to parse that and turn the type names back into oids won't be difficult. I already have code for the prefix version, and this would be easier since there is an end marker. I'd actually like to roll up the DESCR lines in pg_proc.h into this too, they strike me as a bit of a wart. But I'm flexible on that.
If we can generalize this to other catalogs, then that will be good, but my inclination is to handle the elephant in the room (pg_proc.h) and worry about the gnats later. I basically made genbki/catalog.pm accept text whenever a column is of type regtype/regprocedure/. To then make use of that I converted a bunch of plain oid columns to their reg* equivalent. That's also nice for just plain querying of the catalogs ;) I don't think the code is going to be much use for you directly, but it might be worthwhile to crib some stuff from the 0002 of the attached patches (based on 74811c4050921959d54d42e2c15bb79f0e2c37f3). Thanks, I will take a look. cheers andrew
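The reg*-based expansion Andres describes amounts to a name-to-OID substitution pass over the reg*-typed columns. A rough Python sketch (the three type OIDs shown are the real built-in ones; the function shape and column names are hypothetical illustrations, not the actual Perl code):

```python
# Tiny subset of pg_type OIDs; the real script reads them from pg_type.h.
TYPE_OIDS = {"bool": 16, "text": 25, "cstring": 2275}

def expand_type_names(row, reg_columns=("rettype", "argtypes")):
    """Replace human-readable type names in reg*-typed columns with their
    numeric OIDs, as genbki would when emitting postgres.bki."""
    out = dict(row)
    for col in reg_columns:
        if col in out:
            out[col] = " ".join(str(TYPE_OIDS[name])
                                for name in out[col].split())
    return out

print(expand_type_names({"name": "boolin", "rettype": "bool",
                         "argtypes": "cstring"}))
# -> {'name': 'boolin', 'rettype': '16', 'argtypes': '2275'}
```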
Re: [HACKERS] Tackling JsonPath support
Hi Pavel, Can I check a few assumptions about what you're suggesting for this task? * Our ultimate goal is to give Postgres an implementation of the functions "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the SQL standards. * The best representation of those standards is found here: [1]. * When [1] mentions a "JSON path expression" or "JSON path language", it's referring to the query language described here: [2]. * Even if other popular DBMS's deviate from [1], or other popular JSONPath implementations deviate from [2], we remain committed to a faithful implementation of [1]. * It's okay for my first commit to implement just two things: (a) a PG-internal implementation of JsonPath, and (b) a user-visible implementation of "JSON_QUERY" based on (a). Later commits could add implementations of "JSON_VALUE", "JSON_EXISTS", etc. in terms of (a). Thanks, Christian [1] http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf [2] http://goessner.net/articles/JsonPath On Fri, Sep 16, 2016 at 2:28 AM, Pavel Stehule wrote: > Hi > > 2016-09-15 18:05 GMT+02:00 Christian Convey : > >> On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule >> wrote: >> ... >> >> > I wrote XMLTABLE function, and I am thinking about JSON_TABLE function. >> But >> > there is one blocker - missing JsonPath support in our JSON >> implementation. >> > >> > So one idea - implement JsonPath support and related JSON query >> functions. >> > This can help with better standard conformance. >> >> Hi Pavel, >> >> Are you still looking for someone to add the JsonPath support to the >> JSON implementation? And if so, how urgently are people waiting for >> it? >> > > yes - JsonPath support should be great. I hope so this or next commitfest > the XMLTABLE patch will be committed, and with JsonPath I can start to work > on JSON_TABLE function. > > But the JsonPath can be merged separately without dependency to > JSON_TABLE.
There are more JSON searching functions, and these functions > should support JsonPath to be ANSI SQL compliant. > > >> >> I'd be happy to start working on a patch, but since I'm new to PG >> development, I'm probably not the fastest person to get it done. >> > > It is not a problem. You should be able to do this work without deep > knowledge of PostgreSQL internals. The work with data types (and > functions for data types) is well isolated from the PostgreSQL engine. > > You can learn from the current JSON searching code - > postgresql/src/backend/utils/adt/json.c > > And it is a good way to start as a PostgreSQL hacker - I started with the > implementation of my own data type and related functions. > > Regards > > Pavel > > >> Kind regards, >> Christian > >
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> The reason for that is that you forgot to edit an alternative > expect file, which matches for the en_US locale. Now I understand. Thank you for the explanation. > But the test doesn't test for collation, and the only difference > between the alternative expect files comes from the difference of > collation for the query. "the workaround" seems to be the right > way to do it. I recommend rather to leave the workaround as it is > and remove select_views_1.out from the "expected" directory. I will do.
Re: [HACKERS] Contains and is contained by operators of inet datatypes
> - I am not convinced that your changes to the descriptions of the operators > necessarily make things clearer. For example "is contained by and smaller > network (subnet)" only mentions subnets and not IP-addresses. I was trying to avoid confusion. <@ is the "contained by" operator which is also returning true when both sides are equal. We shouldn't continue calling <<@ also "contained by". I removed the "(subnet)" and "(supernet)" additions. Can you think of any better wording? > - Maybe change "deprecated and will eventually be removed." to "deprecated > and may be removed in a future release.". I prefer the latter wording but I am fine with either. I copied that note from the Geometric Functions and Operators page. > - Won't renaming the functions which implement risk breaking people's > applications? While the new names are a bit nicer I am not sure it is worth > doing. You are right. I reverted that part. > - The changes to the code look generally good. Thank you for the review. New version is attached. 0001-inet-contain-op-v2.patch
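For readers following the naming debate: the whole point is the difference between non-strict containment (the `<@` described above, which is also true when both sides are equal) and strict containment (which excludes equality). Python's `ipaddress` module can illustrate the two semantics; this is only an analogy for the inet operators, not their implementation:

```python
import ipaddress

a = ipaddress.ip_network("192.168.1.0/24")
b = ipaddress.ip_network("192.168.0.0/16")

# Non-strict containment (the <@ semantics): also true for equal networks.
print(a.subnet_of(b))            # True
print(a.subnet_of(a))            # True - a network "contains" itself

# Strict containment: excludes the equal case.
def strictly_contained(x, y):
    return x.subnet_of(y) and x != y

print(strictly_contained(a, b))  # True
print(strictly_contained(a, a))  # False
```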
Re: [HACKERS] Logical Replication WIP
On 2016-11-13 00:40:12 -0500, Peter Eisentraut wrote: > On 11/12/16 2:18 PM, Andres Freund wrote: > >>> I also wonder if we want an easier to > >>> > > extend form of pubinsert/update/delete (say to add pubddl, > >>> > > pubtruncate, > >>> > > pub ... without changing the schema). > >>> > > > >> > > >> > So like, text array that's then parsed everywhere (I am not doing > >> > bitmask/int definitely)? > > Yes, that sounds good to me. Then convert it to individual booleans or a > > bitmask when loading the publications into the in-memory form (which you > > already do). > > I'm not sure why that would be better. Adding catalog columns in future > versions is not a problem. It can be extended from what core provides, for extended versions of replication solutions, for one. I presume publications/subscriptions aren't only going to be used by built-in code. Andres
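A sketch of the conversion being discussed: a text array of publish actions stored in the catalog becomes per-action booleans (or a bitmask) when the publication is loaded into memory. The action and flag names here are hypothetical illustrations taken from the quoted examples, not the actual catalog columns:

```python
# Hypothetical set of publish actions, including the extension candidates
# mentioned above (pubddl, pubtruncate, ...).
KNOWN_ACTIONS = ("insert", "update", "delete", "truncate", "ddl")

def load_publication(actions):
    """Convert a catalog text array like ['insert', 'update'] into the
    in-memory boolean form, rejecting unknown action names."""
    flags = {"pub" + a: False for a in KNOWN_ACTIONS}
    for a in actions:
        key = "pub" + a
        if key not in flags:
            raise ValueError("unknown publish action: %s" % a)
        flags[key] = True
    return flags

print(load_publication(["insert", "update", "delete"]))
```

The upside Andres points at: an external replication solution can extend `KNOWN_ACTIONS` without a schema change, whereas one boolean column per action would require one.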
Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?
On 2016-11-13 00:20:22 -0500, Peter Eisentraut wrote: > On 11/11/16 11:10 AM, Tom Lane wrote: > > boolin: OID=1242 proname=boolin proargtypes="cstring" prorettype=bool > > boolin: prosrc=boolin provolatile=i proparallel=s > > Then we're not very far away from just using CREATE FUNCTION SQL commands. Well, those do a lot of syscache lookups, which in turn do lookups for functions... Andres
Re: [HACKERS] pg_dump, pg_dumpall and data durability
Hi, On 2016-11-08 18:18:01 -0500, Tom Lane wrote: > I think this might be better addressed by adding something to backup.sgml > pointing out that you'd better fsync or sync your backups before assuming > that they can't be lost. How does a normal user do that? I don't think there's any cross-platform advice we can give, and even on *nix the answer basically is 'sync; sync;' which is a pretty big hammer, and might be completely unacceptable on a busy server. Regards, Andres
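For a single backup file there is a narrower hammer than system-wide sync(2): fsync the file itself, then fsync its containing directory so the new directory entry also reaches stable storage. A Python sketch of that idiom (POSIX semantics assumed; this is the general pattern, not what pg_dump currently does):

```python
import os
import tempfile

def durable_write(path, data):
    """Write a file durably: fsync the file, then fsync its directory so
    the directory entry itself survives a crash (POSIX semantics)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())              # force file contents to stable storage
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)                   # force the directory entry out too
    finally:
        os.close(dirfd)

backup = os.path.join(tempfile.mkdtemp(), "backup.dump")
durable_write(backup, b"-- dump contents --")
print(open(backup, "rb").read())          # -> b'-- dump contents --'
```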