Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, May 29, 2015 at 7:56 AM, Robert Haas robertmh...@gmail.com wrote:

On Thu, May 28, 2015 at 8:51 AM, Robert Haas robertmh...@gmail.com wrote: [ speculation ] [...] However, since the vacuum did advance relfrozenxid, it will call vac_truncate_clog, which will call SetMultiXactIdLimit, which will propagate the bogus datminmxid = 1 setting into shared memory. Ah! [...]

- There's a third possible problem related to boundary cases in SlruScanDirCbRemoveMembers, but I don't understand that one well enough to explain it. Maybe Thomas can jump in here and explain the concern.

I noticed something in passing which is probably not harmful, and not relevant to this bug report, it was just a bit confusing while testing: SlruScanDirCbRemoveMembers never deletes any files if rangeStart == rangeEnd. In practice, if you have an idle cluster with a lot of multixact data and you VACUUM FREEZE all databases and then CHECKPOINT, you might be surprised to see no member files going away quite yet, but they'll eventually be truncated by a future checkpoint, once rangeEnd has had a chance to advance to the next page due to more multixacts being created.

If we want to fix this one day, maybe the right thing to do is to treat the rangeStart == rangeEnd case the same way we treat rangeStart < rangeEnd, that is, to assume that the range of pages isn't wrapped/inverted in this case. Although we don't have the actual start and end offset values to compare here, we know that for them to fall on the same page, the start offset index must be <= the end offset index (since we added the new error to prevent member space wrapping, we never allow the end to get close enough to the start to fall on the same page).
Like this (not tested):

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..4d0bcc4 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2755,7 +2755,7 @@ SlruScanDirCbRemoveMembers(SlruCtl ctl, char *filename, int segpage,
 	/* Recheck the deletion condition. If it still holds, perform deletion */
 	if ((range->rangeStart > range->rangeEnd &&
 		 segpage > range->rangeEnd && segpage < range->rangeStart) ||
-		(range->rangeStart < range->rangeEnd &&
+		(range->rangeStart <= range->rangeEnd &&
 		 (segpage < range->rangeStart || segpage > range->rangeEnd)))
 		SlruDeleteSegment(ctl, filename);

-- 
Thomas Munro
http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
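To illustrate the effect of the proposed change, here is a small standalone sketch of the deletion predicate (the helper name `member_segment_is_removable` is hypothetical; this is not PostgreSQL's actual code). With the old strict `<` test, a range where rangeStart == rangeEnd never matched either arm, so nothing was deleted; with `<=`, a single-page live range is treated as non-wrapped and pages outside it become removable:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a member segment page can be removed, given the live
 * range [rangeStart, rangeEnd] of pages.  The range may be "wrapped"
 * (rangeStart > rangeEnd) because the page space is circular.
 * Hypothetical standalone version of the condition in
 * SlruScanDirCbRemoveMembers, with the proposed <= fix applied.
 */
static bool
member_segment_is_removable(int segpage, int rangeStart, int rangeEnd)
{
	return (rangeStart > rangeEnd &&	/* wrapped range */
			segpage > rangeEnd &&
			segpage < rangeStart) ||
		(rangeStart <= rangeEnd &&		/* normal (or single-page) range */
		 (segpage < rangeStart || segpage > rangeEnd));
}
```

With rangeStart == rangeEnd == 5, pages 2 and 7 are now removable while page 5 (the live page) is kept; under the old `<` condition all three would have been kept.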
Re: [HACKERS] useless assignment pointer argument
On 2015-05-28 20:14:33 +0000, Gaetano Mendola wrote: src/backend/commands/explain.c:1692 src/backend/commands/explain.c:1874 src/backend/commands/explain.c:1986 there is the following assignment: ancestors = list_delete_first(ancestors); but it has no effect at all being that a function parameter and not used anymore after the assignment itself.

So? It costs little to nothing, and it'll make it much less likely that a stale version of 'ancestors' is used when the code is expanded.

Greetings, Andres Freund
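A minimal illustration of the defensive value Andres describes, using a toy singly linked list (not PostgreSQL's List API; all names here are hypothetical): if code is later appended after the deletion, the reassigned parameter points at a valid head, whereas without the reassignment any new code would dereference the stale, deleted cell.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list cell -- a stand-in for PostgreSQL's List, for illustration only. */
typedef struct Cell
{
	int			value;
	struct Cell *next;
} Cell;

/* Remove the first cell and return the new head (may be NULL). */
static Cell *
list_delete_first_demo(Cell *head)
{
	return head ? head->next : NULL;
}

/*
 * Reassigning the parameter, as in "ancestors = list_delete_first(ancestors);",
 * means any code appended below the deletion sees the correct list head
 * rather than the stale one.
 */
static int
first_value_after_delete(Cell *ancestors)
{
	ancestors = list_delete_first_demo(ancestors);
	/* ... imagine newly added code here that uses 'ancestors' ... */
	return ancestors ? ancestors->value : -1;
}
```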
Re: [HACKERS] RFC: Remove contrib entirely
On 05/28/2015 01:11 PM, Andrew Dunstan wrote: This seems to come up regularly. Maybe we should put it in an FAQ somewhere. The barriers to making non-core types into core types are very very high, possibly insurmountable. This is pretty much not an option.

O.k., then either:

* We install it by default and document that it is available (and how to enable it)
* Push it out of core and let it be happy wherever Theory wants it to be

JD

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [HACKERS] psql tabcomplete - minor bugfix - tabcomplete for SET ROLE TO xxx
On 5/22/15 6:35 AM, Pavel Stehule wrote: we support SET ROLE name and SET ROLE TO name. Second form isn't supported by tabcomplete. Attached trivial patch fixes it.

We don't tab-complete everything we possibly could. The documentation only lists the first form, so I don't think we necessarily need to complete the second form.
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
Abhijit Menon-Sen a...@2ndquadrant.com writes: Here's an updated patch for the fsync problem(s).

I've committed this after some mostly-cosmetic rearrangements.

4. As a partial aside, why does fsync_fname use OpenTransientFile? It looks like it should use BasicOpenFile as pre_sync_fname does. We close the fd immediately after calling fsync anyway. Or is there some reason I'm missing that we should use OpenTransientFile in pre_sync_fname too?

pre_sync_fname is the one that is wrong IMO. Using BasicOpenFile just creates an opportunity for problems; that function should get called from as few places as possible.

5. I made walktblspc_entries use stat() instead of lstat(), so that we can handle symlinks and directories the same way. Note that we won't continue to follow links inside the sub-directories because we use walkdir instead of recursing.

Given that, walktblspc_entries was 99% duplicate, so I folded it into walkdir with a process_symlinks boolean.

I have to leave shortly, so I'll look at the initdb cleanup tomorrow.

regards, tom lane
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Robert Haas wrote: On Thu, May 28, 2015 at 8:51 AM, Robert Haas robertmh...@gmail.com wrote: [ speculation ] OK, I finally managed to reproduce this, after some off-list help from Steve Kehlet (the reporter), Alvaro, and Thomas Munro. Here's how to do it: It's a long list of steps, but if you consider them carefully, it becomes clear that they are natural steps that a normal production system would go through -- essentially the only one that's really time-critical is the decision to pg_upgrade with a version before 9.3.5. In the process of investigating this, we found a few other things that seem like they may also be bugs: - As noted upthread, replaying an older checkpoint after a newer checkpoint has already happened may lead to similar problems. This may be possible when starting from an online base backup; or when restarting a standby that did not perform a restartpoint when replaying the last checkpoint before the shutdown.

I'm going through this one now, as it's closely related and caused issues for us.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] useless assignment pointer argument
If the compiler is good, the assignment is indeed elided; that's not what I meant to point out.

On Thu, 28 May 2015 at 22:17 Andres Freund and...@anarazel.de wrote: On 2015-05-28 20:14:33 +0000, Gaetano Mendola wrote: src/backend/commands/explain.c:1692 src/backend/commands/explain.c:1874 src/backend/commands/explain.c:1986 there is the following assignment: ancestors = list_delete_first(ancestors); but it has no effect at all being that a function parameter and not used anymore after the assignment itself. So? It costs little to nothing, and it'll make it much less likely that a stale version of 'ancestors' is used when the code is expanded. Greetings, Andres Freund
[HACKERS] useless assignment pointer argument
Hi, in the following spots: src/backend/commands/explain.c:1692 src/backend/commands/explain.c:1874 src/backend/commands/explain.c:1986 there is the following assignment: ancestors = list_delete_first(ancestors); but it has no effect at all, since 'ancestors' is a function parameter that is not used again after the assignment itself.
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On 05/28/2015 12:56 PM, Robert Haas wrote: FTR: Robert, you have been a Samurai on this issue. Our many thanks. Sincerely, jD
Re: [HACKERS] Free indexed_tlist memory explicitly within set_plan_refs()
On Sun, May 24, 2015 at 6:17 PM, Peter Geoghegan p...@heroku.com wrote: Attached patch adds such a pfree() call. Attached, revised version incorporates this small fix, while adding an additional big fix, and a number of small style tweaks. This is mainly concerned with fixing the bug I was trying to fix when I spotted the minor pfree() issue:

postgres=# insert into upsert (key, val) values('Foo', 'Bar') on conflict (key) do update set val = excluded.val where excluded.* is not null;
ERROR:  XX000: variable not found in subplan target lists
LOCATION:  fix_join_expr_mutator, setrefs.c:2003

postgres=# insert into upsert (key, val) values('Foo', 'Bar') on conflict (key) do update set val = excluded.val where excluded.ctid is not null;
ERROR:  XX000: variable not found in subplan target lists
LOCATION:  fix_join_expr_mutator, setrefs.c:2003

The first query shown should clearly finish processing by the optimizer without raising this error message (execution should work correctly too, of course). The second query shown should fail with a user visible error message about referencing the excluded pseudo-relation's ctid column not making sense. The basic problem is that there wasn't much thought put into how the excluded pseudo-relation's reltargetlist is generated -- it currently comes from a call to expandRelAttrs() during parse analysis, which, on its own, doesn't allow whole row Vars to work. One approach to fixing this is to follow the example of RETURNING lists with references to more than one relation: preprocess_targetlist() handles this by calling pull_var_clause() and making new TargetEntry entries for Vars found to not be referencing the target (and not already in the targetlist for some other reason). Another approach, preferred by Andres, is to have query_planner() do more.
I understand that the idea there is to make excluded.* closer to a regular table, in that it can be expected to have a baserel, and within the executor we have something closer to a bona-fide scan reltargetlist, that we can expect to have all Vars appear in. This should be enough to make the reltargetlist have the appropriate Vars more or less in the regular course of planning, including excluded.* whole row Vars. To make this work we could call add_vars_to_targetlist(), and call add_base_rels_to_query() and then build_base_rel_tlists() within query_planner() (while moving a few other things around). However, the ordering dependencies within query_planner() seemed quite complicated to me, and I didn't want to modify such an important routine just to fix this bug. RETURNING seemed like a perfectly good precedent to follow, so that's what I did. Now, it might have been that I misunderstood Andres when we discussed this problem on Jabber/IM, but ISTM that the second idea doesn't have much advantage over the first (I'm sure that Andres will be able to explain what he'd like to do better here -- it was a quick conversation). I did prototype the second approach, and the code footprint of what I have here (the first approach) seems lower than it would have to be with the second. Besides, I didn't see a convenient choke point to reject system column references with the second approach. Attached patch fixes the bug using the first approach. Tests were added demonstrating that the cases above are fixed. A second attached patch fixes another, largely independent bug. I noticed array assignment with ON CONFLICT DO UPDATE was broken. See commit message for full details. Thoughts? 
-- 
Peter Geoghegan

From 72076f565b142debe39eb1e5a6cac27100b76fdb Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Thu, 28 May 2015 00:18:19 -0700
Subject: [PATCH 2/2] Fix bug with assigned-to expressions with indirection

Handling of assigned-to expressions with indirection (as used with UPDATE targetlists, where, for example, assigned-to expressions allow array elements to be directly assigned to) could previously become confused. The problem was that ParseState was consulted to determine if an INSERT-appropriate or UPDATE-appropriate behavior should be used, and so for INSERT statements with ON CONFLICT DO UPDATE, an INSERT-targetlist-applicable codepath could incorrectly be taken. To fix, allow ParseState to reflect that an individual statement can be both p_is_insert and p_is_update at the same time.
---
 src/backend/parser/analyze.c         |  3 +++
 src/backend/parser/parse_target.c    |  3 ++-
 src/include/parser/parse_node.h      |  2 +-
 src/test/regress/expected/arrays.out | 21 +
 src/test/regress/sql/arrays.sql      | 13 +
 5 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index fc463fa..be474dc 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -891,6 +891,9 @@ transformOnConflictClause(ParseState *pstate,
 	/* Process DO UPDATE */
 	if (onConflictClause->action ==
Re: [HACKERS] PGCon hacker lounge
On 5/28/15 9:43 AM, Dan Langille wrote: It seems it goes unused, and I was trying to see if anyone found it useful in the past. At BSDCan, for example, you can find people there every night discussing and working. Or perhaps just socializing. It's a major gathering point.

ISTM that essentially all evening socializing/working at PGCon happens at various bars. Perhaps the BSD crowd is different in this regard. Now, if the lounge had a bar in it... ;P

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] RFC: Remove contrib entirely
On 05/28/2015 12:35 PM, Stephen Frost wrote: JD, What we would need for this is an 'extensions' directory, or similar, and a clear definition of what the requirements for getting into it are. With that, we could decide for each module currently in contrib if it should go into the 'extensions' directory. I'm not sure that we would necessarily have to remove the contrib module or any modules which are deemed to not be appropriate for the 'extensions' directory.

I am suggesting that things like citext be made a type proper. No install required. For things like pg_stat_statements of course there could be configuration required (need to add it to the preload) but it is automatically installed with the packages. I am sure there will be plenty of fun to be had with what should or shouldn't be merged into core. I think if we argue about the guidelines of how to analyze what should be in core versus the merits of any particular module, life will be easier. Here are some for a start:

A. Must have been in contrib for at least two releases
B. Must have visible community (and thus use case)

I don't particularly like having a time-based requirement drive what's in which area, especially one that's double the length of our major release cycle.

I think there is only one or two contrib modules that don't actually fit requirement A.

2. Push the rest out into a .Org project called contrib. Let those who are interested in the technology work on them or use them. This project since it is outside of core proper can work just like other extension projects. Alternately, allow the maintainers push them wherever they like (Landscape, Github, Savannah, git.postgresql.org ...).

That's an interesting idea, but it's unclear how this would be any different from PGXN..?

PGXN is CPAN, not Perl or DBD::Pg. It is actually a complement to PGXN, because PGXN is still needed, and needed more, because that is where you are going to get the latest and greatest of the modules.
Sincerely, JD
Re: [HACKERS] RFC: Remove contrib entirely
On 05/28/2015 04:05 PM, Joshua D. Drake wrote: On 05/28/2015 12:35 PM, Stephen Frost wrote: JD, What we would need for this is an 'extensions' directory, or similar, and a clear definition of what the requirements for getting into it are. With that, we could decide for each module currently in contrib if it should go into the 'extensions' directory. I'm not sure that we would necessarily have to remove the contrib module or any modules which are deemed to not be appropriate for the 'extensions' directory. I am suggesting that things like citext be made a type proper. No install required.

This seems to come up regularly. Maybe we should put it in an FAQ somewhere. The barriers to making non-core types into core types are very very high, possibly insurmountable. This is pretty much not an option.

cheers

andrew
Re: [HACKERS] Possible pointer dereference
While at it: the assert(cnfa != NULL && cnfa->nstates != 0); at src/backend/regex/rege_dfa.c:282 is issued too late; indeed, at lines 278 and 279 cnfa was already dereferenced. Same for assert(t != NULL) in src/backend/regex/regexec.c:821, which is issued way too late.

On Thu, 28 May 2015 at 15:59 Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, May 27, 2015 at 8:57 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: By correcting the following way will solve the problem: return ts ? (*ts != 0) : false; instead of return *ts != 0; Attached a patch for it.

If the only caller always passes a valid pointer, there's no point in adding this check. We have many functions in our source base that assume that the caller will pass a valid pointer, and changing them all would make the code bigger, harder to read, and possibly slower, without any real benefit.

Well, we should either install something like Haribabu's patch, or else remove the existing tests in the function that allow ts to be NULL. And the function's API contract comment needs to be clarified in either case; the real bug here is lack of a specification. I don't particularly have an opinion on whether it's valuable to allow this function to be called without receiving a timestamp back. Perhaps the authors of the patch can comment on that.

regards, tom lane
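The general pattern being requested is to place the assertion before the first dereference, not after it. A hedged standalone sketch (the struct and function names here are hypothetical stand-ins, not the actual regex code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cnfa-style struct in the regex code. */
typedef struct Automaton
{
	int			nstates;
} Automaton;

static int
state_count(const Automaton *a)
{
	/*
	 * Validate the pointer *before* touching a->nstates.  Putting this
	 * assert after a dereference would make it useless: with assertions
	 * enabled, the crash happens first; with them disabled, nothing is
	 * checked at all.
	 */
	assert(a != NULL && a->nstates != 0);
	return a->nstates;
}
```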
Re: [HACKERS] About that re-release ...
On 05/28/2015 02:37 AM, Marco Atzeri wrote: On 5/28/2015 5:00 AM, Tom Lane wrote: Assuming that we can get a fix for the fsync-failure-during-restart problem committed by the end of the week, there will be a new set of back-branch minor releases next week. Usual schedule, wrap Monday for public announcement Thursday. regards, tom lane

Tom, thanks for the advice. I will postpone the deployment of new packages for cygwin until 9.4.3 is available.

You're doing the cygwin packages? You should probably be on the packagers list, then, no?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Improving GEQO
We followed your advice; our goal was to improve the quality of the solution, and we achieved that. However, the total query execution time is higher. Regards.

On 05/27/2015 04:36 PM, Tom Lane wrote: boix b...@uclv.cu writes: Hello, my partner and me are working with the goal of improve the GEQO's performance, we tried with Ant Colony Optimization, but it does not improve, actually we are trying with a new variant of Genetic Algorithm, specifically Micro-GA. This algorithm finds a better solution than GEQO in less time, however the total query execution time is higher. The fitness is calculated by geqo_eval function. Does anybody know why this happens?

Well, for one thing, you can't just do this: + aux = aux1; without totally confusing all your subsequent steps. You'd want to copy the pointed-to data, likely, not the pointer. regards, tom lane

diff --git a/contrib/Makefile b/contrib/Makefile
index d230451..df9ccef 100755
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -26,6 +26,7 @@ SUBDIRS = \
 		isn \
 		lo \
 		ltree \
+		microg \
 		oid2name \
 		pageinspect \
 		passwordcheck \
diff --git a/contrib/microg/Makefile b/contrib/microg/Makefile
new file mode 100644
index 000..7597ede
--- /dev/null
+++ b/contrib/microg/Makefile
@@ -0,0 +1,25 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for the genetic query optimizer module
+#
+# Copyright (c) 2015, Universidad Central Marta Abreu de Las Villas
+#
+# contrib/microg/Makefile
+#
+#-------------------------------------------------------------------------
+
+MODULE_big = microg
+OBJS = microg_main.o
+
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/microg
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/microg/microg_main.c b/contrib/microg/microg_main.c
new file mode 100644
index 000..f71a0be
--- /dev/null
+++ b/contrib/microg/microg_main.c
@@ -0,0 +1,445 @@
+/*
+ *
+ * microg_main.c
+ *	  solution to the query optimization problem
+ *	  by means of a Micro Genetic Algorithm (micro-GA)
+ *
+ * Portions Copyright (c) 2014-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2015, Universidad Central de Las Villas
+ *
+ * contrib/microg_main.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* contributed by:
+ =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
+ * Martin Utesch * Institute of Automatic Control *
+ = = University of Mining and Technology =
+ * ute...@aut.tu-freiberg.de * Freiberg, Germany *
+ =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
+ * Laura Perez Triana
+ * Centro de Estudios Informaticos
+ *== Universidad Central de Las Villas =* ltri...@uclv.cu *Villa Clara, Cuba
+ =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
+ * Alegandro G. Gomez Boix
+ * Centro de Estudios Informaticos
+ *== Universidad Central de Las Villas =* b...@uclv.cu *Villa Clara, Cuba
+ =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
+ */
+
+/* -- parts of this are adapted from D.
Whitley's Genitor algorithm -- */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "optimizer/geqo_misc.h"
+#include "optimizer/geqo_mutation.h"
+#include "optimizer/geqo_pool.h"
+#include "optimizer/geqo_random.h"
+#include "optimizer/geqo_selection.h"
+#include "optimizer/geqo_gene.h"
+#include "optimizer/paths.h"
+#include <sys/timeb.h>
+#include "fmgr.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * Configuration options
+ */
+int Geqo_effort;
+int Geqo_pool_size;
+int Geqo_generations;
+double Geqo_selection_bias;
+double Geqo_seed;
+
+join_search_hook_type join_search_hook = NULL;
+
+void _PG_init(void);
+void _PG_fini(void);
+
+/* new functions of microg */
+void random_init_poolMG(PlannerInfo *root, Pool *pool);
+int meanCommonEdgeInPool(Chromosome *pool, int pool_size, int number_of_rels);
+int existEdge(Gene a, Gene b, Gene* s2, int lenght);
+int commonEdge(Gene* s1, Gene* s2, int number_of_rels);
+Chromosome * localSerch2_opt(PlannerInfo *root, Gene* solution, int sizeProblem);
+RelOptInfo *microg(PlannerInfo *root, int number_of_rels, List *initial_rels);
+
+/* define edge recombination crossover [ERX] per default */
+#if !defined(ERX) && \
+	!defined(PMX) && \
+	!defined(CX) && \
+	!defined(PX) && \
+	!defined(OX1) && \
+	!defined(OX2)
+#define ERX
+#endif
+
+/*
+ * microg
+ *	  solution of the query optimization problem
+ *	  similar to a constrained Traveling Salesman Problem (TSP)
+ */
+
+RelOptInfo *
+microg(PlannerInfo *root, int number_of_rels, List *initial_rels) {
+	GeqoPrivateData private;
+	int generation;
+	Chromosome *momma;
+	Chromosome *daddy;
+	Chromosome *kid, *result;
+	Pool *pool;
+	int pool_size, number_generations;
+
+	struct
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
At 2015-05-28 19:56:15 +0530, a...@2ndquadrant.com wrote: I have a separate patch to initdb with the corresponding changes, which I will post after dinner and a bit more testing. Here's that patch too. I was a bit unsure how far to go with this. It fixes the problem of not following pg_xlog if it's a symlink (Andres) and the one where it died on bogus stuff in pg_tblspc (Tom) and unreadable files anywhere else (it now spews errors to stderr, but carries on for as long as it can). I've done a bit more than strictly necessary to fix those problems, though, and made the code largely similar to what's in the other patch. If you want something minimal, let me know and I'll cut it down to size. -- Abhijit P.S. If this patch gets applied, then the Adapted from walktblspc_links in initdb.c comment in fd.c should be changed: s/links/entries/. From 8e69433fae0b4f51879793f6594c76b99d764818 Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen a...@2ndquadrant.com Date: Thu, 28 May 2015 22:02:29 +0530 Subject: Make initdb -S behave like xlog.c:fsync_pgdata() In particular, it should silently skip unreadable/unexpected files in the data directory and follow symlinks only for pg_xlog and under pg_tblspc. 
---
 src/bin/initdb/initdb.c | 235 +---
 1 file changed, 122 insertions(+), 113 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f1c4920..8ebaa55 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -218,9 +218,9 @@ static char **filter_lines_with_token(char **lines, const char *token);
 static char **readfile(const char *path);
 static void writefile(char *path, char **lines);
 static void walkdir(char *path, void (*action) (char *fname, bool isdir));
-static void walktblspc_links(char *path, void (*action) (char *fname, bool isdir));
+static void walktblspc_entries(char *path, void (*action) (char *fname, bool isdir));
 static void pre_sync_fname(char *fname, bool isdir);
-static void fsync_fname(char *fname, bool isdir);
+static void fsync_fname_ext(char *fname, bool isdir);
 static FILE *popen_check(const char *command, const char *mode);
 static void exit_nicely(void);
 static char *get_id(void);
@@ -248,7 +248,7 @@ static void load_plpgsql(void);
 static void vacuum_db(void);
 static void make_template0(void);
 static void make_postgres(void);
-static void perform_fsync(void);
+static void fsync_pgdata(void);
 static void trapsig(int signum);
 static void check_ok(void);
 static char *escape_quotes(const char *src);
@@ -518,38 +518,38 @@ writefile(char *path, char **lines)
  * walkdir: recursively walk a directory, applying the action to each
  * regular file and directory (including the named directory itself).
  *
- * Adapted from copydir() in copydir.c.
+ * Adapted from walkdir() in backend/storage/file/fd.c.
 */
 static void
 walkdir(char *path, void (*action) (char *fname, bool isdir))
 {
 	DIR		   *dir;
-	struct dirent *direntry;
-	char		subpath[MAXPGPATH];
+	struct dirent *de;
 
 	dir = opendir(path);
 	if (dir == NULL)
 	{
 		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
 				progname, path, strerror(errno));
-		exit_nicely();
+		return;
 	}
 
-	while (errno = 0, (direntry = readdir(dir)) != NULL)
+	while (errno = 0, (de = readdir(dir)) != NULL)
 	{
+		char		subpath[MAXPGPATH];
 		struct stat fst;
 
-		if (strcmp(direntry->d_name, ".") == 0 ||
-			strcmp(direntry->d_name, "..") == 0)
+		if (strcmp(de->d_name, ".") == 0 ||
+			strcmp(de->d_name, "..") == 0)
 			continue;
 
-		snprintf(subpath, MAXPGPATH, "%s/%s", path, direntry->d_name);
+		snprintf(subpath, MAXPGPATH, "%s/%s", path, de->d_name);
 
 		if (lstat(subpath, &fst) < 0)
 		{
 			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
 					progname, subpath, strerror(errno));
-			exit_nicely();
+			continue;
 		}
 
 		if (S_ISDIR(fst.st_mode))
@@ -559,75 +559,72 @@ walkdir(char *path, void (*action) (char *fname, bool isdir))
 	}
 
 	if (errno)
-	{
 		fprintf(stderr, _("%s: could not read directory \"%s\": %s\n"),
 				progname, path, strerror(errno));
-		exit_nicely();
-	}
 
-	if (closedir(dir))
-	{
-		fprintf(stderr, _("%s: could not close directory \"%s\": %s\n"),
-				progname, path, strerror(errno));
-		exit_nicely();
-	}
+	(void) closedir(dir);
 
-	/*
-	 * It's important to fsync the destination directory itself as individual
-	 * file fsyncs don't guarantee that the directory entry for the file is
-	 * synced.  Recent versions of ext4 have made the window much wider but
-	 * it's been an issue for ext3 and other filesystems in the past.
-	 */
 	(*action) (path, true);
 }
 
 /*
- * walktblspc_links: call walkdir on each entry under the given
- * pg_tblspc directory, or do nothing if pg_tblspc doesn't exist.
+ * walktblspc_entries -- apply the action to the entries in pg_tblspc
+ *
+ * We expect to find only symlinks to tablespace directories here, but
+ * we'll apply the action to regular files, and call walkdir() on any
+ * directories as well.
+ *
+ *
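The comment removed in the hunk above refers to a real durability rule: after creating, renaming, or deleting a file, the parent directory itself must be fsync'd, or the directory entry may not survive a crash. A minimal sketch of that idiom (hypothetical helper name, not initdb's actual code, which also handles platform quirks like EACCES/EISDIR on directory fds):

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * fsync a directory so that entries created or removed in it are durable.
 * Returns 0 on success, -1 on error (with errno set).  Sketch only: the
 * real initdb/fd.c code tolerates failures on platforms where fsync of a
 * directory is not permitted.
 */
static int
fsync_dir(const char *dirpath)
{
	int			fd = open(dirpath, O_RDONLY);
	int			rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}
```

Typical use is to call this on the parent directory right after creating or renaming a file inside it.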
Re: [HACKERS] Improving GEQO
On Wed, May 27, 2015 at 3:06 PM, boix b...@uclv.cu wrote: Hello, my partner and me are working with the goal of improve the GEQO's performance, we tried with Ant Colony Optimization, but it does not improve, actually we are trying with a new variant of Genetic Algorithm, specifically Micro-GA. This algorithm finds a better solution than GEQO in less time, however the total query execution time is higher. The fitness is calculated by geqo_eval function. Does anybody know why this happens? We attach the patch made with the changes in postgresql-9.2.0.

Can you submit more details? For example, 'explain analyze' (perhaps here: http://explain.depesz.com/) of the plans generated by GEQO vs GA vs stock? It sounds like you might be facing an estimation miss, which is not really an issue a better planner could solve. That said, assuming you're getting 'better' plans in less time suggests you might be on to something.

merlin
Re: [HACKERS] About that re-release ...
On 5/28/2015 7:10 PM, Josh Berkus wrote: On 05/28/2015 02:37 AM, Marco Atzeri wrote: On 5/28/2015 5:00 AM, Tom Lane wrote: Assuming that we can get a fix for the fsync-failure-during-restart problem committed by the end of the week, there will be a new set of back-branch minor releases next week. Usual schedule, wrap Monday for public announcement Thursday. regards, tom lane Tom, thanks for the advice. I will postpone the deployment of new packages for cygwin until 9.4.3 is available. You're doing the cygwin packages? You should probably be on the packagers list, then, no?

There is a dedicated packagers mailing list? I don't see one on: http://www.postgresql.org/list/ Could you clarify?

Marco
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 8:51 AM, Robert Haas robertmh...@gmail.com wrote: [ speculation ] OK, I finally managed to reproduce this, after some off-list help from Steve Kehlet (the reporter), Alvaro, and Thomas Munro. Here's how to do it: 1. Install any pre-9.3 version of the server and generate enough multixacts to create at least TWO new segments. When you shut down the server, all segments except for the most current one will be removed. At this point, the only thing in $PGDATA/pg_multixact/offsets should be a single file, and the name of that file should not be 0000 or 0001. 2. Use pg_upgrade to upgrade to 9.3.4. It is possible that versions before 9.3.4 will also work here, but you must not use 9.3.5 or higher, because 9.3.5 includes Bruce's commit 3d2e18510, which arranged to preserve relminmxid and datminmxid values. At this point, pg_controldata on the new cluster should show an oldestMultiXid value greater than 1 (copied from the old cluster), but all the datminmxid values are 1. Also, initdb will have left behind a bogus 0000 file in pg_multixact/offsets. 3. Move to 9.3.5 (or 9.3.6), not via pg_upgrade, but just by dropping in the new binaries. Follow the instructions in the 9.3.5 release notes; since you created at least TWO new segments in step one, there will be no 0001 file, and the query there will say that you should remove the bogus 0000 file. So do that, leaving just the good file in pg_multixact/offsets. At this point, pg_multixact/offsets is OK, and pg_controldata still says that oldestMultiXid > 1, so that is also OK. The only problem is that we've got some bogus datminmxid values floating around. Our next step will be to convince vacuum to propagate the bogus datminmxid values back into pg_controldata. 4. Consume at least one transaction ID (e.g.
SELECT txid_current()) and then do this: postgres=# set vacuum_freeze_min_age = 0; SET postgres=# set vacuum_freeze_table_age = 0; SET postgres=# vacuum; VACUUM Setting the GUCs forces full table scans, so that we advance relfrozenxid. But notice that we were careful not to just run VACUUM FREEZE, which would have also advanced relminmxid, which, for purposes of reproducing this bug, is not what we want to happen. So relminmxid is still (incorrectly) set to 1 for every database. However, since the vacuum did advance relfrozenxid, it will call vac_truncate_clog, which will call SetMultiXactIdLimit, which will propagate the bogus datminmxid = 1 setting into shared memory. (In my testing, this step doesn't work if performed on 9.3.4; you have to do it on 9.3.5. I think that's because of Tom's commit 78db307bb, but I believe in a more complex test scenario you might be able to get this to happen on 9.3.4 also.) I believe it's the case that an autovacuum of even a single table can substitute for this step if it happens to advance relfrozenxid but not relminmxid. 5. The next checkpoint, or the shutdown checkpoint in any event, will propagate the bogus value of 1 from shared memory back into the control file. 6. Now try to start 9.3.7. It will see the bogus oldestMultiXid = 1 value in the control file, attempt to read the corresponding offsets file, and die. In the process of investigating this, we found a few other things that seem like they may also be bugs: - As noted upthread, replaying an older checkpoint after a newer checkpoint has already happened may lead to similar problems. This may be possible when starting from an online base backup; or when restarting a standby that did not perform a restartpoint when replaying the last checkpoint before the shutdown. - pg_upgrade sets datminmxid = old_cluster.controldata.chkpnt_nxtmulti, which is correct only if there are ZERO multixacts in use at the time of the upgrade. 
It would be best, I think, to set this to the same value it had in the old cluster, but if we're going to use a blanket value, I think it needs to be chkpnt_oldstMulti. - There's a third possible problem related to boundary cases in SlruScanDirCbRemoveMembers, but I don't understand that one well enough to explain it. Maybe Thomas can jump in here and explain the concern. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On Thu, May 28, 2015 at 1:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, May 28, 2015 at 10:26 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: 2. Robert, are you comfortable with what fsync_pgdata() does in xlog.c? I wasn't sure if I should move that to fd.c as well. I think it's borderline OK for now. I think if the function is as specific as fsync_pgdata(), fd.c is not the right place. I'm not sure xlog.c is either, though. ISTM xlog.c is clearly the *wrong* place; if this behavior has anything to do with WAL logging as such, it's not apparent to me. fd.c is not a great place perhaps, but in view of the fact that we have things like RemovePgTempFiles() in there, it's not unreasonable to see fsync_pgdata as something to put there as well (perhaps with a name more chosen to match fd.c names...) OK, sure, makes sense. Since Robert appears to be busy worrying about the multixact issue reported by Steve Kehlet, I suggest he focus on that and I'll take care of getting this thing committed. AFAICT we have consensus on what it should do and we're down to arguing about code style. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Improving GEQO
On Fri, May 29, 2015 at 12:59 AM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, May 27, 2015 at 3:06 PM, boix b...@uclv.cu wrote: Hello, my partner and I are working on improving GEQO's performance. We tried Ant Colony Optimization, but it did not improve things; we are now trying a new variant of the genetic algorithm, specifically Micro-GA. This algorithm finds a better solution than GEQO in less time, yet the total query execution time is higher. The fitness is calculated by the geqo_eval function. Does anybody know why this happens? We attach a patch with the changes against postgresql-9.2.0. Can you submit more details? For example, 'explain analyze' output (perhaps via http://explain.depesz.com/) of the plans generated by GEQO vs. GA vs. stock? It sounds like you might be facing an estimation miss, which is not really an issue a better planner could solve. That said, if you're getting 'better' plans in less time, you might be on to something. merlin What sort of tests are you running? I suspect that anything which is not too well thought out and tested might end up performing well only on a small subset of tests. Also, what is the consistency of the plans generated? If you are only targeting planning time, I feel it might be of lesser value. However, if you can get large order joins to be executed in a near optimal (brute force) solution, you might be on to something. Something I would like to see done is remove the dead code that is present in the existing GEQO. That alone might lead to shorter compile times. -- Regards, Atri *l'apprenant*
Re: [HACKERS] RFC: Remove contrib entirely
JD, * Joshua D. Drake (j...@commandprompt.com) wrote: Contrib according to the docs is: These include porting tools, analysis utilities, and plug-in features that are not part of the core PostgreSQL system, mainly because they address a limited audience or are too experimental to be part of the main source tree. This does not preclude their usefulness. We certainly seem to have moved on from that definition. If nothing else, it'd be valuable to clarify what contrib is and then rearrange things accordingly. It has also been mentioned many times over the years that contrib is a holding tank for technology that would hopefully be pushed into core someday. I continue to feel this is a good use-case for contrib. What I am suggesting: 1. Analyze the current contrib modules for inclusion into -core. A few of these are pretty obvious: pg_stat_statements citext postgres_fdw hstore pgcrypto [...] We don't really have a place in -core for them.. There has been ongoing discussion about that but nothing has changed in that regard, as far as I'm aware. We have moved a few things out of contrib and into bin, but that doesn't make sense for the above. What we would need for this is an 'extensions' directory, or similar, and a clear definition of what the requirements for getting into it are. With that, we could decide for each module currently in contrib if it should go into the 'extensions' directory. I'm not sure that we would necessarily have to remove the contrib module or any modules which are deemed to not be appropriate for the 'extensions' directory. I am sure there will be plenty of fun to be had with what should or shouldn't be merged into core. I think if we argue about the guidelines of how to analyze what should be in core versus the merits of any particular module, life will be easier. Here are some for a start: A. Must have been in contrib for at least two releases B.
Must have visible community (and thus use case) I don't particularly like having a time-based requirement drive what's in which area, especially one that's double the length of our major release cycle. 2. Push the rest out into a .Org project called contrib. Let those who are interested in the technology work on them or use them. This project, since it is outside of core proper, can work just like other extension projects. Alternately, allow the maintainers to push them wherever they like (Landscape, Github, Savannah, git.postgresql.org ...). That's an interesting idea, but it's unclear how this would be any different from PGXN..? Thanks! Stephen
Re: [HACKERS] RFC: Remove contrib entirely
On 05/28/2015 04:22 PM, Joshua D. Drake wrote: On 05/28/2015 01:11 PM, Andrew Dunstan wrote: This seems to come up regularly. Maybe we should put it in an FAQ somewhere. The barriers to making non-core types into core types are very very high, possibly insurmountable. This is pretty much not an option. O.k., then either: * We install it by default and document that it is available (and how to enable it) * Push it out of core and let it be happy wherever Theory wants it to be I'd be ok with installing it by default. But the case that's a lot harder to require to be always installed is pgcrypto, as has often been discussed in the past. In any case, we will always want to have some loadable modules, not least because it's a part of eating our own dog food. cheers andrew
[HACKERS] auto_explain sample rate
Hi all It's sometimes desirable to collect auto_explain data with ANALYZE in order to track down hard-to-reproduce issues, but the performance impacts can be pretty hefty on the DB. I'm inclined to add a sample rate to auto_explain so that it can trigger only on x percent of queries, and also add a sample test hook that can be used to target statements of interest more narrowly (using a C hook function). Sound like a reasonable approach? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] RFC: Remove contrib entirely
JD, * Joshua D. Drake (j...@commandprompt.com) wrote: On 05/28/2015 06:50 PM, Peter Eisentraut wrote: On 5/28/15 3:35 PM, Stephen Frost wrote: What we would need for this is an 'extensions' directory, or similar, and a clear definition of what the requirements for getting into it are. With that, we could decide for each module currently in contrib if it should go into the 'extensions' directory. I'm not sure that we would necessarily have to remove the contrib module or any modules which are deemed to not be appropriate for the 'extensions' directory. This seems reasonable to me. It's in line with the recent move from contrib to bin. It'll just be quite a bit bigger of an undertaking. (50 threads to discuss the merits of each module separately?) Maybe start by picking the top 5 and sort those out. The thing is, we don't have that many to argue about now, in fact: Alright, I'll bite. :) F.1. adminpack Need it - pgAdmin can't sensibly install it or even include it in some way, and it'd be *very* painful to not have it for a lot of users. F.2. auth_delay Should be a core feature. Having this in a contrib module is silly. F.3. auto_explain Move to extension directory in the repo. F.4. btree_gin F.5. btree_gist Both of these should simply be in core. F.6. chkpass F.7. citext F.8. cube Push out and/or keep it in contrib in repo. F.9. dblink Move to extension directory in the repo. F.10. dict_int F.11. dict_xsyn Looks like these are just examples? Maybe move to an 'examples' directory, or into src/test/modules, or keep in contrib. F.12. earthdistance Depends on cube, so, same as whatever happens there. I don't think extensions-in-repo should depend on contrib modules, as a rule. F.13. file_fdw F.14. fuzzystrmatch F.15. hstore Move to extension directory in the repo. F.16. intagg Obsolete, per the docs. Push out and deal with the break, or keep it in contrib in repo. F.17. intarray Move to extension directory in the repo. F.18. isn F.19. lo F.20. ltree F.21.
pageinspect Move to extension directory in the repo. F.22. passwordcheck Should be an in-core capability and not shoved off into an extension. F.23. pg_buffercache Pull it into core. F.24. pgcrypto Move to extension directory in the repo. F.25. pg_freespacemap Should be in core. F.26. pg_prewarm F.27. pgrowlocks Should be in core. F.28. pg_stat_statements I'd actually prefer that this be in core, but I'd be alright with it being in extension directory in the repo. F.29. pgstattuple F.30. pg_trgm Should be in core. F.31. postgres_fdw Move to extension directory in the repo. (actually, I'd be fine with both this and file_fdw being included in core.. I'm just not 100% sure how that'd look) F.32. seg F.33. sepgsql Move to extension directory in the repo. F.34. spi Maybe pull some into core.. or maybe all, or move to an extension. F.35. sslinfo Should be in core. F.36. tablefunc My gut reaction is that it should be in core for crosstab(), but David's talking about implementing PIVOT, so.. F.37. tcn Should be in core, imv, but not a position I hold very strongly. F.38. test_decoding Should be in src/test/modules, or maybe some 'examples' dir. F.39. tsearch2 I'm inclined to just drop this.. Or we could keep it in contrib in the repo. F.40. tsm_system_rows F.41. tsm_system_time These should be in core. F.42. unaccent Move to extension directory in the repo. F.43. uuid-ossp This one probably deserves its own thread, heh.. F.44. xml2 Push it out, or keep it in contrib in repo. Look at these, a lot of them are obvious... just include them, for goodness' sake: pg_trgm has been in contrib for a decade if not more. Either rip it out or include it by default. postgres_fdw (by the time we make the change it will be two releases) Agreed. sepgsql has no business being in core, it is: 1. An extension 2. About as linux specific as we can get Not sure that being platform agnostic has to be a requirement of being in the repo or being an extension in the repo...
It does need some work though, and discussion has recently started about whether the sepgsql types defined in the SELinux reference policy should continue to exist or if they should be changed. I'm following that discussion with interest. Adminpack: It is for pgAdmin, give it back or push it into core proper I'd keep it in the repo as an extension. Pushing it out would just cause lots of trouble for little gain. I just don't think this would be that hard if we were willing to put our minds to it. Or.. another way: Nobody has yet provided an argument as to why we need contrib when we have a fully functioning extension capability. pg_stat_statements is perhaps one of the best reasons. Not something that we necessarily want to force on everyone who installs PG (presumably the additional shared memory and performance impact is an issue for some people...
Re: [HACKERS] useless assignment pointer argument
Andres Freund and...@anarazel.de writes: On 2015-05-28 20:14:33 +0000, Gaetano Mendola wrote: src/backend/commands/explain.c:1692 src/backend/commands/explain.c:1874 src/backend/commands/explain.c:1986 there is the following assignment: ancestors = list_delete_first(ancestors); but it has no effect at all, since it assigns to a function parameter that is not used after the assignment itself. So? It costs little to nothing, and it'll make it much less likely that a stale version of 'ancestors' is used when the code is expanded. It's completely mistaken to imagine that the statement has no effect: it does change the underlying List structure, so removing it would be wrong. It's true that we could, today, write it as (void) list_delete_first(ancestors); but this is not any more clear IMO, and it would fail if the code were ever changed so that these functions did use the ancestors list after the point of popping the context they pushed for their children. That risk outweighs any argument about how deleting the assignment would quiet some tool or other. regards, tom lane
Re: [HACKERS] RFC: Remove contrib entirely
On 5/28/15 3:35 PM, Stephen Frost wrote: What we would need for this is an 'extensions' directory, or similar, and a clear definition of what the requirements for getting into it are. With that, we could decide for each module currently in contrib if it should go into the 'extensions' directory. I'm not sure that we would necessarily have to remove the contrib module or any modules which are deemed to not be appropriate for the 'extensions' directory. This seems reasonable to me. It's in line with the recent move from contrib to bin. It'll just be quite a bit bigger of an undertaking. (50 threads to discuss the merits of each module separately?) Maybe start by picking the top 5 and sort those out.
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 10:41 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid values which are equal to the next-mxid counter instead of the correct value; in other words, they are too new. [ discussion of how the control file's oldestMultiXid gets set ] I'm talking about the datminmxid in pg_database. You're talking about the contents of pg_control. Those are two different things. The relevant code is not what you quote, but rather this: /* set pg_database.datminmxid */ PQclear(executeQueryOrDie(conn_template1, UPDATE pg_catalog.pg_database SET datminmxid = '%u', old_cluster.controldata.chkpnt_nxtmulti)); Tom previously observed this to be wrong, here: http://www.postgresql.org/message-id/9879.1405877...@sss.pgh.pa.us Although Tom was correct to note that it's wrong, nothing ever got fixed. :-( A. Most obviously, we should fix pg_upgrade so that it installs chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so that we stop creating new instances of this problem. That won't get us out of the hole we've dug for ourselves, but we can at least try to stop digging. (This is assuming I'm right that chkpnt_nxtmulti is the wrong thing - anyone want to double-check me on that one?) I don't think there's anything that we need to fix here. I see your followup now agreeing this is broken. Since I wrote the previous email, I've had two new ideas that I think are both better than the above. 1. Figure out the oldest multixact offset that actually exists in pg_multixacts/offsets, and use that value. If any older MXIDs still exist, they won't be able to be looked up anyway, so if they wrap around, it doesn't matter. The only value that needs to be reliable in order to do this is pg_controldata's NextMultiXactId, which to the best of my knowledge is not implicated in any of these bugs. pg_upgrade can check that the offsets file containing that value exists, and if not bail out. 
Then, start stepping backwards a file at a time. When it hits a missing file, the first multixact in the next file is a safe value of datfrozenxid for every database in the new cluster. If older MXIDs exist, they're unreadable anyway, so if they wrap, nothing lost. If the value is older than necessary, the first vacuum in each database will fix it. We have to be careful: if we step back too many files, such that our proposed datfrozenxid might wrap, then we've got a confusing situation and had better bail out - or at least think really carefully about what to do. 2. When we're upgrading from a version 9.3 or higher, copy the EXACT datminmxid from each old database to the corresponding new database. This seems like it ought to be really simple. - In DetermineSafeOldestOffset, find_multixact_start() is used to set MultiXactState-offsetStopLimit. If it fails here, we don't know when to refuse multixact creation to prevent wraparound. Again, in recovery, that's fine. If it happens in normal running, it's not clear what to do. Refusing multixact creation is an awfully blunt instrument. Maybe we can scan pg_multixact/offsets to determine a workable stop limit: the first file greater than the current file that exists, minus two segments, is a good stop point. Perhaps we ought to use this mechanism here categorically, not just when find_multixact_start() fails. It might be more robust than what we have now. Blunt instruments have the desirable property of being simple. We don't want any more clockwork here, I think --- this stuff is pretty complicated already. As far as I understand, if during normal running we see that find_multixact_start has failed, sufficient vacuuming should get it straight eventually with no loss of data. Unfortunately, I don't believe that to be true. If find_multixact_start() fails, we have no idea how close we are to the member wraparound point. 
Sure, we can start vacuuming, but the user can be creating new, large multixacts at top speed while we're doing that, which could cause us to wrap around before we can finish vacuuming. Furthermore, if we adopted the blunt instrument, users who are in this situation would update to 9.4.3 (or whenever these fixes get released) and find that they can't create new MXIDs for a possibly very protracted period of time. That amounts to an outage for which users won't thank us. Looking at the files in the directory seems pretty simple in this case, and quite a bit more fail-safe than what we're doing right now. The current logic purports to leave a one-file gap in the member space, but there's no guarantee that the gap really exists on disk the way we think it does. With this approach, we can be certain that there is a gap. And that is a darned good thing to be certain about. C. I think we should also change
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 4:06 PM, Joshua D. Drake j...@commandprompt.com wrote: FTR: Robert, you have been a Samurai on this issue. Our many thanks. Thanks! I really appreciate the kind words. So, in thinking through this situation further, it seems to me that the situation is pretty dire: 1. If you pg_upgrade to 9.3 before 9.3.5, then you may have relminmxid or datminmxid values which are 1 instead of the correct value. Setting the value to 1 was too far in the past if your MXID counter is below 2B, and too far in the future if your MXID counter is above 2B. 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid values which are equal to the next-mxid counter instead of the correct value; in other words, they are too new. 3. If you pg_upgrade to 9.3.5, 9.3.6, 9.4.0, or 9.4.1, then you will have the first problem for tables in template databases, and the second one for the rest. (See 866f3017a.) 4. Wrong relminmxid or datminmxid values can eventually propagate into the control file, as demonstrated in my previous post. Therefore, we can't count on relminmxid to be correct, we can't count on datminmxid to be correct, and we can't count on the control file to be correct. That's a sack of sad. 5. If the values are too far in the past, then nothing really terrible will happen unless you upgrade to 9.3.7 or 9.4.2, at which point the system will refuse to start. Forcing a VACUUM FREEZE on every database, including the unconnectable ones, should fix this and allow you to upgrade safely - which you want to do, because 9.3.7 and 9.4.2 fix a different set of multixact data loss bugs. 6. If the values are too far in the future, the system may fail to prevent wraparound, leading to data loss. I am not totally clear on whether a VACUUM FREEZE will fix this problem. It seems like the chances are better if you are running at least 9.3.5+ or 9.4.X, because of 78db307bb. But I'm not sure how complete a fix that is. So what do we do about this? I have a few ideas: A.
Most obviously, we should fix pg_upgrade so that it installs chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so that we stop creating new instances of this problem. That won't get us out of the hole we've dug for ourselves, but we can at least try to stop digging. (This is assuming I'm right that chkpnt_nxtmulti is the wrong thing - anyone want to double-check me on that one?) B. We need to change find_multixact_start() to fail softly. This is important because it's legitimate for it to fail in recovery, as discussed upthread, and also because we probably want to eliminate the fail-to-start hazard introduced in 9.4.2 and 9.3.7. find_multixact_start() is used in three places, and they each require separate handling: - In SetMultiXactIdLimit, find_multixact_start() is used to set MultiXactState-oldestOffset, which is used to determine how aggressively to vacuum. If find_multixact_start() fails, we don't know how aggressively we need to vacuum to prevent members wraparound; it's probably best to decide to vacuum as aggressively as possible. Of course, if we're in recovery, we won't vacuum either way; the fact that it fails softly is good enough. - In DetermineSafeOldestOffset, find_multixact_start() is used to set MultiXactState-offsetStopLimit. If it fails here, we don't know when to refuse multixact creation to prevent wraparound. Again, in recovery, that's fine. If it happens in normal running, it's not clear what to do. Refusing multixact creation is an awfully blunt instrument. Maybe we can scan pg_multixact/offsets to determine a workable stop limit: the first file greater than the current file that exists, minus two segments, is a good stop point. Perhaps we ought to use this mechanism here categorically, not just when find_multixact_start() fails. It might be more robust than what we have now. - In TruncateMultiXact, find_multixact_start() is used to set the truncation point for the members SLRU. 
If it fails here, I'm guessing the right solution is not to truncate anything - instead, rely on intense vacuuming to eventually advance oldestMXact to a value whose member data still exists; truncate then. C. I think we should also change TruncateMultiXact() to truncate offsets first, and then members. As things stand, if we truncate members first, we increase the risk of seeing an offset that will fail when passed to find_multixact_start(), because TruncateMultiXact() might get interrupted before it finishes. That seems like an unnecessary risk. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_upgrade resets timeline to 1
On Thu, May 28, 2015 at 05:26:56PM +0200, Christoph Berg wrote: Re: Noah Misch 2015-05-28 20150528150234.ga4111...@tornado.leadboat.com To clarify for the archives, the 2015-05-16 changes did not revert to 9.3 and earlier behavior. Rather, they standardized on the {9.0,9.1,9.2}-to-{9.3,9.4} upgrade behavior, bringing that behavior to all supported branches and source versions. Here is the history of timeline restoration in pg_upgrade: Ok, sorry for the noise then. It's not a regression; I still think the behavior needs improvement, but this is indeed 9.6 material. No, thank you for the report. It had strong signs of being a regression, considering recent changes and the timing of your discovery.
Re: [HACKERS] pg_upgrade resets timeline to 1
On 29/05/15 12:59, Noah Misch wrote: On Thu, May 28, 2015 at 05:26:56PM +0200, Christoph Berg wrote: Re: Noah Misch 2015-05-28 20150528150234.ga4111...@tornado.leadboat.com To clarify for the archives, the 2015-05-16 changes did not revert to 9.3 and earlier behavior. Rather, they standardized on the {9.0,9.1,9.2}-to-{9.3,9.4} upgrade behavior, bringing that behavior to all supported branches and source versions. Here is the history of timeline restoration in pg_upgrade: Ok, sorry for the noise then. It's not a regression; I still think the behavior needs improvement, but this is indeed 9.6 material. No, thank you for the report. It had strong signs of being a regression, considering recent changes and the timing of your discovery. From my experience, I would far rather have a user raise concerns that are important to them and find there is no real problem, than have users not raise things and a serious bug or system shortcoming go unnoticed. This is a major concern of mine: for example, in my current project, users NOT raising problems in a timely manner caused unnecessary work later in the project than I would have liked! So not just for PostgreSQL, but in general: if a user has concerns, please raise them!!! Cheers, Gavin
Re: [HACKERS] Free indexed_tlist memory explicitly within set_plan_refs()
On Thu, May 28, 2015 at 2:37 PM, Peter Geoghegan p...@heroku.com wrote: A second attached patch fixes another, largely independent bug. I noticed array assignment with ON CONFLICT DO UPDATE was broken. See commit message for full details. Finally, here is a third patch, fixing the final bug that I discussed with you privately. There are now fixes for all bugs that I'm currently aware of. This concerns a thinko in unique index inference. See the commit message for full details. -- Peter Geoghegan From 904f46f4d9e7758c588254e2c2bb3ef3ef93f3c9 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan peter.geoghega...@gmail.com Date: Thu, 28 May 2015 15:45:48 -0700 Subject: [PATCH 3/3] Fix bug in unique index inference ON CONFLICT unique index inference had a thinko that could affect cases where the user-supplied inference clause required that an attribute match a particular (user named) collation and/or opclass. Firstly, infer_collation_opclass_match() matched on opclass and/or collation. Secondly, the attribute had to be in the list of attributes or expressions known to be in the definition of the index under consideration. The second step wasn't correct though, because having some match doesn't necessarily mean that the second step found the same index attribute as the (collation/opclass wise) match from the first step. To fix, make infer_collation_opclass_match() more discriminating in its second step. 
--- src/backend/optimizer/util/plancat.c | 45 +-- src/test/regress/expected/insert_conflict.out | 18 +++ src/test/regress/sql/insert_conflict.sql | 12 +++ 3 files changed, 59 insertions(+), 16 deletions(-) diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index b04dc2e..1b81f4c 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -52,7 +52,7 @@ get_relation_info_hook_type get_relation_info_hook = NULL; static bool infer_collation_opclass_match(InferenceElem *elem, Relation idxRel, - Bitmapset *inferAttrs, List *idxExprs); + List *idxExprs); static int32 get_rel_data_width(Relation rel, int32 *attr_widths); static List *get_relation_constraints(PlannerInfo *root, Oid relationObjectId, RelOptInfo *rel, @@ -615,8 +615,7 @@ infer_arbiter_indexes(PlannerInfo *root) * this for both expressions and ordinary (non-expression) * attributes appearing as inference elements. */ - if (!infer_collation_opclass_match(elem, idxRel, inferAttrs, - idxExprs)) + if (!infer_collation_opclass_match(elem, idxRel, idxExprs)) goto next; /* @@ -681,11 +680,10 @@ next: * infer_collation_opclass_match - ensure infer element opclass/collation match * * Given unique index inference element from inference specification, if - * collation was specified, or if opclass (represented here as opfamily + - * opcintype) was specified, verify that there is at least one matching - * indexed attribute (occasionally, there may be more). Skip this in the - * common case where inference specification does not include collation or - * opclass (instead matching everything, regardless of cataloged + * collation was specified, or if opclass was specified, verify that there is + * at least one matching indexed attribute (occasionally, there may be more). 
+ * Skip this in the common case where inference specification does not include + * collation or opclass (instead matching everything, regardless of cataloged * collation/opclass of indexed attribute). * * At least historically, Postgres has not offered collations or opclasses @@ -707,11 +705,12 @@ next: */ static bool infer_collation_opclass_match(InferenceElem *elem, Relation idxRel, - Bitmapset *inferAttrs, List *idxExprs) + List *idxExprs) { AttrNumber natt; - Oid inferopfamily = InvalidOid; /* OID of att opfamily */ - Oid inferopcinputtype = InvalidOid; /* OID of att opfamily */ + Oid inferopfamily = InvalidOid; /* OID of opclass opfamily */ + Oid inferopcinputtype = InvalidOid; /* OID of opclass input type */ + int nplain = 0; /* # plain attrs observed */ /* * If inference specification element lacks collation/opclass, then no @@ -734,6 +733,10 @@ infer_collation_opclass_match(InferenceElem *elem, Relation idxRel, Oid opfamily = idxRel->rd_opfamily[natt - 1]; Oid opcinputtype = idxRel->rd_opcintype[natt - 1]; Oid collation = idxRel->rd_indcollation[natt - 1]; + int attno = idxRel->rd_index->indkey.values[natt - 1]; + + if (attno != 0) + nplain++; if (elem->inferopclass != InvalidOid && (inferopfamily != opfamily || inferopcinputtype != opcinputtype)) @@ -749,12 +752,22 @@ infer_collation_opclass_match(InferenceElem *elem, Relation idxRel, continue; } - if ((IsA(elem->expr, Var) && - bms_is_member(((Var *) elem->expr)->varattno, inferAttrs)) || - list_member(idxExprs, elem->expr)) + /* If one matching
[HACKERS] [PATCH] Document that directly callable functions may use fn_extra
Hi all I was puzzled by src/backend/utils/fmgr/README and fmgr.h's descriptions of fcinfo->flinfo->fn_extra (FmgrInfo.fn_extra), as they seem to conflict with actual usage. The docs suggest that fn_extra is for the use of function call handlers, but in practice it's also used heavily by function implementations as a cache space. For example, SQL functions use fn_extra in init_sql_fcache, plpgsql uses it in plpgsql_compile, and most of the array functions use it for a type cache. I'm inclined to change the docs to say that functions called directly by the fmgr may also use fn_extra (per existing practice). Handlers that wrap functions usually called directly by the fmgr must save and restore fn_extra or leave it untouched. Trivial patch to do the above attached. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From c58dec9d20911a91d7db63f313091d69f7999b17 Mon Sep 17 00:00:00 2001 From: Craig Ringer cr...@2ndquadrant.com Date: Fri, 29 May 2015 10:13:41 +0800 Subject: [PATCH] Document that fn_extra is also usable as a cache by direct-call fns --- src/backend/utils/fmgr/README | 6 +- src/include/fmgr.h | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/fmgr/README b/src/backend/utils/fmgr/README index e7e7ae9..0913d28 100644 --- a/src/backend/utils/fmgr/README +++ b/src/backend/utils/fmgr/README @@ -71,7 +71,7 @@ typedef struct bool fn_strict; /* function is strict (NULL in => NULL out) */ bool fn_retset; /* function returns a set (over multiple calls) */ unsigned char fn_stats; /* collect stats if track_functions > this */ -void *fn_extra; /* extra space for use by handler */ +void *fn_extra; /* extra space for use by function or handler */ MemoryContext fn_mcxt; /* memory context to store fn_extra in */ Node *fn_expr; /* expression parse tree for call, or NULL */ } FmgrInfo; @@ -98,6 +98,10 @@ field really is information about the arguments rather than information about the function,
but it's proven to be more convenient to keep it in FmgrInfo than in FunctionCallInfoData where it might more logically go. +Functions with signature PGFunction (those that are callable directly from the +fmgr without a handler) may also use fn_extra to cache expensive-to-generate +information across calls. Handlers that wrap direct function calls must leave +fn_extra untouched or save and restore it around calls to the wrapped function. During a call of a function, the following data structure is created and passed to the function: diff --git a/src/include/fmgr.h b/src/include/fmgr.h index a901770..1bbff1e 100644 --- a/src/include/fmgr.h +++ b/src/include/fmgr.h @@ -58,7 +58,7 @@ typedef struct FmgrInfo bool fn_strict; /* function is strict (NULL in => NULL out) */ bool fn_retset; /* function returns a set */ unsigned char fn_stats; /* collect stats if track_functions > this */ - void *fn_extra; /* extra space for use by handler */ + void *fn_extra; /* extra space for use by function or handler */ MemoryContext fn_mcxt; /* memory context to store fn_extra in */ fmNodePtr fn_expr; /* expression parse tree for call, or NULL */ } FmgrInfo; -- 2.1.0
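The contract this patch documents can be pictured outside C. The sketch below is purely illustrative (Python; `FmgrInfo` here is a stand-in object, not the real struct, and `direct_fn`/`wrapping_handler` are invented names): a directly-called function caches expensive state in its call site's fn_extra, and a handler that wraps it saves and restores the slot, per the wording added to the README.

```python
# Illustration of the documented fn_extra contract, as a sketch.
# A direct-call function caches expensive state in its call site's
# fn_extra slot; a wrapping handler must save/restore that slot (or
# leave it untouched) so the wrapped function's cache survives.

class FmgrInfo:
    """Stand-in for the per-call-site lookup info; fn_extra starts NULL."""
    def __init__(self):
        self.fn_extra = None

def direct_fn(flinfo, arg):
    # First call through this flinfo: build and cache "expensive" state.
    if flinfo.fn_extra is None:
        flinfo.fn_extra = {"compiled": arg * 2}  # pretend this is costly
    return flinfo.fn_extra["compiled"]

def wrapping_handler(flinfo, arg):
    # Per the documented rule: save fn_extra, use the slot for the
    # handler's own bookkeeping if needed, then restore before returning.
    saved = flinfo.fn_extra
    flinfo.fn_extra = {"handler": "bookkeeping"}
    try:
        result = arg  # stand-in for invoking the wrapped function
    finally:
        flinfo.fn_extra = saved
    return result

flinfo = FmgrInfo()
direct_fn(flinfo, 21)        # populates the cache
wrapping_handler(flinfo, 0)  # must not clobber it
assert flinfo.fn_extra == {"compiled": 42}
```

A handler that simply overwrote fn_extra and returned would silently discard the wrapped function's cache, which is exactly the failure mode the README addition warns against.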
Re: [HACKERS] RFC: Remove contrib entirely
On 05/28/2015 06:50 PM, Peter Eisentraut wrote: On 5/28/15 3:35 PM, Stephen Frost wrote: What we would need for this is an 'extensions' directory, or similar, and a clear definition of what the requirements are around getting into it. With that, we could decide for each module currently in contrib if it should go into the 'extensions' directory. I'm not sure that we would necessarily have to remove the contrib module or any modules which are deemed to not be appropriate for the 'extensions' directory. This seems reasonable to me. It's in line with the recent move from contrib to bin. It'll just be quite a bit bigger of an undertaking. (50 threads to discuss the merits of each module separately?) Maybe start by picking the top 5 and sort those out. The thing is, we don't have that many to argue about now, in fact: F.1. adminpack F.2. auth_delay F.3. auto_explain F.4. btree_gin F.5. btree_gist F.6. chkpass F.7. citext F.8. cube F.9. dblink F.10. dict_int F.11. dict_xsyn F.12. earthdistance F.13. file_fdw F.14. fuzzystrmatch F.15. hstore F.16. intagg F.17. intarray F.18. isn F.19. lo F.20. ltree F.21. pageinspect F.22. passwordcheck F.23. pg_buffercache F.24. pgcrypto F.25. pg_freespacemap F.26. pg_prewarm F.27. pgrowlocks F.28. pg_stat_statements F.29. pgstattuple F.30. pg_trgm F.31. postgres_fdw F.32. seg F.33. sepgsql F.34. spi F.35. sslinfo F.36. tablefunc F.37. tcn F.38. test_decoding F.39. tsearch2 F.40. tsm_system_rows F.41. tsm_system_time F.42. unaccent F.43. uuid-ossp F.44. xml2 Look at these, a lot of them are obvious... just include for goodness sakes: pg_trgm has been in contrib for a decade if not more. Either rip it out or include it by default. postgres_fdw (by the time we make the change it will be two releases) sepgsql has no business being in core, it is: 1. An extension 2.
About as linux specific as we can get Adminpack: It is for pgAdmin, give it back or push it into core proper I just don't think this would be that hard if we were willing to put our minds to it. Or.. another way: Nobody has yet provided an argument as to why we need contrib when we have a fully functioning extension capability. Ego... is not a good reason. Of course the other option: Freeze contrib. What is in there now is all there will ever be in there, and the goal is to slowly reduce it to the point that it doesn't matter. Sincerely, jD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
Robert Haas wrote: 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid values which are equal to the next-mxid counter instead of the correct value; in other words, they are too new. What you describe is what happens if you upgrade from 9.2 or earlier. For this case we use this call: exec_prog(UTILITY_LOG_FILE, NULL, true, "\"%s/pg_resetxlog\" -m %u,%u \"%s\"", new_cluster.bindir, old_cluster.controldata.chkpnt_nxtmulti + 1, old_cluster.controldata.chkpnt_nxtmulti, new_cluster.pgdata); This uses the old cluster's nextMulti value as oldestMulti in the new cluster, and that value+1 is used as nextMulti. This is correct: we don't want to preserve any of the multixact state from the previous cluster; anything before that value can be truncated with no loss of critical data. In fact, there is no critical data before that value at all. If you upgrade from 9.3, this other call is used instead: /* * we preserve all files and contents, so we must preserve both next * counters here and the oldest multi present on system. */ exec_prog(UTILITY_LOG_FILE, NULL, true, "\"%s/pg_resetxlog\" -O %u -m %u,%u \"%s\"", new_cluster.bindir, old_cluster.controldata.chkpnt_nxtmxoff, old_cluster.controldata.chkpnt_nxtmulti, old_cluster.controldata.chkpnt_oldstMulti, new_cluster.pgdata); In this case we use the oldestMulti from the old cluster as oldestMulti in the new cluster, which is also correct. A. Most obviously, we should fix pg_upgrade so that it installs chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so that we stop creating new instances of this problem. That won't get us out of the hole we've dug for ourselves, but we can at least try to stop digging. (This is assuming I'm right that chkpnt_nxtmulti is the wrong thing - anyone want to double-check me on that one?) I don't think there's anything that we need to fix here. B. We need to change find_multixact_start() to fail softly.
This is important because it's legitimate for it to fail in recovery, as discussed upthread, and also because we probably want to eliminate the fail-to-start hazard introduced in 9.4.2 and 9.3.7. find_multixact_start() is used in three places, and they each require separate handling: - In SetMultiXactIdLimit, find_multixact_start() is used to set MultiXactState->oldestOffset, which is used to determine how aggressively to vacuum. If find_multixact_start() fails, we don't know how aggressively we need to vacuum to prevent members wraparound; it's probably best to decide to vacuum as aggressively as possible. Of course, if we're in recovery, we won't vacuum either way; the fact that it fails softly is good enough. Sounds good. - In DetermineSafeOldestOffset, find_multixact_start() is used to set MultiXactState->offsetStopLimit. If it fails here, we don't know when to refuse multixact creation to prevent wraparound. Again, in recovery, that's fine. If it happens in normal running, it's not clear what to do. Refusing multixact creation is an awfully blunt instrument. Maybe we can scan pg_multixact/offsets to determine a workable stop limit: the first file greater than the current file that exists, minus two segments, is a good stop point. Perhaps we ought to use this mechanism here categorically, not just when find_multixact_start() fails. It might be more robust than what we have now. Blunt instruments have the desirable property of being simple. We don't want any more clockwork here, I think --- this stuff is pretty complicated already. As far as I understand, if during normal running we see that find_multixact_start has failed, sufficient vacuuming should get it straight eventually with no loss of data. - In TruncateMultiXact, find_multixact_start() is used to set the truncation point for the members SLRU.
If it fails here, I'm guessing the right solution is not to truncate anything - instead, rely on intense vacuuming to eventually advance oldestMXact to a value whose member data still exists; truncate then. Fine. C. I think we should also change TruncateMultiXact() to truncate offsets first, and then members. As things stand, if we truncate members first, we increase the risk of seeing an offset that will fail when passed to find_multixact_start(), because TruncateMultiXact() might get interrupted before it finishes. That seems like an unnecessary risk. Not sure about this point. We did it the way you propose previously, and found it to be a problem because sometimes we tried to read an offset file that was no longer there. Do we really read member files anywhere? I thought we only tried to read offset files. If we remove member files, what is it that we try to read and find not to be present? --
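Robert's fallback idea upthread — derive a stop limit from the offsets segment files actually on disk ("the first file greater than the current file that exists, minus two segments") — can be sketched roughly as follows. This is an illustration only, in Python, with a made-up segment-space modulus; the real code would have to use the SLRU page arithmetic and wraparound comparisons, which are not modeled here.

```python
# Rough sketch (not PostgreSQL code) of deriving a stop limit from the
# set of offsets-segment numbers present on disk.  SEGMENT_SPACE is a
# hypothetical modulus standing in for the real number of possible
# SLRU segments.

SEGMENT_SPACE = 0x10000  # made-up; real value depends on page math

def stop_limit(existing_segments, current):
    """First existing segment after 'current' in wraparound order,
    minus two segments; None if no other segment exists."""
    for step in range(1, SEGMENT_SPACE):
        candidate = (current + step) % SEGMENT_SPACE
        if candidate in existing_segments:
            # back off two segments before the oldest surviving data
            return (candidate - 2) % SEGMENT_SPACE
    return None

# Segments 0x0005 and 0xFFF0 exist; from 0x0005 the next one in
# wraparound order is 0xFFF0, so creation should stop at 0xFFEE.
assert stop_limit({0x0005, 0xFFF0}, 0x0005) == 0xFFEE
```

The appeal of the directory-scan approach is that it needs no state beyond what is already on disk; the objection in the thread is exactly that it adds more "clockwork" to already complicated machinery.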
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
Peter Eisentraut pete...@gmx.net writes: On 5/26/15 5:19 PM, Oskari Saarenmaa wrote: Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R Ah, good catch. That explains the, well, randomness. I can reproduce the test failures with PYTHONHASHSEED=2. But I haven't been successful getting that environment variable set so that it works in the installcheck case. Yeah, there's pretty much no chance of controlling the postmaster's environment in installcheck cases. Instead, I have rewritten the tests to use asserts instead of textual comparisons. See attached patch. Comments? If that works back to Python 2.3 or whatever is the oldest we support, sounds good to me. regards, tom lane
Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION
On Tue, May 26, 2015 at 10:06:59PM -0400, Robert Haas wrote: On Sat, May 23, 2015 at 8:14 PM, Noah Misch n...@leadboat.com wrote: On Tue, May 19, 2015 at 04:49:26PM -0400, Robert Haas wrote: A protocol extension avoids all of that trouble, and can be targeted for 9.6 just like any other approach we might come up with. I actually suspect the protocol extension will be FAR easier to fully secure, and thus less work, not more. All true. Here's another idea. Have the pooler open one additional connection, for out-of-band signalling. Add a pair of functions: pg_userchange_grant(recipient_pid int, user oid) pg_userchange_accept(sender_pid int, user oid) To change the authenticated user of a pool connection, the pooler would call pg_userchange_grant in the signalling connection and pg_userchange_accept in the target connection. This requires no protocol change or confidential nonce. The inevitably-powerful signalling user is better insulated from other users, because the pool backends have no need to become that user at any point. Bugs in the pooler's protocol state machine are much less likely to enable privilege escalation. On the other hand, it can't be quite as fast as the other ideas on this thread. I'm sure this could be made to work, but it would require complex signalling in return for no obvious value. I don't see avoiding a protocol extension as particularly beneficial. New protocol messages that are sent by the server cause a hard compatibility break for clients, but new protocol messages that are client-initiated and late enough in the protocol flow that the client knows the server version have no such problem. I didn't realize a protocol addition could be that simple, but you're right.
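Noah's proposed grant/accept pairing can be pictured as a small in-memory broker. The function names come from his proposal and do not exist in PostgreSQL; everything below is a hypothetical Python illustration of the handshake (the broker stands in for shared server state), not an implementation.

```python
# Hypothetical sketch of the proposed pg_userchange_grant/accept
# handshake.  A user change takes effect only when the signalling
# backend's grant and the target backend's accept name each other
# and the same role.

class UserChangeBroker:
    def __init__(self):
        self.grants = set()  # (sender_pid, recipient_pid, role_oid)

    def grant(self, sender_pid, recipient_pid, role_oid):
        # Stand-in for pg_userchange_grant() on the signalling connection.
        self.grants.add((sender_pid, recipient_pid, role_oid))

    def accept(self, recipient_pid, sender_pid, role_oid):
        # Stand-in for pg_userchange_accept() on the target connection;
        # succeeds only if a matching grant exists.
        key = (sender_pid, recipient_pid, role_oid)
        if key in self.grants:
            self.grants.remove(key)  # one-shot: consume the grant
            return True
        return False

broker = UserChangeBroker()
broker.grant(sender_pid=100, recipient_pid=200, role_oid=16384)
assert broker.accept(recipient_pid=200, sender_pid=100, role_oid=16384)
assert not broker.accept(recipient_pid=200, sender_pid=100, role_oid=16384)  # consumed
assert not broker.accept(recipient_pid=201, sender_pid=100, role_oid=16384)  # wrong pid
```

The security property Noah describes falls out of the pairing: neither side alone can effect a user change, and a confused target backend cannot accept a grant that was never issued for it.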
Re: [HACKERS] [PATCH] readlink missing nul-termination in pg_rewind
On Fri, May 29, 2015 at 1:46 AM, Tom Lane t...@sss.pgh.pa.us wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: P.S. Also in passing, I note that pg_rewind will follow links under any directory anywhere named pg_tblspc (which probably doesn't matter), and does not follow pg_xlog if it's a symlink (which probably does). If you want, I can submit a trivial patch for the latter. As far as that goes, I think it does look at the whole parentpath, which means it would not be fooled by sub-subdirectories named pg_tblspc. A bigger problem is that whoever coded this forgot that parentpath could be null, which I blame on the lack of an API specification for the function. Oh, thanks for pushing a fix for that. It was missed during the review... -- Michael
Re: [HACKERS] RFC: Remove contrib entirely
On 05/28/2015 06:25 PM, Andrew Dunstan wrote: I'd be ok with installing it by default. But the case that's a lot harder to require to be always installed is pgcrypto, as has often been discussed in the past. It used to be but IIRC we don't have those restrictions anymore. If so, then we need to pull it out and just call it the Encryption Extension or whatever JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [HACKERS] Patch to improve a few appendStringInfo* calls
On 5/12/15 4:33 AM, David Rowley wrote: Shortly after I sent the previous patch I did a few more searches and also found some more things that are not quite right. Most of these are to use the binary append method when the length of the string is already known. For these cases it might be better to invent additional functions such as stringinfo_to_text() and appendStringInfoStringInfo() instead of repeating the pattern of referring to data and length separately.
Re: [HACKERS] RFC: Remove contrib entirely
On 2015-05-29 AM 11:14, Joshua D. Drake wrote: pg_trgm has been in contrib for a decade if not more. Either rip it out or include it by default. There are jsonb gin operator class related files in src/backend/utils/adt/. Perhaps trgm_gin.c, trgm_gist.c, trgm_op.c could be moved there. Similarly for other gin/gist operator classes - hstore, intarray, ltree maybe. Or each can have its own directory (including jsonb). But the original comment was about having these moved to an 'extensions' directory if at all, so perhaps this suggestion is moot. Thanks, Amit
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
On 5/26/15 5:19 PM, Oskari Saarenmaa wrote: [1] http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=jaguarundi&br=HEAD Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R Ah, good catch. That explains the, well, randomness. I can reproduce the test failures with PYTHONHASHSEED=2. But I haven't been successful getting that environment variable set so that it works in the installcheck case. Instead, I have rewritten the tests to use asserts instead of textual comparisons. See attached patch. Comments? diff --git a/contrib/hstore_plpython/expected/hstore_plpython.out b/contrib/hstore_plpython/expected/hstore_plpython.out index 6252836..b7a6a92 100644 --- a/contrib/hstore_plpython/expected/hstore_plpython.out +++ b/contrib/hstore_plpython/expected/hstore_plpython.out @@ -43,12 +43,10 @@ CREATE FUNCTION test1arr(val hstore[]) RETURNS int LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ -plpy.info(repr(val)) +assert(val == [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}]) return len(val) $$; SELECT test1arr(array['aa=>bb, cc=>NULL'::hstore, 'dd=>ee']); -INFO: [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}] -CONTEXT: PL/Python function test1arr test1arr -- 2 @@ -88,18 +86,14 @@ LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ rv = plpy.execute("SELECT 'aa=>bb, cc=>NULL'::hstore AS col1") -plpy.info(repr(rv[0]["col1"])) +assert(rv[0]["col1"] == {'aa': 'bb', 'cc': None}) val = {'a': 1, 'b': 'boo', 'c': None} plan = plpy.prepare("SELECT $1::text AS col1", ["hstore"]) rv = plpy.execute(plan, [val]) -plpy.info(repr(rv[0]["col1"])) +assert(rv[0]["col1"] == '"a"=>"1", "b"=>"boo", "c"=>NULL') $$; SELECT test3(); -INFO: {'aa': 'bb', 'cc': None} -CONTEXT: PL/Python function test3 -INFO: '"a"=>"1", "b"=>"boo", "c"=>NULL' -CONTEXT: PL/Python function test3 test3 --- @@ -118,7 +112,7 @@ CREATE FUNCTION test4() RETURNS trigger LANGUAGE plpythonu TRANSFORM FOR
TYPE hstore AS $$ -plpy.info("Trigger row: {'a': %r, 'b': %r}" % (TD["new"]["a"], TD["new"]["b"])) +assert(TD["new"] == {'a': 1, 'b': {'aa': 'bb', 'cc': None}}) if TD["new"]["a"] == 1: TD["new"]["b"] = {'a': 1, 'b': 'boo', 'c': None} @@ -126,8 +120,6 @@ return "MODIFY" $$; CREATE TRIGGER test4 BEFORE UPDATE ON test1 FOR EACH ROW EXECUTE PROCEDURE test4(); UPDATE test1 SET a = a; -INFO: Trigger row: {'a': 1, 'b': {'aa': 'bb', 'cc': None}} -CONTEXT: PL/Python function test4 SELECT * FROM test1; a |b ---+- diff --git a/contrib/hstore_plpython/sql/hstore_plpython.sql b/contrib/hstore_plpython/sql/hstore_plpython.sql index 872d8c6..9ff2ebc 100644 --- a/contrib/hstore_plpython/sql/hstore_plpython.sql +++ b/contrib/hstore_plpython/sql/hstore_plpython.sql @@ -37,7 +37,7 @@ CREATE FUNCTION test1arr(val hstore[]) RETURNS int LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ -plpy.info(repr(val)) +assert(val == [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}]) return len(val) $$; @@ -74,12 +74,12 @@ CREATE FUNCTION test3() RETURNS void TRANSFORM FOR TYPE hstore AS $$ rv = plpy.execute("SELECT 'aa=>bb, cc=>NULL'::hstore AS col1") -plpy.info(repr(rv[0]["col1"])) +assert(rv[0]["col1"] == {'aa': 'bb', 'cc': None}) val = {'a': 1, 'b': 'boo', 'c': None} plan = plpy.prepare("SELECT $1::text AS col1", ["hstore"]) rv = plpy.execute(plan, [val]) -plpy.info(repr(rv[0]["col1"])) +assert(rv[0]["col1"] == '"a"=>"1", "b"=>"boo", "c"=>NULL') $$; SELECT test3(); @@ -94,7 +94,7 @@ CREATE FUNCTION test4() RETURNS trigger LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ -plpy.info("Trigger row: {'a': %r, 'b': %r}" % (TD["new"]["a"], TD["new"]["b"])) +assert(TD["new"] == {'a': 1, 'b': {'aa': 'bb', 'cc': None}}) if TD["new"]["a"] == 1: TD["new"]["b"] = {'a': 1, 'b': 'boo', 'c': None}
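The reason the assert-based rewrite is robust where the old INFO-text comparisons were not: dict equality compares contents, while the textual repr the old tests matched against exposes element order, which Python 3.3+ randomizes per process unless PYTHONHASHSEED is pinned. A quick illustration:

```python
# Dict equality ignores element order; repr() does not.

a = {'aa': 'bb', 'cc': None}
b = {'cc': None, 'aa': 'bb'}  # same mapping, built in a different order

assert a == b  # what the rewritten regression tests rely on
# repr(a) can differ between runs of the test suite (and from repr(b)),
# so comparing expected output text against it is unreliable.
```

This is why fixing PYTHONHASHSEED in the postmaster's environment was a dead end for installcheck: the asserts sidestep the ordering question entirely.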
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
Alvaro Herrera wrote: Robert Haas wrote: 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid values which are equal to the next-mxid counter instead of the correct value; in other words, they are too new. What you describe is what happens if you upgrade from 9.2 or earlier. Oh, you're referring to pg_database values, not the ones in pg_control. Ugh :-( This invalidates my argument that there's nothing to fix, obviously ... it's clearly broken as is. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] RFC: Remove contrib entirely
On 05/28/2015 08:10 PM, Stephen Frost wrote: JD, This seems reasonable to me. It's in line with the recent move from contrib to bin. It'll just be quite a bit bigger of an undertaking. (50 threads to discuss the merits of each module separately?) Maybe start by picking the top 5 and sort those out. The thing is, we don't have that many to argue about now, in fact: Alright, I'll bite. :) I knew somebody eventually would ;) F.1. adminpack Need it- pgAdmin can't sensibly install it or even include it in some way, and it'd be *very* painful to not have it for a lot of users. Fair enough, although keep in mind we aren't telling people pgAdmin isn't useful. We are just pushing it out of core. Who runs from source except developers? Distributions would take care of this for us. F.2. auth_delay Should be a core feature. Having this in a contrib module is silly. +1 F.3. auto_explain Move to extension directory in the repo. +1 F.4. btree_gin F.5. btree_gist Both of these should simply be in core. +1 F.6. chkpass F.7. citext F.8. cube Push out and/or keep it in contrib in repo. Agreed except citext which I think should install by default. F.9. dblink Move to extension directory in the repo. Agreed. F.10. dict_int F.11. dict_xsyn Looks like these are just examples? Maybe move to an 'examples' directory, or into src/test/modules, or keep in contrib. Agreed. F.12. earthdistance Depends on cube, so, same as whatever happens there. I don't think extensions-in-repo should depend on contrib modules, as a rule. F.13. file_fdw F.14. fuzzystrmatch F.15. hstore Move to extension directory in the repo. Disagree, hstore should be in core. Rest, fine. F.16. intagg Obsolute, per the docs. Push out and deal with the break, or keep it in contrib in repo. Spelling mistake aside ;) agreed F.17. intarray Move to extension directory in the repo. Agreed F.18. isn F.19. lo F.20. ltree F.21. pageinspect Move to extension directory in the repo. Except for maybe pageinspect, agreed. F.22.
passwordcheck Should be an in-core capability and not shoved off into an extension. Agreed F.23. pg_buffercache Pull it into core. Agreed F.24. pgcrypto Move to extension directory in the repo. Sure. F.25. pg_freespacemap Should be in core. Agreed. F.26. pg_prewarm F.27. pgrowlocks Should be in core. With a change to pg_rowlocks, agreed. F.28. pg_stat_statements I'd actually prefer that this be in core, but I'd be alright with it being in extension directory in the repo. Agreed just not enabled by default. F.29. pgstattuple F.30. pg_trgm Should be in core. Agreed. F.31. postgres_fdw Move to extension directory in the repo. (actually, I'd be fine with both this and file_fdw being included in core.. I'm just not 100% sure how that'd look) I think they should be in core, not all FDWs of course but file and postgres are kind of obvious to me. F.32. seg F.33. sepgsql Move to extension directory in the repo. Agreed. F.34. spi Maybe pull some into core.. or maybe all, or move to an extension. No opinion. F.35. sslinfo Should be in core. Agreed. F.36. tablefunc My gut reaction is that it should be in core for crosstab(), but David's talking about implementing PIVOT, so.. Easy... give it 1 more release. If we get PIVOT, then we don't need it, if we don't... all the better for us. F.37. tcn Should be in core, imv, but not a position I hold very strongly. no opinion F.38. test_decoding Should be in src/test/modules, or maybe some 'examples' dir. agreed F.39. tsearch2 I'm inclined to just drop this.. Or we could keep it in contrib in the repo. Release a final release as a pgxn capable extension and rip it out. F.40. tsm_system_rows F.41. tsm_system_time These should be in core. Agreed F.42. unaccent Move to extension directory in the repo. Agreed F.43. uuid-ossp This one probably deserves its own thread, heh.. F.44. xml2 Push it out, or keep it in contrib in repo. Look at these, a lot of them are obvious... just include for goodness sakes: Agreed. 
pg_trgm has been in contrib for a decade if not more. Either rip it out or include it by default. postgres_fdw (by the time we make the change it will be two releases) Agreed. sepgsql has no business being in core, it is: 1. An extension 2. About as linux specific as we can get Not sure that being platform agnostic has to be a requirement of being in the repo or being an extension in the repo... It does need some work though and discussion has recently started about if the sepgsql types defined in the SELinux reference policy should continue to exist or if they should be changed. I'm following that discussion with interest. Adminpack: It is for pgAdmin, give it back or push it into core proper I'd keep it in the repo as an extension. Pushing it out would just
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
Abhijit Menon-Sen a...@2ndquadrant.com writes: I have to leave shortly, so I'll look at the initdb cleanup tomorrow. Here's a revision of that patch that's more along the lines of what you committed. Will look at that tomorrow ... It occurred to me that we should probably also silently skip EACCES on opendir and stat/lstat in walkdir. That's what the attached initdb patch does. If you think it's a good idea, there's also a small fixup attached for the backend version too. Actually, I was seriously considering removing the don't-log-EACCES special case (at least for files, perhaps not for directories). The reason being that it's darn hard to verify that the code is doing what it's supposed to when there is no simple way to get it to log any complaints. I left it as you wrote it in the committed version, but I think both that and the ETXTBSY special case do not really have value as long as their only effect is to suppress a LOG entry. Speaking of which, could somebody test that the committed patch does what it's supposed to on Windows? You were worried upthread about whether the tests for symlinks (aka junction points) behaved correctly, and I have no way to check that either. regards, tom lane
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
At 2015-05-28 17:37:16 -0400, t...@sss.pgh.pa.us wrote: I've committed this after some mostly-cosmetic rearrangements. Thank you. I have to leave shortly, so I'll look at the initdb cleanup tomorrow. Here's a revision of that patch that's more along the lines of what you committed. It occurred to me that we should probably also silently skip EACCES on opendir and stat/lstat in walkdir. That's what the attached initdb patch does. If you think it's a good idea, there's also a small fixup attached for the backend version too. -- Abhijit

From 26bb25e99a2954381587bbfe296a50b19a7f047c Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Thu, 28 May 2015 22:02:29 +0530
Subject: Make initdb -S behave like xlog.c:fsync_pgdata()

In particular, it should silently skip unreadable/unexpected files in the data directory and follow symlinks only for pg_xlog and under pg_tblspc.
---
 src/bin/initdb/initdb.c | 305
 1 file changed, 151 insertions(+), 154 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f1c4920..9d5478c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -71,6 +71,13 @@
 /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
 extern const char *select_default_timezone(const char *share_path);
 
+/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
+#if defined(HAVE_SYNC_FILE_RANGE)
+#define PG_FLUSH_DATA_WORKS 1
+#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
+#define PG_FLUSH_DATA_WORKS 1
+#endif
+
 static const char *auth_methods_host[] = {"trust", "reject", "md5", "password", "ident", "radius",
 #ifdef ENABLE_GSS
 	"gss",
@@ -217,10 +224,13 @@ static char **filter_lines_with_token(char **lines, const char *token);
 #endif
 static char **readfile(const char *path);
 static void writefile(char *path, char **lines);
-static void walkdir(char *path, void (*action) (char *fname, bool isdir));
-static void walktblspc_links(char *path, void (*action) (char *fname, bool isdir));
-static void pre_sync_fname(char *fname, bool isdir);
-static void fsync_fname(char *fname, bool isdir);
+static void walkdir(const char *path,
+			void (*action) (const char *fname, bool isdir),
+			bool process_symlinks);
+#ifdef PG_FLUSH_DATA_WORKS
+static void pre_sync_fname(const char *fname, bool isdir);
+#endif
+static void fsync_fname_ext(const char *fname, bool isdir);
 static FILE *popen_check(const char *command, const char *mode);
 static void exit_nicely(void);
 static char *get_id(void);
@@ -248,7 +258,7 @@ static void load_plpgsql(void);
 static void vacuum_db(void);
 static void make_template0(void);
 static void make_postgres(void);
-static void perform_fsync(void);
+static void fsync_pgdata(void);
 static void trapsig(int signum);
 static void check_ok(void);
 static char *escape_quotes(const char *src);
@@ -521,56 +531,58 @@ writefile(char *path, char **lines)
  * Adapted from copydir() in copydir.c.
  */
 static void
-walkdir(char *path, void (*action) (char *fname, bool isdir))
+walkdir(const char *path,
+		void (*action) (const char *fname, bool isdir),
+		bool process_symlinks)
 {
 	DIR		   *dir;
-	struct dirent *direntry;
-	char		subpath[MAXPGPATH];
+	struct dirent *de;
 
 	dir = opendir(path);
 	if (dir == NULL)
 	{
-		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
-				progname, path, strerror(errno));
-		exit_nicely();
+		if (errno != EACCES)
+			fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+					progname, path, strerror(errno));
+		return;
 	}
 
-	while (errno = 0, (direntry = readdir(dir)) != NULL)
+	while (errno = 0, (de = readdir(dir)) != NULL)
 	{
+		char		subpath[MAXPGPATH];
 		struct stat fst;
+		int			sret;
 
-		if (strcmp(direntry->d_name, ".") == 0 ||
-			strcmp(direntry->d_name, "..") == 0)
+		if (strcmp(de->d_name, ".") == 0 ||
+			strcmp(de->d_name, "..") == 0)
 			continue;
 
-		snprintf(subpath, MAXPGPATH, "%s/%s", path, direntry->d_name);
+		snprintf(subpath, MAXPGPATH, "%s/%s", path, de->d_name);
+
+		if (process_symlinks)
+			sret = stat(subpath, &fst);
+		else
+			sret = lstat(subpath, &fst);
 
-		if (lstat(subpath, &fst) < 0)
+		if (sret < 0)
 		{
-			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
-					progname, subpath, strerror(errno));
-			exit_nicely();
+			if (errno != EACCES)
+				fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
+						progname, subpath, strerror(errno));
+			continue;
 		}
 
-		if (S_ISDIR(fst.st_mode))
-			walkdir(subpath, action);
-		else if (S_ISREG(fst.st_mode))
+		if (S_ISREG(fst.st_mode))
 			(*action) (subpath, false);
+		else if (S_ISDIR(fst.st_mode))
+			walkdir(subpath, action, false);
 	}
 
 	if (errno)
-	{
 		fprintf(stderr, _("%s: could not read directory \"%s\": %s\n"),
 				progname, path, strerror(errno));
-		exit_nicely();
-	}
 
-	if (closedir(dir))
-	{
-		fprintf(stderr, _("%s: could not close directory \"%s\": %s\n"),
-				progname, path,
Re: [HACKERS] auto_explain sample rate
Craig Ringer cr...@2ndquadrant.com writes: It's sometimes desirable to collect auto_explain data with ANALYZE in order to track down hard-to-reproduce issues, but the performance impacts can be pretty hefty on the DB. I'm inclined to add a sample rate to auto_explain so that it can trigger only on x percent of queries, That sounds reasonable ... and also add a sample test hook that can be used to target statements of interest more narrowly (using a C hook function). You'd have to be pretty desperate, *and* knowledgeable, to write a C function for that. Can't we invent something a bit more user-friendly for the purpose? No idea what it should look like though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
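For what it's worth, the sampling Craig proposes amounts to an independent per-query Bernoulli draw. A minimal sketch of the idea, in Python for brevity (the real hook would of course live in auto_explain's C code, and the function and variable names below are invented for illustration, not taken from any patch):

```python
import random

# Hypothetical per-query sample-rate check: log this query's plan with
# probability sample_rate. Illustrative only; auto_explain would do the
# equivalent test in C inside its executor hooks.
def should_explain(sample_rate: float, rng: random.Random) -> bool:
    """Decide whether this query's plan should be logged."""
    return rng.random() < sample_rate

if __name__ == "__main__":
    rng = random.Random(42)
    total = 100_000
    logged = sum(should_explain(0.1, rng) for _ in range(total))
    print(f"would log {logged} of {total} queries "
          f"(~{100 * logged / total:.1f}%)")
```

With a 10% rate the expected EXPLAIN ANALYZE overhead drops to roughly a tenth of always-on instrumentation, which is the point of the proposal.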
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
On 29.05.2015 03:12, Peter Eisentraut wrote: On 5/26/15 5:19 PM, Oskari Saarenmaa wrote: [1] http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=jaguarundi&br=HEAD Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts, which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R Ah, good catch. That explains the, well, randomness. I can reproduce the test failures with PYTHONHASHSEED=2. But I haven't been successful getting that environment variable set so that it works in the installcheck case. Instead, I have rewritten the tests to use asserts instead of textual comparisons. See attached patch. Comments? Looks good to me. / Oskari
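The seed-dependent ordering is easy to observe outside the regression tests; a small sketch (assuming Python 3.3+, where string hashing is randomized unless PYTHONHASHSEED is fixed):

```python
import os
import subprocess
import sys

# Show that the iteration order of a set of strings depends on the hash
# seed of the interpreter that builds it. Each child process gets a fixed
# PYTHONHASHSEED, so its order is reproducible for that seed, but seeds
# (and the default, fully random seeding) can produce different orders.
def set_order(seed: str) -> str:
    """Run a child interpreter with a fixed hash seed and return the
    iteration order it prints for a small set of strings."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", "print(list({'a', 'b', 'c', 'd'}))"],
        env=env, capture_output=True, text=True, check=True)
    return out.stdout.strip()

if __name__ == "__main__":
    for seed in ("1", "2", "3"):
        print(f"PYTHONHASHSEED={seed}: {set_order(seed)}")
```

This is also why rewriting the tests around asserts (order-independent comparisons) is more robust than comparing textual output.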
Re: [HACKERS] RFC: Remove contrib entirely
On 29 May 2015 at 5:33 AM, Joshua D. Drake j...@commandprompt.com wrote: On 05/28/2015 08:10 PM, Stephen Frost wrote: JD, This seems reasonable to me. It's in line with the recent move from contrib to bin. It'll just be quite a bit bigger of an undertaking. (50 threads to discuss the merits of each module separately?) Maybe start by picking the top 5 and sort those out. The thing is, we don't have that many to argue about now, in fact: Alright, I'll bite. :) I knew somebody eventually would ;) F.1. adminpack Need it - pgAdmin can't sensibly install it or even include it in some way, and it'd be *very* painful to not have it for a lot of users. Painful? The adminpack allows pgAdmin to change the config files remotely with a UI that doesn't make it easy, to say the least. You may well trash your pg_hba.conf file and not be able to connect again after reloading. It also allows you to read your log files remotely... if they only contain UTF8 characters (which doesn't happen much with my French customers). And loading a 1GB log file is definitely painful. My feeling is that it doesn't add much (which doesn't mean I want it to be dropped), but I'm pretty much telling my customers to drop it whenever I can. Fair enough, although keep in mind we aren't telling people pgAdmin isn't useful. We are just pushing it out of core. Who runs from source except developers? Distributions would take care of this for us. Yeah. The way I see this is that distributions would make packages for each extension. And I don't see the difference between doing a yum install postgresql94-contrib and a yum install postgresql94-adminpack, for example, except I would have to run various yum install commands to install every extension I need, but this is much better for me than bloating my system with extensions I never use (earthdistance comes to mind, for example). FWIW, I don't mind which one we put in core and which one we put out of core.
But I like Joshua's idea of getting rid of contribs and pushing them out as any other extensions. F.2. auth_delay Should be a core feature. Having this in a contrib module is silly. +1 F.3. auto_explain Move to extension directory in the repo. +1 F.4. btree_gin F.5. btree_gist Both of these should simply be in core. +1 F.6. chkpass F.7. citext F.8. cube Push out and/or keep it in contrib in repo. Agreed except citext which I think should install by default. F.9. dblink Move to extension directory in the repo. Agreed. F.10. dict_int F.11. dict_xsyn Looks like these are just examples? Maybe move to an 'examples' directory, or into src/test/modules, or keep in contrib. Agreed. F.12. earthdistance Depends on cube, so, same as whatever happens there. I don't think extensions-in-repo should depend on contrib modules, as a rule. F.13. file_fdw F.14. fuzzystrmatch F.15. hstore Move to extension directory in the repo. Disagree, hstore should be in core. Rest, fine. F.16. intagg Obsolute, per the docs. Push out and deal with the break, or keep it in contrib in repo. Spelling mistake aside ;) agreed F.17. intarray Move to extension directory in the repo. Agreed F.18. isn F.19. lo F.20. ltree F.21. pageinspect Move to extension directory in the repo. Except for maybe pageinspect, agreed. F.22. passwordcheck Should be an in-core capability and not shoved off into an extension. Agreed F.23. pg_buffercache Pull it into core. Agreed F.24. pgcrypto Move to extension directory in the repo. Sure. F.25. pg_freespacemap Should be in core. Agreed. F.26. pg_prewarm F.27. pgrowlocks Should be in core. With a change to pg_rowlocks, agreed. F.28. pg_stat_statements I'd actually prefer that this be in core, but I'd be alright with it being in extension directory in the repo. Agreed just not enabled by default. F.29. pgstattuple F.30. pg_trgm Should be in core. Agreed. F.31. postgres_fdw Move to extension directory in the repo. 
(actually, I'd be fine with both this and file_fdw being included in core.. I'm just not 100% sure how that'd look) I think they should be in core, not all FDWs of course but file and postgres are kind of obvious to me. F.32. seg F.33. sepgsql Move to extension directory in the repo. Agreed. F.34. spi Maybe pull some into core.. or maybe all, or move to an extension. No opinion. F.35. sslinfo Should be in core. Agreed. F.36. tablefunc My gut reaction is that it should be in core for crosstab(), but David's talking about implementing PIVOT, so.. Easy... give it 1 more release. If we get PIVOT, then we don't need it, if we don't... all the better for us. F.37. tcn Should be in core, imv, but not a position I hold very strongly. no opinion F.38. test_decoding Should be in src/test/modules, or maybe some 'examples' dir.
Re: [HACKERS] pg_upgrade resets timeline to 1
On Wed, May 27, 2015 at 05:40:09PM +0200, Christoph Berg wrote: commit 4c5e060049a3714dd27b7f4732fe922090edea69 Author: Bruce Momjian br...@momjian.us Date: Sat May 16 00:40:18 2015 -0400 pg_upgrade: force timeline 1 in the new cluster Previously, this prevented promoted standby servers from being upgraded because of a missing WAL history file. (Timeline 1 doesn't need a history file, and we don't copy WAL files anyway.) Pardon me for starting a fresh thread, but I couldn't find where this was discussed. I've just had trouble getting barman to work again after a 9.1-9.4.2 upgrade, and I think part of the problem was that the WAL for this cluster got reset from timeline 2 to 1, which made barman's incoming WALs processor drop the files, probably because the new filename 0001... is now less than the 0002... before. It looks like an upgrade from 9.1.x to 9.3.0 or later has always set the new timeline identifier (TLI) to 1. My testing confirms this for an upgrade from 9.1.16 to 9.4.1 and for an upgrade from 9.1.16 to 9.4.2, so I failed to reproduce your report. Would you verify the versions you used? If you were upgrading from 9.3.x, I _can_ reproduce that. Since the 2015-05-16 commits you cite, pg_upgrade always sets TLI=1. Behavior before those commits depended on the source and destination major versions. PostgreSQL 9.0, 9.1 and 9.2 restored the TLI regardless of source version. PostgreSQL 9.3 and 9.4 restored the TLI when upgrading from 9.3 or 9.4, but they set TLI=1 when upgrading from 9.2 or earlier. (Commit 038f3a0 introduced this inconsistent behavior of 9.3 and later.) The commit you cite fixed this symptom: http://www.postgresql.org/message-id/flat/d5359e0908278642bb1747131d62694dab225...@ausmxmbx01.mrws.biz I'm attaching a test script that I used to observe TLI assignment and to test for that problem. pg_upgrade has been restoring TLI without history files since 9.0.0 or earlier, and that was always risky. 
The reported symptom became possible with the introduction of the TIMELINE_HISTORY walsender command in 9.3.0. (It was hard to encounter before 9.4, because 9.3 to 9.3 pg_upgrade runs are rare outside of hacker testing.) Since you observed barman breakage less than a week after a release that changed the post-pg_upgrade TLI, it seems prudent to figure that other folks will be affected. At the same time, I don't understand why that release would prompt the first report. Any upgrade from {9.0,9.1,9.2} to {9.3,9.4} already had the behavior you experienced. Ideas? I don't expect to be able to recover through a pg_upgrade operation, but pg_upgrade shouldn't make things more complicated than they should be for backup tools. (If there's a problem with the history files, shouldn't pg_upgrade copy them instead?) In fact, I'm wondering if pg_upgrade shouldn't rather *increase* the timeline to make sure the archive_command doesn't clobber any files from the old cluster when reused in the new cluster? It's worth considering that, as a major-release change. Do note this in the documentation, though: The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory). -- http://www.postgresql.org/docs/devel/static/continuous-archiving.html upgrade-timeline.sh Description: Bourne shell script
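For reference, the archive_command example in the PostgreSQL documentation already follows that refuse-to-overwrite advice; the `test ! -f` guard makes the command fail rather than clobber an existing archived segment:

```
# postgresql.conf -- example archive_command from the documentation
# (Unix). The guard fails the archive attempt if the target file
# already exists, instead of overwriting it.
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```

An archive_command built this way would have refused the colliding segment names discussed above rather than silently replacing files from the old cluster.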
Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)
On 27 May 2015 at 14:51, Stephen Frost sfr...@snowman.net wrote: On 27 May 2015 at 02:42, Stephen Frost sfr...@snowman.net wrote: Now, looking at the code, I'm actually failing to see a case where we use the RowSecurityPolicy->policy_name.. Perhaps *that's* what we should be looking to remove? If we add support for restrictive policies, it would be possible, and I think desirable, to report on which policy was violated. For that, having the policy name would be handy. We might also arguably decide to enforce restrictive RLS policies in name order, like check constraints. Of course none of that means it must be kept now, but it feels like a useful field to keep nonetheless. I agree that it could be useful, but we really shouldn't have fields in the current code base which are just in case.. The one exception which comes to mind is if we should keep it for use by extensions. Those operate on an independent cycle from our major releases and would likely find having the name field useful. Hmm, when I wrote that I had forgotten that we already allow extensions to add restrictive policies. I think it would be good to allow those policies to have names, and if they do, to copy those names to any WCOs created. Then, if a WCO is violated, and it has a non-NULL policy name associated with it, we should include that in the error report. One thing which now occurs to me, however, is that, while the current coding is fine, using InvalidOid as an indicator for the default-deny policy, in general, does fall over when we consider policies added by extensions. Those policies are necessarily going to need to put InvalidOid into that field, unless they acquire an OID somehow themselves (it doesn't seem reasonable to make that a requirement, though I suppose we could..), and, therefore, perhaps we should add a boolean field which indicates which is the defaultDeny policy anyway and not use that field for that purpose. Thoughts?
Actually I think a new boolean field is unnecessary, and probably the policy_id field is too. Re-reading the code in rowsecurity.c, I think it could use a bit of refactoring. Essentially what it needs to do (for permissive policies at least) is just
* fetch a list of internal policies
* fetch a list of external policies
* concatenate them
* if the resulting list is empty, add a default-deny qual and/or WCO
By only building the default-deny qual/WCO at the end, rather than half-way through and pretending that it's a fully-fledged policy, the code ought to be simpler. I might get some time at the weekend, so I can take a look and see if refactoring it this way works out in practice. BTW, I just spotted another problem with the current code, which is that policies from extensions aren't being checked against the current role (policy->roles is just being ignored). I think it would be neater and safer to move the check_role_for_policy() check up and apply it to the entire concatenated list of policies, rather than having the lower level code have to worry about that. Regards, Dean
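Putting the steps above together with the role-check fix, the restructured function might look roughly like this (pseudocode only; check_role_for_policy() is real, but the other helper names are made up for illustration, not taken from rowsecurity.c):

```
/* pseudocode sketch of the proposed get_row_security_policies() shape */
policies = fetch_internal_policies(relation, command);
policies = concat(policies, fetch_external_policies(relation, command));

/* apply the role check once, to the whole concatenated list,
   so extension-supplied policies are filtered too */
policies = [p for p in policies if check_role_for_policy(p, current_user)];

if (policies is empty)
    /* only now synthesize the default-deny qual/WCO, instead of
       carrying a fake policy (with InvalidOid) through earlier steps */
    add_security_qual(constant_false);
else
    add_quals_and_wcos(policies);
```

This removes the need for any is-this-the-default-deny marker field, since the default-deny case is no longer represented as a policy at all.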
Re: [HACKERS] pg_upgrade resets timeline to 1
On 27 May 2015 at 18:42, Bruce Momjian br...@momjian.us wrote: On Wed, May 27, 2015 at 05:40:09PM +0200, Christoph Berg wrote: commit 4c5e060049a3714dd27b7f4732fe922090edea69 Author: Bruce Momjian br...@momjian.us Date: Sat May 16 00:40:18 2015 -0400 pg_upgrade: force timeline 1 in the new cluster Previously, this prevented promoted standby servers from being upgraded because of a missing WAL history file. (Timeline 1 doesn't need a history file, and we don't copy WAL files anyway.) Pardon me for starting a fresh thread, but I couldn't find where this was discussed. I've just had trouble getting barman to work again after a 9.1-9.4.2 upgrade, and I think part of the problem was that the WAL for this cluster got reset from timeline 2 to 1, which made barman's incoming WALs processor drop the files, probably because the new filename 0001... is now less than the 0002... before. I don't expect to be able to recover through a pg_upgrade operation, but pg_upgrade shouldn't make things more complicated than they should be for backup tools. (If there's a problem with the history files, shouldn't pg_upgrade copy them instead?) In fact, I'm wondering if pg_upgrade shouldn't rather *increase* the timeline to make sure the archive_command doesn't clobber any files from the old cluster when reused in the new cluster? https://bugs.debian.org/786993 Uh, WAL files and WAL history files are not compatible across PG major versions so you should never be fetching them after a major upgrade. I have noticed some people are putting their WAL files in directories with PG major version numbers to avoid this problem. We could have pg_upgrade increment the timeline and allow for missing history files, but that doesn't fix problems with non-pg_upgrade upgrades, which also should never be sharing WAL files from previous major versions. Maybe, but I thought we had a high respect for backwards compatibility and we clearly just broke quite a few things that didn't need to be broken. 
Hmm, it looks like the change to TimeLine 1 is just a kludge anyway. The rule that TimeLine 1 doesn't need a history file is itself a hack. What we should be saying is that the last timeline doesn't need a history file. Then no change is needed here. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension
On 05/28/2015 06:04 AM, Joshua D. Drake wrote: On 05/27/2015 07:02 PM, Stephen Frost wrote: regardless of if they are included in the main git repo or not. Being in the repo represents the support of the overall community and PGDG, which is, understandably in my view, highly valuable to the organizations which look to PostgreSQL as an open source solution for their needs. I can't get behind this. Yes, there is a mysticism about the core support of modules but it is just that mysticism. People are going to run what the experts tell them to run. If pg_audit is that good the experts will tell people to use it... Yeah, there are many popular tools and extensions that people use together with PostgreSQL. PostGIS, pgbouncer, slony, pgbarman etc. The criteria for what should be in contrib, in core, or a 3rd party extension is a contentious topic. The argument here is basically that if something is in core, it's officially supported and endorsed by the PGDG. If that's the argument for putting this in contrib, then you have to ask yourself if the PGDG community is really willing to endorse this. I'm hearing a lot of pushback from other committers, which points to no, even if all their technical arguments and worries turn out to be wrong. I wrote the above without looking at the code or the documentation. I haven't followed the discussion at all. I'm now looking at the documentation, and have some comments based on just that: * I think it's a bad idea for the audit records to go to the same log as other messages. Maybe that's useful as an option, but in general there is no reason to believe that the log format used for general logging is also a good format for audit logging. You probably wouldn't care about process ID for audit logging, for example. 
Also, how do you prevent spoofing audit records with something like SELECT \nAUDIT: SESSION, 2, Maybe the log formatting options can be used to prevent that, but just by looking at the examples in the manual, I don't see how to do it. * I don't understand how the pg_audit.role setting and the audit role system works. * Using GUCs for configuring it seems like a bad idea, because: 1. it's not flexible enough. How do you specify that all READs on super_secret table must be logged, but on less_secret table, it's enough to log only WRITEs? 2. GUCs can be changed easily on-the-fly, and configured flexibly. But that's not really what you want for auditing. You want to have a clear specification in one place. You'd want it to be more like pg_hba.conf than postgresql.conf. Or more like Mandatory Access Control, rather than Discretionary Access Control. A separate config file in $PGDATA would be a better way to configure this. You would then have all the configuration in one place, and that file could have a more flexible format, with per-schema rules etc. (Wouldn't have to implement all that in the first version, but that's the direction this should go to) I recommend making pg_audit an external extension, not a contrib module. As an extension, it can have its own life-cycle and not be tied to PostgreSQL releases. That would be a huge benefit for pg_audit. There is a lot of potential for new features to be added: more flexible configuration, more details to be logged, more ways to log, email alerts, etc. As an extension, all of those things could be developed independent of the PostgreSQL life-cycle. If there is need to fix vulnerabilities or other bugs, those can also be fixed independently of PostgreSQL minor releases. - Heikki
Re: [HACKERS] WIP: Enhanced ALTER OPERATOR
On Wed, May 27, 2015 at 9:28 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, May 18, 2015 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: Uriy Zhuravlev u.zhurav...@postgrespro.ru writes: I have attached a patch that extends ALTER OPERATOR to support COMMUTATOR, NEGATOR, RESTRICT and JOIN. There are fairly significant reasons why we have not done this, based on the difficulty of updating existing cached plans that might have incidentally depended on those operator properties during creation. Perhaps it's all right to simply ignore such concerns, but I would like to see a defense of why. I don't think there's a direct problem with cached plans, because it looks like plancache.c blows away the entire plan cache for any change to pg_operator. OperatorUpd() can already update oprcom and oprnegate, which seems to stand for the proposition that it's OK to set those from InvalidOid to something else. But that doesn't prove that other kinds of changes are safe. A search of other places where oprcom is used in the code led me to ComputeIndexAttrs(). If an operator whose commutator is itself were changed so that the commutator was anything else, then we'd end up with a broken exclusion constraint. That's a problem. targetIsInSortList is run during parse analysis, and that too tests whether sortop == get_commutator(scl->sortop). Those decisions wouldn't get reevaluated if the truth of that expression changed after the fact, which I suspect is also a problem. Could we address both of these problems by disallowing changes to existing commutators and negators? ISTM that setting an absent commutator or negator is quite enough for ALTER OPERATOR. User extensions could need to set commutators and negators because they add new operators that didn't exist before. But it's rather uncommon to unset or change a commutator or negator. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
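Under that set-only-if-absent restriction, usage might look something like the following (hypothetical syntax, based on the patch description rather than any committed grammar; the `===` operator and its commutator are invented for illustration):

```sql
-- Filling in a previously absent commutator would be allowed,
-- mirroring what OperatorUpd() already does internally when it sets
-- oprcom/oprnegate from InvalidOid.
ALTER OPERATOR === (box, box) SET (COMMUTATOR = ===, RESTRICT = eqsel);

-- Changing an already-set commutator would be rejected, avoiding the
-- exclusion-constraint and parse-analysis hazards described above.
ALTER OPERATOR === (box, box) SET (COMMUTATOR = !==);  -- would raise an error
```

The restriction sidesteps the hard invalidation questions: decisions baked into existing indexes and sort lists stay valid because a non-NULL commutator or negator can never change out from under them.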
Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)
Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: On 27 May 2015 at 14:51, Stephen Frost sfr...@snowman.net wrote: On 27 May 2015 at 02:42, Stephen Frost sfr...@snowman.net wrote: Now, looking at the code, I'm actually failing to see a case where we use the RowSecurityPolicy->policy_name.. Perhaps *that's* what we should be looking to remove? If we add support for restrictive policies, it would be possible, and I think desirable, to report on which policy was violated. For that, having the policy name would be handy. We might also arguably decide to enforce restrictive RLS policies in name order, like check constraints. Of course none of that means it must be kept now, but it feels like a useful field to keep nonetheless. I agree that it could be useful, but we really shouldn't have fields in the current code base which are just in case.. The one exception which comes to mind is if we should keep it for use by extensions. Those operate on an independent cycle from our major releases and would likely find having the name field useful. Hmm, when I wrote that I had forgotten that we already allow extensions to add restrictive policies. I think it would be good to allow those policies to have names, and if they do, to copy those names to any WCOs created. Then, if a WCO is violated, and it has a non-NULL policy name associated with it, we should include that in the error report. I'm certainly not against that and I agree that it'd be nicer than simply reporting that there was a violation, but we combine all of the various pieces together, no? Are we really going to be able to confidently say which policy was violated? Even if we are able to say the first which was violated, more than one might be. Is that an issue we need to address? Perhaps not, but it's something to consider.
One thing which now occurs to me, however, is that, while the current coding is fine, using InvalidOid as an indicator for the default-deny policy, in general, does fall over when we consider policies added by extensions. Those policies are necessarily going to need to put InvalidOid into that field, unless they acquire an OID somehow themselves (it doesn't seem reasonable to make that a requirement, though I suppose we could..), and, therefore, perhaps we should add a boolean field which indicates which is the defaultDeny policy anyway and not use that field for that purpose. Actually I think a new boolean field is unnecessary, and probably the policy_id field is too. Re-reading the code in rowsecurity.c, I think it could use a bit of refactoring. Essentially what it needs to do (for permissive policies at least) is just * fetch a list of internal policies * fetch a list of external policies * concatenate them * if the resulting list is empty, add a default-deny qual and/or WCO By only building the default-deny qual/WCO at the end, rather than half-way through and pretending that it's a fully-fledged policy, the code ought to be simpler. I tend to agree. I might get some time at the weekend, so I can take a look and see if refactoring it this way works out in practice. That would certainly be fantastic and most appreciated. Beyond that, we have the add-a-default-deny-policy logic in multiple places, as I recall, and I wonder if that's overkill and done out of paranoia rather than sound judgement. I'm certainly not suggesting that we take unnecessary risks, and so perhaps we should keep it, but I do think it's something which should be reviewed and considered (I've been planning to do so for a while, in fact). BTW, I just spotted another problem with the current code, which is that policies from extensions aren't being checked against the current role (policy->roles is just being ignored).
I think it would be neater and safer to move the check_role_for_policy() check up and apply it to the entire concatenated list of policies, rather than having the lower level code have to worry about that. Excellent point... We should address that and your proposed approach sounds reasonable to me. If you're able to work on that this weekend, I'd be happy to review next week. Thanks! Stephen
Re: [HACKERS] pg_upgrade resets timeline to 1
Re: Noah Misch 2015-05-28 20150528072721.ga4102...@tornado.leadboat.com I've just had trouble getting barman to work again after a 9.1-9.4.2 upgrade, and I think part of the problem was that the WAL for this cluster got reset from timeline 2 to 1, which made barman's incoming WALs processor drop the files, probably because the new filename 0001... is now less than the 0002... before. It looks like an upgrade from 9.1.x to 9.3.0 or later has always set the new timeline identifier (TLI) to 1. My testing confirms this for an upgrade from 9.1.16 to 9.4.1 and for an upgrade from 9.1.16 to 9.4.2, so I failed to reproduce your report. Would you verify the versions you used? If you were upgrading from 9.3.x, I _can_ reproduce that. Sorry, the 9.1 was a typo, the system was on 9.2.11 before/during pg_upgrade. Do note this in the documentation, though: The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory). -- http://www.postgresql.org/docs/devel/static/continuous-archiving.html (Except that this wasn't possible in practice since ~9.2 until very recently because some files got archived again during a timeline switch :-/ ) Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension
On 05/28/2015 11:14 AM, Stephen Frost wrote: * Heikki Linnakangas (hlinn...@iki.fi) wrote: 1. it's not flexible enough. How do you specify that all READs on super_secret table must be logged, but on less_secret table, it's enough to log only WRITEs? This is actually what that pg_audit.role setting was all about. To do the above, you would do: CREATE ROLE pgaudit; SET pg_audit.role = 'pgaudit'; GRANT SELECT ON TABLE super_secret TO pgaudit; GRANT INSERT, UPDATE, DELETE ON TABLE less_secret TO pgaudit; With this, all commands executed which require SELECT rights on the table super_secret are logged, and all commands executed which require INSERT, UPDATE, or DELETE on the less_secret table are logged. Ah, I see. Wow, that's really Rube Goldbergian. 2. GUCs can be changed easily on-the-fly, and configured flexibly. But that's not really what you want for auditing. You want to have a clear specification in one place. You'd want it to be more like pg_hba.conf than postgresql.conf. Or more like Mandatory Access Control, rather than Discretionary Access Control. I certainly appreciate the MAC vs. DAC discussion here, but I don't believe our AUDIT capability should explicitly require restarts of PG to be adjusted. Sure, I didn't mean we should require a restart. Requiring SIGHUP seems reasonable though. A separate config file in $PGDATA would be a better way to configure this. You would then have all the configuration in one place, and that file could have a more flexible format, with per-schema rules etc. (Wouldn't have to implement all that in the first version, but that's the direction this should go to) The difficulty with a separate config file is that we don't have any way of properly attaching information to the existing tables in the database- table renames, dropped columns, etc, all become an issue then. True. I wouldn't be too worried about that though.
If you rename a table, that hopefully gets flagged in the audit log and someone will ask "why did you rename that table?". If you're paranoid enough, you could have a catch-all rule that audits everything that's not explicitly listed, so a renamed table always gets audited. Of course, you could still support a labeling system, similar to the pg_audit.role setting and GRANTs. For example, you could tag tables with a label like reads_need_auditing, and then in the config file, you could specify that all READs on tables with that label need to be audited. I think security labels would be a better system to abuse for that than GRANT. You'd want to just label the objects, and specify the READ/WRITE etc. attributes in the config file. Who do you assume you can trust? Is it OK if the table owner can disable/enable auditing for that table? I'm certainly all about adding new capabilities to pg_audit, but, as discussed elsewhere, I believe we really want many of those same capabilities in core (eg: being able to send logs to different places; my thinking is a RabbitMQ type of store rather than email, but perhaps we could support both...) and that's what I'm hoping to work towards in the near future. Sure, adding features like sending logs to different places in core is totally reasonable, independently of pg_audit. (Or not, but in any case, it's orthogonal :-) ) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] pg_get_functiondef() does not show LEAKPROOF for leakproof functions
On Thu, May 28, 2015 at 5:52 PM, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote: If a function is created with the LEAKPROOF option, then pg_get_functiondef() does not show that in the returned definition. Is it expected, or are we missing that option in pg_get_functiondef()? However, only a superuser can define a leakproof function. Was this the reason we are not showing that in the pg_get_functiondef() output? I don't think we should hide this detail. Agreed. I guess that it has simply been forgotten. pg_proc can be easily queried, so functions marked as leakproof are easy to find in any case. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 8:03 AM, Robert Haas robertmh...@gmail.com wrote: Steve, is there any chance we can get your pg_controldata output and a list of all the files in pg_clog? Err, make that pg_multixact/members, which I assume is at issue here. You didn't show us the DETAIL line from this message, which would presumably clarify: FATAL: could not access status of transaction 1 And I'm still wrong, probably. The new code in 9.4.2 cares about being able to look at an *offsets* file to find the corresponding member offset. So most likely it is an offsets file that is missing here. The question is, how are we ending up with an offsets file that is referenced by the control file but not actually present on disk? It seems like it would be good to compare the pg_controldata output to what is actually present in pg_multixact/offsets (hopefully that's the right directory, now that I'm on my third try) and try to understand what is going on here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Memory leak with XLogFileCopy since de768844 (WAL file with .partial)
Hi all,

Since commit de768844, XLogFileCopy of xlog.c returns to the caller a pstrdup'd string that can be used afterwards for other things. XLogFileCopy is used in only one place, and it happens that the result string is never freed at all, leaking memory. Attached is a patch to fix the problem.

Regards,
-- Michael

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 087b6be..2bb3acc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5242,7 +5242,11 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		 */
 		tmpfname = XLogFileCopy(NULL, xlogfname, endOfLog % XLOG_SEG_SIZE);
 		if (!InstallXLogFileSegment(&endLogSegNo, tmpfname, false, 0, false))
+		{
+			pfree(tmpfname);
 			elog(ERROR, "InstallXLogFileSegment should not have failed");
+		}
+		pfree(tmpfname);
 	}
 	else

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible pointer dereference
Robert Haas robertmh...@gmail.com writes: On Wed, May 27, 2015 at 8:57 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: Correcting it the following way will solve the problem: return ts ? (*ts != 0) : false; instead of return *ts != 0; Attached a patch for it. If the only caller always passes a valid pointer, there's no point in adding this check. We have many functions in our source base that assume that the caller will pass a valid pointer, and changing them all would make the code bigger, harder to read, and possibly slower, without any real benefit. Well, we should either install something like Haribabu's patch, or else remove the existing tests in the function that allow ts to be NULL. And the function's API contract comment needs to be clarified in either case; the real bug here is lack of a specification. I don't particularly have an opinion on whether it's valuable to allow this function to be called without receiving a timestamp back. Perhaps the authors of the patch can comment on that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
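[Editorial aside: the defensive pattern under discussion is small enough to demonstrate standalone. This is a minimal sketch in plain C, not PostgreSQL code — the function name and the stand-in TimestampTz typedef are hypothetical — showing an out-parameter API whose contract allows the caller to pass NULL:]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* stand-in for PostgreSQL's typedef */

/*
 * Hypothetical lookup with an optional out-parameter.  A bare
 * "return *ts != 0;" would crash when ts is NULL; writing
 * "ts ? (*ts != 0) : false" honors an API contract that says the
 * timestamp pointer may be omitted.
 */
static bool
get_last_commit_time(TimestampTz *ts)
{
    TimestampTz found = 42;     /* pretend this came from shared memory */

    if (ts != NULL)
        *ts = found;

    /* NULL ts: report false rather than dereference */
    return ts ? (*ts != 0) : false;
}
```

As Tom Lane says, the real decision is the specification: either the check belongs in the function and the comment should document "ts may be NULL", or the NULL-tolerant branches should be removed and every caller required to pass a valid pointer.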
Re: [HACKERS] pg_upgrade resets timeline to 1
On Thu, May 28, 2015 at 08:47:07AM +0100, Simon Riggs wrote: We could have pg_upgrade increment the timeline and allow for missing history files, but that doesn't fix problems with non-pg_upgrade upgrades, which also should never be sharing WAL files from previous major versions. Maybe, but I thought we had a high respect for backwards compatibility and we clearly just broke quite a few things that didn't need to be broken. I can't break something that was never intended to work, and mixing WAL from previous major versions was never designed to work. Hmm, it looks like the change to TimeLine 1 is just a kludge anyway. The rule that TimeLine 1 doesn't need a history file is itself a hack. What we should be saying is that the last timeline doesn't need a history file. Then no change is needed here. Yes, that would make a lot more sense than what we have now, but this had to be backpatched, so reverting to the 9.3 and earlier behavior seemed logical. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
Hi. Here's an updated patch for the fsync problem(s). A few points that may be worth considering:

1. I've made the ReadDir → ReadDirExtended change, but in retrospect I don't think it's really worth it. Using ReadDir and letting it die is probably fine. (Aside: I left ReadDirExtended static for now.)

2. Robert, are you comfortable with what fsync_pgdata() does in xlog.c? I wasn't sure if I should move that to fd.c as well. I think it's borderline OK for now.

3. There's a comment in fsync_fname that we need to open directories with O_RDONLY and non-directories with O_RDWR. Does this also apply to pre_sync_fname (i.e. sync_file_range and posix_fadvise) on any platforms we care about? It doesn't seem so, but I'm not sure.

4. As a partial aside, why does fsync_fname use OpenTransientFile? It looks like it should use BasicOpenFile as pre_sync_fname does. We close the fd immediately after calling fsync anyway. Or is there some reason I'm missing that we should use OpenTransientFile in pre_sync_fname too? (Aside²: it's pointless to specify S_IRUSR|S_IWUSR in fsync_fname since we're not setting O_CREAT. Minor, so will submit separate patch.)

5. I made walktblspc_entries use stat() instead of lstat(), so that we can handle symlinks and directories the same way. Note that we won't continue to follow links inside the sub-directories because we use walkdir instead of recursing. But this depends on S_ISDIR() returning true after stat() on Windows with a junction. Does that work? And are the entries in pg_tblspc on Windows actually junctions?

I've tested this with unreadable files in PGDATA and junk inside pg_tblspc. Feedback welcome. I have a separate patch to initdb with the corresponding changes, which I will post after dinner and a bit more testing.
-- Abhijit

From 491dcb8c7203fd992dd9c441f5463a75c88e6f52 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Thu, 28 May 2015 17:22:18 +0530
Subject: Skip unreadable files in fsync_pgdata; follow only expected symlinks

---
 src/backend/access/transam/xlog.c |  40 +-
 src/backend/storage/file/fd.c     | 262 ++
 src/include/storage/fd.h          |   6 +-
 3 files changed, 226 insertions(+), 82 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 087b6be..32447e7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11605,20 +11605,53 @@ SetWalWriterSleeping(bool sleeping)
 
 /*
  * Issue fsync recursively on PGDATA and all its contents.
+ *
+ * We fsync regular files and directories wherever they are, but we
+ * follow symlinks only for pg_xlog and under pg_tblspc.
  */
 static void
 fsync_pgdata(char *datadir)
 {
+	bool		xlog_is_symlink;
+	char		pg_xlog[MAXPGPATH];
+	char		pg_tblspc[MAXPGPATH];
+	struct stat st;
+
 	if (!enableFsync)
 		return;
 
+	snprintf(pg_xlog, MAXPGPATH, "%s/pg_xlog", datadir);
+	snprintf(pg_tblspc, MAXPGPATH, "%s/pg_tblspc", datadir);
+
+	/*
+	 * If pg_xlog is a symlink, then we need to recurse into it separately,
+	 * because the first walkdir below will ignore it.
+	 */
+	xlog_is_symlink = false;
+
+	if (lstat(pg_xlog, &st) < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat directory \"pg_xlog\": %m")));
+
+#ifndef WIN32
+	if (S_ISLNK(st.st_mode))
+		xlog_is_symlink = true;
+#else
+	if (pgwin32_is_junction(subpath))
+		xlog_is_symlink = true;
+#endif
+
 	/*
 	 * If possible, hint to the kernel that we're soon going to fsync the data
 	 * directory and its contents.
	 */
#if defined(HAVE_SYNC_FILE_RANGE) || \
	(defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED))
-	walkdir(datadir, pre_sync_fname);
+	walkdir(DEBUG1, datadir, pre_sync_fname);
+	if (xlog_is_symlink)
+		walkdir(DEBUG1, pg_xlog, pre_sync_fname);
+	walktblspc_entries(DEBUG1, pg_tblspc, pre_sync_fname);
#endif

	/*
@@ -11628,5 +11661,8 @@ fsync_pgdata(char *datadir)
	 * file fsyncs don't guarantee that the directory entry for the file is
	 * synced.
	 */
-	walkdir(datadir, fsync_fname);
+	walkdir(LOG, datadir, fsync_fname_ext);
+	if (xlog_is_symlink)
+		walkdir(LOG, pg_xlog, fsync_fname_ext);
+	walktblspc_entries(LOG, pg_tblspc, fsync_fname_ext);
 }

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 68d43c6..916f8b5 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1921,6 +1921,37 @@ TryAgain:
 }
 
 /*
+ * Read a directory opened with AllocateDir and return the next dirent.
+ * On error, ereports at a caller-specified level and returns NULL if
+ * the level is not ERROR or more severe.
+ */
+static struct dirent *
+ReadDirExtended(int elevel, DIR *dir, const char *dirname)
+{
+	struct dirent *dent;
+
+	/* Give a generic message for AllocateDir failure, if caller didn't */
+	if (dir == NULL)
+	{
+		ereport(elevel,
+				(errcode_for_file_access(),
+				 errmsg("could not
Re: [HACKERS] rhel6 rpm file locations
Are there any other packagers following the Fedora 'standards' that you are aware of? On Wed, May 27, 2015 at 2:40 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, May 27, 2015 at 3:25 PM, Ted Toth txt...@gmail.com wrote: On Wed, May 27, 2015 at 1:31 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, May 27, 2015 at 11:35 AM, Ted Toth txt...@gmail.com wrote: I'm trying to use a newer version than is available from RH in our custom distro but am having problems installing both x86_64 and i686 rpms because of file conflicts. We have some i686 packages that use libgdal which pulls in libpq which ends up in the same location in both the x86_64 and i686 postgresql lib and devel rpms. Why doesn't postgresql use _libdir and other standard rpm macros? From where did you get the RPMs that you are using? There is more than one set, and they are maintained by different people. http://yum.postgresql.org/9.4/redhat/rhel-6.5-x86_64/ Those are maintained by Devrim Gunduz and Jeff Frost. Adding them to the Cc list. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
Naoya Anzai nao-an...@xc.jp.nec.com writes: In my experience up until now, I have an idea that we can add 2 new vacuum statistics into pg_stat_xxx_tables. Adding new stats in that way requires adding per-table counters, which bloat the statistics files (especially in databases with very many tables). I do not think you've made a case for these stats being valuable enough to justify such overhead for everybody. As far as the first one goes, I don't even think it's especially useful. There might be value in tracking the times of the last few vacuums on a table, but knowing the time for only the latest one doesn't sound like it would prove much. So I'd be inclined to think more along the lines of scanning the postmaster log for autovacuum runtimes, instead of squeezing it into the pg_stats views. A possible alternative so far as the second one goes is to add a function (perhaps in contrib/pg_freespacemap) that simply runs through a table's VM and counts the number of set bits. This would be more accurate (no risk of lost counter updates) and very possibly cheaper overall: it would take longer to find out the number when you wanted it, but you wouldn't be paying the distributed overhead of tracking it when you didn't want it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
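[Editorial aside: Tom's second suggestion — count the set bits in a table's visibility map on demand — boils down to a popcount over a bitmap. A sketch under assumptions: this is plain C with an illustrative function name, not the actual contrib/pg_freespacemap code; in the real visibility map each heap page contributes one all-visible bit.]

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count set bits in a visibility-map-like byte array.  Each set bit
 * would mean "this heap page is all-visible", so the total is the
 * statistic the proposed function would compute when queried.
 */
static unsigned long
count_set_bits(const uint8_t *map, size_t nbytes)
{
    unsigned long count = 0;

    for (size_t i = 0; i < nbytes; i++)
    {
        uint8_t b = map[i];

        while (b)
        {
            b &= (uint8_t) (b - 1);     /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}
```

Scanning the map only when asked trades query-time work for zero bookkeeping overhead, and cannot drift the way a maintained counter can — exactly the trade-off Tom describes.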
Re: [HACKERS] [BUGS] pg_get_functiondef() does not show LEAKPROOF for leakproof functions
Michael Paquier michael.paqu...@gmail.com writes: On Thu, May 28, 2015 at 5:52 PM, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote: If function is created with the LEAKPROOF option, then pg_get_functiondef() does not show that in the returned definition. Is it expected OR are we missing that option in pg_get_functiondef(). Agreed. I guess that it has been simply forgotten. pg_proc can be easily queried, so functions marked as leakproof are easy to find out in any case. Looks like a clear oversight to me. I had first thought that this might have been intentional because pg_dump needed it to act like that --- but pg_dump doesn't use pg_get_functiondef. I think it was simply forgotten. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade resets timeline to 1
On Thu, May 28, 2015 at 10:13:14AM +0200, Christoph Berg wrote: Re: Bruce Momjian 2015-05-28 20150527221607.ga7...@momjian.us Well, if you used pg_dump/pg_restore, you would have had even larger problems as the file names would have matched. True, but even here it's possible that files get overwritten. If you had a server running on TL 1 for files 0001001..00010020, and then did a PITR at location 10, you'll have a server writing to 00020010. If you pg_upgrade that, it will keep its WAL position, but start at 1 again, overwriting files 00010011 and following. We could have pg_upgrade increment the timeline and allow for missing history files, but that doesn't fix problems with non-pg_upgrade upgrades, which also should never be sharing WAL files from previous major versions. pg_upgrade-style upgrades have a chance to know which timeline to use. That other methods have less knowledge about the old system shouldn't mean that pg_upgrade shouldn't care. That is an open question, whether pg_upgrade should try to avoid this, even if other methods do not, or should we better document not to do this. Actually, if initdb could be told to start at an arbitrary timeline, it would be trivial to avoid the problem with pg_dump upgrades as well. Yes, that would make sense. Perhaps we should revisit this for 9.6. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
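[Editorial aside: the overwrite hazard discussed above follows directly from how WAL segment file names are formed — the timeline is the leading 8-hex-digit field, so every timeline-1 name sorts below every timeline-2 name. A sketch mirroring the %08X%08X%08X layout PostgreSQL uses for segment names; the helper name here is made up, and 0x100 segments per "xlogid" assumes the default 16 MB segment size.]

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Build a 24-character WAL segment name: timeline ID, then the high
 * and low halves of the segment number, each as 8 uppercase hex
 * digits.  With 16 MB segments there are 0x100 segments per xlogid.
 */
#define SEGS_PER_XLOGID UINT64_C(0x100)

static void
wal_segment_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    snprintf(buf, len, "%08X%08X%08X", tli,
             (uint32_t) (segno / SEGS_PER_XLOGID),
             (uint32_t) (segno % SEGS_PER_XLOGID));
}
```

For instance, segment 0x11 on timeline 1 is named 000000010000000000000011, which sorts lexicographically before timeline 2's 000000020000000000000010 even though the latter was written first — which is why an archiver (or barman) comparing filenames can be confused when pg_upgrade resets the TLI to 1.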
Re: [HACKERS] pg_upgrade resets timeline to 1
On Thu, May 28, 2015 at 10:18:18AM +0200, Christoph Berg wrote: Re: Noah Misch 2015-05-28 20150528072721.ga4102...@tornado.leadboat.com I've just had trouble getting barman to work again after a 9.1-9.4.2 upgrade, and I think part of the problem was that the WAL for this cluster got reset from timeline 2 to 1, which made barman's incoming WALs processor drop the files, probably because the new filename 0001... is now less than the 0002... before. It looks like an upgrade from 9.1.x to 9.3.0 or later has always set the new timeline identifier (TLI) to 1. My testing confirms this for an upgrade from 9.1.16 to 9.4.1 and for an upgrade from 9.1.16 to 9.4.2, so I failed to reproduce your report. Would you verify the versions you used? If you were upgrading from 9.3.x, I _can_ reproduce that. Sorry, the 9.1 was a typo, the system was on 9.2.11 before/during pg_upgrade. I ran 9.2.11-to-9.4.1 and 9.2.11-to-9.4.2 upgrades through my script. Both of them set TLI=1. I would be inclined to restore compatibility if this were a 9.4.2 regression, but upgrades from 9.2 to 9.4 have always done that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade resets timeline to 1
On Thu, May 28, 2015 at 10:20:58AM -0400, Bruce Momjian wrote: On Thu, May 28, 2015 at 08:47:07AM +0100, Simon Riggs wrote: What we should be saying is that the last timeline doesn't need a history file. Then no change is needed here. Yes, that would make a lot more sense than what we have now, but this had to be backpatched, so reverting to the 9.3 and earlier behavior seemed logical. To clarify for the archives, the 2015-05-16 changes did not revert to 9.3 and earlier behavior. Rather, they standardized on the {9.0,9.1,9.2}-to-{9.3,9.4} upgrade behavior, bringing that behavior to all supported branches and source versions. Here is the history of timeline restoration in pg_upgrade: On Thu, May 28, 2015 at 03:27:21AM -0400, Noah Misch wrote: Since the 2015-05-16 commits you cite, pg_upgrade always sets TLI=1. Behavior before those commits depended on the source and destination major versions. PostgreSQL 9.0, 9.1 and 9.2 restored the TLI regardless of source version. PostgreSQL 9.3 and 9.4 restored the TLI when upgrading from 9.3 or 9.4, but they set TLI=1 when upgrading from 9.2 or earlier. (Commit 038f3a0 introduced this inconsistent behavior of 9.3 and later.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PGCon hacker lounge
On May 27, 2015, at 12:06 PM, Alexander Korotkov a.korot...@postgrespro.ru wrote: On Wed, May 27, 2015 at 7:00 PM, Dan Langille d...@langille.org wrote: Have you been to PGCon before? Do you remember the hacker lounge? Do you remember going there to work on stuff? Do you recall anything about it? I remember I tried to visit it in 2012 or 2013. That time I found an empty room and nobody there. I didn't try to visit it any time after. The reason I asked: I was trying to gauge the usefulness of the PGCon hacker lounge since it was first added to the schedule in 2012. It seems it goes unused, and I was trying to see if anyone found it useful in the past. At BSDCan, for example, you can find people there every night discussing and working. Or perhaps just socializing. It's a major gathering point. If there is interest, we'll retain it for 2015; otherwise it seems best to remove it from the schedule. — Dan Langille http://langille.org/
Re: [HACKERS] pg_upgrade resets timeline to 1
On Thu, May 28, 2015 at 10:39:15AM -0400, Noah Misch wrote: It looks like an upgrade from 9.1.x to 9.3.0 or later has always set the new timeline identifier (TLI) to 1. My testing confirms this for an upgrade from 9.1.16 to 9.4.1 and for an upgrade from 9.1.16 to 9.4.2, so I failed to reproduce your report. Would you verify the versions you used? If you were upgrading from 9.3.x, I _can_ reproduce that. Sorry, the 9.1 was a typo, the system was on 9.2.11 before/during pg_upgrade. I ran 9.2.11-to-9.4.1 and 9.2.11-to-9.4.2 upgrades through my script. Both of them set TLI=1. I would be inclined to restore compatibility if this were a 9.4.2 regression, but upgrades from 9.2 to 9.4 have always done that. Right, it was only 9.3 to 9.4.0 (and 9.4.1) that restored the timeline. Restores to 9.4.2 do not. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proleakproof vs opr_sanity test
In view of http://www.postgresql.org/message-id/CAM2+6=u5ylzbre3v3wf9fful0gxr1ega3miph1gu0jpseud...@mail.gmail.com I went trawling for other places where the LEAKPROOF patch might have overlooked something, by dint of grepping for prosecdef and seeing if there was or should be parallel code for proleakproof. I found one: opr_sanity.sql has a test to see if multiple pg_proc references to the same underlying built-in function all have equivalent properties, and that check is comparing prosecdef properties but not proleakproof. I tried adding proleakproof, and behold, I got this:

*** /home/postgres/pgsql/src/test/regress/expected/opr_sanity.out	Tue May 19 11:43:02 2015
--- /home/postgres/pgsql/src/test/regress/results/opr_sanity.out	Thu May 28 10:59:18 2015
***************
*** 130,142 ****
      (p1.prolang != p2.prolang OR
       p1.proisagg != p2.proisagg OR
       p1.prosecdef != p2.prosecdef OR
       p1.proisstrict != p2.proisstrict OR
       p1.proretset != p2.proretset OR
       p1.provolatile != p2.provolatile OR
       p1.pronargs != p2.pronargs);
!  oid | proname | oid | proname 
! -----+---------+-----+---------
! (0 rows)
  
  -- Look for uses of different type OIDs in the argument/result type fields
  -- for different aliases of the same built-in function.
--- 130,144 ----
      (p1.prolang != p2.prolang OR
       p1.proisagg != p2.proisagg OR
       p1.prosecdef != p2.prosecdef OR
+      p1.proleakproof != p2.proleakproof OR
       p1.proisstrict != p2.proisstrict OR
       p1.proretset != p2.proretset OR
       p1.provolatile != p2.provolatile OR
       p1.pronargs != p2.pronargs);
!  oid | proname | oid  |  proname  
! -----+---------+------+-----------
!   68 | xideq   | 1319 | xideqint4
! (1 row)

So I think we ought to fix xideqint4 to be marked leakproof and then add this test. That would only be in HEAD though since it'd require an initdb. Any objections? Is there a reason to believe that a built-in function might be leakproof when invoked from one function OID but not another?
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rhel6 rpm file locations
On May 27, 2015, at 12:40 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, May 27, 2015 at 3:25 PM, Ted Toth txt...@gmail.com wrote: On Wed, May 27, 2015 at 1:31 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, May 27, 2015 at 11:35 AM, Ted Toth txt...@gmail.com wrote: I'm trying to use a newer version than is available from RH in our custom distro but am having problems installing both x86_64 and i686 rpms because of file conflicts. We have some i686 packages that use libgdal which pulls in libpq which ends up in the same location in both the x86_64 and i686 postgresql lib and devel rpms. Why doesn't postgresql use _libdir and other standard rpm macros? From where did you get the RPMs that you are using? There is more than one set, and they are maintained by different people. http://yum.postgresql.org/9.4/redhat/rhel-6.5-x86_64/ Those are maintained by Devrim Gunduz and Jeff Frost. Adding them to the Cc list. Could you show us the copy/paste of your yum install output? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] RFC: Remove contrib entirely
Hello, This is a topic that has come up in various ways over the years. After the long thread on pg_audit, I thought it might be time to bring it up again. Contrib, according to the docs, is: "These include porting tools, analysis utilities, and plug-in features that are not part of the core PostgreSQL system, mainly because they address a limited audience or are too experimental to be part of the main source tree. This does not preclude their usefulness." It has also been mentioned many times over the years that contrib is a holding tank for technology that would hopefully be pushed into core someday. What I am suggesting:

1. Analyze the current contrib modules for inclusion into -core. A few of these are pretty obvious: pg_stat_statements, citext, postgres_fdw, hstore, pgcrypto [...] I am sure there will be plenty of fun to be had with what should or shouldn't be merged into core. I think if we argue about the guidelines of how to analyze what should be in core versus the merits of any particular module, life will be easier. Here are some for a start: A. Must have been in contrib for at least two releases B. Must have a visible community (and thus use case)

2. Push the rest out into a .Org project called contrib. Let those who are interested in the technology work on them or use them. This project, since it is outside of core proper, can work just like other extension projects. Alternately, allow the maintainers to push them wherever they like (Landscape, Github, Savannah, git.postgresql.org ...).

Why I am suggesting this:

1. Less code to maintain in core
2. Eliminates the mysticism of contrib
3. Removal of experimental code from core
4. Most of the distributions package contrib separately anyway
5. Some of contrib serves an extremely small use case (sepgsql, tsearch2, lo ...)
6. Finding utilities for PostgreSQL used to be harder. It is rather dumb simple teenage snapchat user easy now.
8. Isn't this what pgxs is for?
9. Everybody hates cleaning the closet until the end result.
10.
Several of these modules would make PostgreSQL look good anyway (default case insensitive index searching with citext? It is a gimme)
11. Contrib has been getting smaller and smaller. Let's cut the cord.
12. Isn't this the whole point of extensions?

Sincerely, jD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. "Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you." -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On Thu, May 28, 2015 at 10:26 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: 2. Robert, are you comfortable with what fsync_pgdata() does in xlog.c? I wasn't sure if I should move that to fd.c as well. I think it's borderline OK for now. I think if the function is as specific as fsync_pgdata(), fd.c is not the right place. I'm not sure xlog.c is either, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rhel6 rpm file locations
Hi, On Thu, 2015-05-28 at 08:54 -0500, Ted Toth wrote: Are there any other packagers following the Fedora 'standards' that you are aware of? It is not about following the standards or not. Unlike distro RPMs, you can install multiple PostgreSQL versions into the same box using community RPMs. That caused a bit of breakage. So, instead of using %{_libdir}, we install our libs and binaries under /usr/pgsql-X.Y (like 9.4, 9.3), and use lib/ and bin/ subdirectories under that one. So, we are not multiarch compatible. I am not sure if we can ship 32-bit libs with 64-bit packages or not, though, to fix this issue. May I ask you to subscribe to pgsql-pkg-...@postgresql.org if you want to discuss more? Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension
Heikki, * Heikki Linnakangas (hlinn...@iki.fi) wrote: On 05/28/2015 11:14 AM, Stephen Frost wrote: * Heikki Linnakangas (hlinn...@iki.fi) wrote: 1. it's not flexible enough. How do you specify that all READs on super_secret table must be logged, but on less_secret table, it's enough to log only WRITEs? This is actually what that pg_audit.role setting was all about. To do the above, you would do: CREATE ROLE pgaudit; SET pg_audit.role = 'pgaudit'; GRANT SELECT ON TABLE super_secret TO pgaudit; GRANT INSERT, UPDATE, DELETE ON TABLE less_secret TO pgaudit; With this, all commands executed which require SELECT rights on the table super_secret are logged, and all commands executed which require INSERT, UPDATE, or DELETE on the less_secret table are logged. Ah, I see. Wow, that's really Rube Goldbergian. It's certainly not ideal. It was discussed back in January, iirc. 2. GUCs can be changed easily on-the-fly, and configured flexibly. But that's not really what you want for auditing. You want to have a clear specification in one place. You'd want it to be more like pg_hba.conf than postgresql.conf. Or more like Mandatory Access Control, rather than Discretionary Access Control. I certainly appreciate the MAC vs. DAC discussion here, but I don't believe our AUDIT capability should explicitly require restarts of PG to be adjusted. Sure, I didn't mean we should require a restart. Requiring SIGHUP seems reasonable though. I wouldn't have any issue with that. A separate config file in $PGDATA would be a better way to configure this. You would then have all the configuration in one place, and that file could have a more flexible format, with per-schema rules etc. (Wouldn't have to implement all that in the first version, but that's the direction this should go) The difficulty with a separate config file is that we don't have any way of properly attaching information to the existing tables in the database: table renames, dropped columns, etc. all become an issue then. True.
I wouldn't be too worried about that though. If you rename a table, that hopefully gets flagged in the audit log and someone will ask why did you rename that table?. If you're paranoid enough, you could have a catch-all rule that audits everything that's not explicitly listed, so a renamed table always gets audited. The general 'catch-all' approach was how we approached pg_audit in general, so, yes, that's the kind of auditing we expect people to want (or, at least, we would need to support it). You're right, in some environments that may be workable, but then it also has to be entirely managed outside of the database, which would make it extremely difficult to use in many environments, if not impossible.. Of course, you could still support a labeling system, similar to the pg_audit.role setting and GRANTs. For example, you could tag tables with a label like reads_need_auditing, and then in the config file, you could specify that all READs on tables with that label need to be audited. I think security labels would be a better system to abuse for that than GRANT. You'd want to just label the objects, and specify the READ/WRITE etc. attributes in the config file. Using SECURITY LABELs is certainly an interesting idea. GRANT was chosen because, with it, we also get information regarding the action that the user wants to audit (select/insert/update/delete, etc), without having to build all of that independently with some custom structure in the SECURITY LABEL system. Who do you assume you can trust? Is it OK if the table owner can disable/enable auditing for that table? In an ideal world, you would segregate the rights which the table owner has from both the permission system and the audit system. This has come up a number of times already and is, really, an independent piece of work. 
Having the permissions and auditing controlled by the same group is better than having all three pieces controlled by the ownership role, but having three distinct groups would be preferred. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] About that re-release ...
> On Wed, May 27, 2015 at 11:08 PM, Robert Haas <robertmh...@gmail.com> wrote:
> > What about
> > http://www.postgresql.org/message-id/20150527222142.ge5...@postgresql.org ?
> > I believe that is also a 9.4.2 regression, and a serious one.
>
> Oh? There was nothing in the thread that suggested to me that it was a
> new-in-9.4.2 bug.

I think it is. The executive summary here is that 9.4.2 and 9.3.7 fail to start if pg_control's oldestMultiXid points to a pg_multixact/offsets file that does not exist. Earlier versions tolerated that, but the new versions don't, so people in this situation will be unable to start the database after upgrading. That's quite bad.

However, the new set of releases is not entirely responsible for the problem, because the situation that causes 9.4.2 and 9.3.7 to fail to start isn't supposed to occur in the first place. The database really SHOULD NOT remove an offsets file that does not precede oldestMultiXid, and if it does, then either oldestMultiXid is set wrong (which would be a bug), or the database removed an offsets file to which references may still exist (which would be a data-loss issue).

Thomas Munro, Alvaro, and I have been discussing this on Skype, but have so far been unable to construct a series of events which would lead to an occurrence of this kind. We speculate that pg_upgrade may play a role, but there's no proof of that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] readlink missing nul-termination in pg_rewind
This is just something I noticed in passing. (I did a quick check of all the other uses of readlink in the source, and they do get this right.)

-- Abhijit

P.S. Also in passing, I note that pg_rewind will follow links under any directory anywhere named pg_tblspc (which probably doesn't matter), and does not follow pg_xlog if it's a symlink (which probably does). If you want, I can submit a trivial patch for the latter.

From d1e5cbea21bb916253bce2ad189deb1924864508 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <a...@2ndquadrant.com>
Date: Thu, 28 May 2015 21:03:50 +0530
Subject: readlink() doesn't nul-terminate the buffer, so we have to

---
 src/bin/pg_rewind/copy_fetch.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bin/pg_rewind/copy_fetch.c b/src/bin/pg_rewind/copy_fetch.c
index 9e317a2..4a7150b 100644
--- a/src/bin/pg_rewind/copy_fetch.c
+++ b/src/bin/pg_rewind/copy_fetch.c
@@ -126,6 +126,7 @@ recurse_dir(const char *datadir, const char *parentpath,
 						 fullpath);
 			}

+			link_target[len] = '\0';
 			callback(path, FILE_TYPE_SYMLINK, 0, link_target);

 			/*
--
1.9.1
Re: [HACKERS] WIP: Enhanced ALTER OPERATOR
Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> Could we address both these problems by denying changes to an existing
> commutator or negator? ISTM that setting an absent commutator or negator
> is quite enough for ALTER OPERATOR. User extensions could need to set the
> commutator and negator because they could add new operators which didn't
> exist before. But it's rather uncommon to unset or change a commutator or
> negator.

Note that this functionality is already covered, in that you can specify the commutator/negator linkage when you create the second operator. I'm not particularly convinced that we need to have it in ALTER OPERATOR.

			regards, tom lane
Re: [HACKERS] pg_upgrade resets timeline to 1
Re: Noah Misch 2015-05-28 <20150528150234.ga4111...@tornado.leadboat.com>
> On Thu, May 28, 2015 at 10:20:58AM -0400, Bruce Momjian wrote:
> > On Thu, May 28, 2015 at 08:47:07AM +0100, Simon Riggs wrote:
> > > What we should be saying is that the last timeline doesn't need a
> > > history file. Then no change is needed here.
> >
> > Yes, that would make a lot more sense than what we have now, but this
> > had to be backpatched, so reverting to the 9.3 and earlier behavior
> > seemed logical.
>
> To clarify for the archives, the 2015-05-16 changes did not revert to 9.3
> and earlier behavior. Rather, they standardized on the
> {9.0,9.1,9.2}-to-{9.3,9.4} upgrade behavior, bringing that behavior to
> all supported branches and source versions. Here is the history of
> timeline restoration in pg_upgrade:

Ok, sorry for the noise then. It's not a regression; I still think the behavior needs improvement, but that is indeed 9.6 material.

Christoph
--
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] [PATCH] readlink missing nul-termination in pg_rewind
Abhijit Menon-Sen <a...@2ndquadrant.com> writes:
> This is just something I noticed in passing. (I did a quick check of all
> the other uses of readlink in the source, and they do get this right.)

There's more random inconsistency than just this. I think we should standardize on the coding exhibited at, eg, basebackup.c:1023ff, which positively ensures that it won't scribble on random memory if the call returns an unexpected value. Will fix.

			regards, tom lane
Re: [HACKERS] [PATCH] readlink missing nul-termination in pg_rewind
Abhijit Menon-Sen <a...@2ndquadrant.com> writes:
> P.S. Also in passing, I note that pg_rewind will follow links under any
> directory anywhere named pg_tblspc (which probably doesn't matter), and
> does not follow pg_xlog if it's a symlink (which probably does). If you
> want, I can submit a trivial patch for the latter.

As far as that goes, I think it does look at the whole parentpath, which means it would not be fooled by sub-subdirectories named pg_tblspc. A bigger problem is that whoever coded this forgot that parentpath could be null, which I blame on the lack of an API specification for the function.

			regards, tom lane
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
Robert Haas <robertmh...@gmail.com> writes:
> On Thu, May 28, 2015 at 10:26 AM, Abhijit Menon-Sen <a...@2ndquadrant.com> wrote:
> > 2. Robert, are you comfortable with what fsync_pgdata() does in xlog.c?
> > I wasn't sure if I should move that to fd.c as well.
>
> I think it's borderline OK for now. I think if the function is as
> specific as fsync_pgdata(), fd.c is not the right place. I'm not sure
> xlog.c is either, though.

ISTM xlog.c is clearly the *wrong* place; if this behavior has anything to do with WAL logging as such, it's not apparent to me. fd.c is not a great place perhaps, but in view of the fact that we have things like RemovePgTempFiles() in there, it's not unreasonable to see fsync_pgdata as something to put there as well (perhaps with a name more chosen to match fd.c names...).

Since Robert appears to be busy worrying about the multixact issue reported by Steve Kehlet, I suggest he focus on that and I'll take care of getting this thing committed. AFAICT we have consensus on what it should do and we're down to arguing about code style.

			regards, tom lane