Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/16 16:02, Ashutosh Bapat wrote:
> On Tue, Feb 16, 2016 at 12:26 PM, Etsuro Fujita wrote:
>> On 2016/02/16 15:22, Ashutosh Bapat wrote:
>>> During join planning, the planner tries multiple combinations of joining
>>> relations, so the same base or join relation can be part of multiple
>>> combinations. Hence remote_conds or joinclauses will get linked multiple
>>> times as they are bidirectional lists, thus breaking the linkages of
>>> previous join combinations tried. E.g. while planning A join B join C
>>> join D, the planner will come up with combinations like A(B(CD)) or
>>> (AB)(CD) or ((AB)C)D etc., and remote_conds from A will first be linked
>>> into A(B(CD)), then AB, breaking the first linkages.
>>
>> Exactly, but I don't think that that needs to be considered, because we
>> have this at the beginning of postgresGetForeignJoinPaths:
>>
>>   /*
>>    * Skip if this join combination has been considered already.
>>    */
>>   if (joinrel->fdw_private)
>>       return;
>
> There will be different joinrels for A(B(CD)) and (AB) where A's
> remote_conds need to be pulled up.

Agreed.

> The check you have mentioned above only protects us from adding paths
> multiple times to (AB) when we encounter it for (AB)(CD) and ((AB)C)D.

Sorry, I don't understand this fully.

Best regards,
Etsuro Fujita
Re: [HACKERS] exposing pg_controldata and pg_config as functions
On Tue, Feb 16, 2016 at 5:29 AM, Joe Conway wrote:
> I believe this takes care of all open issues with this, so I plan to
> commit it as attached in a day or two. Thanks for your reviews and comments!

Here are just a couple of cosmetic comments.

+  The view pg_config describes the
+  compile-time configuration parameters of the currently installed
+  version of PostgreSQL. It is intended, for example, to be used by
+  software packages that want to interface to PostgreSQL to facilitate
+  finding the required header files and libraries. It provides the same
+  basic information as the PostgreSQL Client
+  Application. There is a System Information Function

Missing markup around PostgreSQL.

+  Application. There is a System Information Function

Why is using upper-case characters necessary here? This could just say
"system function". The paragraph in func.sgml is a copy-paste of the one
in catalogs.sgml. We may want to remove the duplication.

+ /* let the caller know we're sending back a tuplestore */
+ rsinfo->returnMode = SFRM_Materialize;

I guess one can recognize your style here for SRF functions :)

@@ -61,7 +74,7 @@ libpgcommon_srv.a: $(OBJS_SRV)
 # a hack that might fail someday if there is a *_srv.o without a
 # corresponding *.o, but it works for now.
 %_srv.o: %.c %.o
-	$(CC) $(CFLAGS) $(subst -DFRONTEND,, $(CPPFLAGS)) -c $< -o $@
+	$(CC) $(CFLAGS) $(subst -DFRONTEND ,, $(CPPFLAGS)) -c $< -o $@

Diff noise?

--- /dev/null
+++ b/src/common/config_info.c
[...]
+ * IDENTIFICATION
+ *	  src/common/controldata_utils.c

This is incorrect.

+ * IDENTIFICATION
+ *	  src/backend/utils/misc/pg_config.c
+ *
+ */

I am nitpicking here, but this header block should have a long "" at its bottom.
--
Michael
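For readers unfamiliar with the idiom Michael is alluding to, the usual shape of a materialize-mode SRF in the backend is roughly the following. This is only an illustrative sketch of the standard pattern, not the code from the patch under review:

/*
 * Sketch of the typical materialize-mode SRF setup: build a tuplestore in
 * the per-query memory context and hand it back through ReturnSetInfo.
 */
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
TupleDesc	tupdesc;
Tuplestorestate *tupstore;
MemoryContext oldcontext;

if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
	elog(ERROR, "return type must be a row type");

oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
tupstore = tuplestore_begin_heap(true, false, work_mem);

/* let the caller know we're sending back a tuplestore */
rsinfo->returnMode = SFRM_Materialize;
rsinfo->setResult = tupstore;
rsinfo->setDesc = tupdesc;
MemoryContextSwitchTo(oldcontext);

/* ... then fill it with tuplestore_putvalues(tupstore, tupdesc, values, nulls) ... */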
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier wrote:
> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
>> Surprisingly yes. The list is handled as an identifier list and
>> parsed by SplitIdentifierString thus it can accept double-quoted
>> names.

Attached is the latest version of the patch, which has only the feature logic so far. I'm writing the documentation for this feature now, so this version doesn't include documentation or regression test patches.

> | $ postgres
> | FATAL: syntax error: unexpected character "*"
> Mmm.. It should be tough to find what has happened..

I'm trying to implement a better error message, but that change is not included in this version of the patch yet.

> malloc/free are used in create_name_node and other functions to
> be used in scanner, but syncgroup_gram.y is said to use
> palloc/pfree. Maybe they should use the same memory
> allocation/freeing functions.

With the current arrangement, I think we use the malloc/free functions when we allocate/free memory for the SyncRepStandbys variables. OTOH, we use the palloc/pfree functions while parsing SyncRepStandbyString. Am I missing something?

> I suppose SyncRepGetSyncedLsnsFn, or SyncRepGetSyncedLsnsPriority
> can return InvalidXLogRecPtr as cur_*_pos even when it returns
> true. And, I suppose comparison of LSN values with
> InvalidXLogRecPtr is not well-defined. Anyway the condition goes
> wrong when cur_write_pos = InvalidXLogRecPtr (but ret = true).

In this version of the patch, it's not possible to return InvalidXLogRecPtr with got_lsns = false (was ret = false). So we can ensure that we got valid LSNs when got_lsns = true.

> At a glance, SyncRepGetSyncedLsnsPriority and
> SyncRepGetSyncStandbysPriority does almost the same thing and both
> runs loops over group members. Couldn't they run at once?

Yeah, I've optimized that logic.

> We may want to be careful with the use of '[' in application_name.
> I am not much thrilled with forbidding the use of []() in application_name,
> so we may
> want to recommend user to use a backslash when using s_s_names when a
> group is defined.
> s_s_names='abc, def, " abc,""def"'
>
> Result list is ["abc", "def", " abc,\"def"]
>
> Simply supporting the same notation addresses the problem and
> accepts strings like the following.
>
> s_s_names='2["comma,name", "foo[bar,baz]"]'

I've changed the s_s_names parser so that it can handle the 4 special characters (\,\ \[\]) and can handle double-quoted strings accurately, the same way SplitIdentifierString does. We cannot use the 4 special characters (\,\ \[ \]) without using a double-quoted string. Also, if we use the " (double-quote) character inside a double-quoted string, we should write "" (two double-quotes). For example, if application_name = 'hoge " bar', then s_s_name = '"hoge "" bar"' would be matched.

Other given comments are fixed. Remaining tasks are:
- Document patch.
- Regression test patch.
- Better syntax error message for s_s_names.

Regards,

--
Masahiko Sawada

000_multi_sync_replication_v10.patch
Description: binary/octet-stream
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On Tue, Feb 16, 2016 at 12:26 PM, Etsuro Fujita wrote:
> On 2016/02/16 15:22, Ashutosh Bapat wrote:
>> During join planning, the planner tries multiple combinations of joining
>> relations, thus the same base or join relation can be part of multiple
>> combinations. Hence remote_conds or joinclauses will get linked
>> multiple times as they are bidirectional lists, thus breaking linkages
>> of previous join combinations tried. E.g. while planning A join B join C
>> join D the planner will come up with combinations like A(B(CD)) or (AB)(CD)
>> or ((AB)C)D etc. and remote_conds from A will first be linked into
>> A(B(CD)), then AB, breaking the first linkages.
>
> Exactly, but I don't think that that needs to be considered because we
> have this at the beginning of postgresGetForeignJoinPaths:
>
>   /*
>    * Skip if this join combination has been considered already.
>    */
>   if (joinrel->fdw_private)
>       return;

There will be different joinrels for A(B(CD)) and (AB) where A's remote_conds need to be pulled up. The check you have mentioned above only protects us from adding paths multiple times to (AB) when we encounter it for (AB)(CD) and ((AB)C)D.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] planstats.sgml
>> Tatsuo Ishii writes:
>>> While reading planstats.sgml, I encountered a sentence which I don't
>>> understand.
>>
>>> These numbers are current as of the last VACUUM or
>>> ANALYZE on the table. The planner then fetches the
>>> actual current number of pages in the table (this is a cheap operation,
>>> not requiring a table scan). If that is different from
>>> relpages then
>>> reltuples is scaled accordingly to
>>> arrive at a current number-of-rows estimate. In this case the value of
>>> relpages is up-to-date so the rows estimate is
>>> the same as reltuples.
>>
>>> I don't understand the last sentence (In this case...). For me it
>>> seems it is talking about the case when relpages is not different from
>>> what the planner fetches from the table. If so, why "In this case"?
>>
>> I think what it meant is "In the example above, ...". Feel free to
>> change it if you think that is clearer.
>
> Oh, I see. I'm going to change "In this case" to "In the example above".

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] extend pgbench expressions with functions
On Tue, Feb 16, 2016 at 9:18 AM, Robert Haas wrote: > I experimented with trying to do this and ran into a problem: where > exactly would you store the evaluated arguments when you don't know > how many of them there will be? And even if you did know how many of > them there will be, wouldn't that mean that evalFunc or evaluateExpr > would have to palloc a buffer of the correct size for each invocation? > That's far more heavyweight than the current implementation, and > minimizing CPU usage inside pgbench is a concern. It would be > interesting to do some pgbench runs with this patch, or the final > patch, and see what effect it has on the TPS numbers, if any, and I > think we should. But the first concern is to minimize any negative > impact, so let's talk about how to do that. Good point. One simple idea here would be to use a custom pgbench script that has no SQL commands and just calculates the values of some parameters to measure the impact without depending on the backend, with a fixed number of transactions. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
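Michael's suggestion amounts to a custom script containing only meta-commands, so that the measured cost is purely pgbench's expression evaluation. A possible sketch (the file name is arbitrary, and only plain operators are used here since functions are exactly what the patch adds):

\set a 1 + 2 * 3
\set b :a % 7 - :a / 2

run for a fixed number of transactions, with something like:

pgbench -n -f exprs.sql -t 1000000 postgres

Comparing the resulting TPS with and without the patch would show the overhead of the new evaluation code without any backend involvement.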
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/16 15:22, Ashutosh Bapat wrote:
> During join planning, the planner tries multiple combinations of joining
> relations, thus the same base or join relation can be part of multiple
> combinations. Hence remote_conds or joinclauses will get linked multiple
> times as they are bidirectional lists, thus breaking linkages of previous
> join combinations tried. E.g. while planning A join B join C join D the
> planner will come up with combinations like A(B(CD)) or (AB)(CD) or
> ((AB)C)D etc. and remote_conds from A will first be linked into A(B(CD)),
> then AB, breaking the first linkages.

Exactly, but I don't think that that needs to be considered, because we have this at the beginning of postgresGetForeignJoinPaths:

/*
 * Skip if this join combination has been considered already.
 */
if (joinrel->fdw_private)
    return;

Best regards,
Etsuro Fujita
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
On Tue, Feb 16, 2016 at 5:58 AM, Alvaro Herrera wrote: > Peter Geoghegan wrote: > >> Michael (the CF manager at the time) remembered to change the status >> to "Ready for Committer" again; you see this entry immediately >> afterwards: >> >> "New status: Ready for Committer" >> >> but I gather from the CF app history that Alvaro (the current CF >> manager) did not remember to do that second step when he later moved >> the patch to the "next" (current) CF. Or maybe he just wasn't aware of >> this quirk of the CF app. I recall switching this patch as "Ready for committer" based on the status of the thread. > That seems a bug in the CF app to me. > > (FWIW I'm not the "current" CF manager, because the CF I managed is > already over. I'm not sure that we know who the manager for the > upcoming one is.) When a patch with status "Ready for committer" on CF N is moved to CF (N+1), its status is automatically changed to "Needs Review". That's something to be aware of when cleaning up the CF app entries. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/15 21:33, Ashutosh Bapat wrote: Here's patch with better way to fix it. I think while concatenating the lists, we need to copy the lists being appended and in all the cases. If we don't copy, a change in those lists can cause changes in the upward linkages and thus lists of any higher level joins. Maybe I'm missing something, but I don't understand why such a change in those lists happens. Could you explain about that in more detail? Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
During join planning, the planner tries multiple combinations of joining relations, so the same base or join relation can be part of multiple combinations. Hence remote_conds or joinclauses will get linked multiple times, as they are bidirectional lists, thus breaking the linkages of previous join combinations tried. E.g. while planning A join B join C join D, the planner will come up with combinations like A(B(CD)) or (AB)(CD) or ((AB)C)D etc., and remote_conds from A will first be linked into A(B(CD)), then AB, breaking the first linkages.

On Tue, Feb 16, 2016 at 11:36 AM, Etsuro Fujita wrote:
> On 2016/02/15 21:33, Ashutosh Bapat wrote:
>> Here's patch with better way to fix it. I think while concatenating the
>> lists, we need to copy the lists being appended and in all the cases. If
>> we don't copy, a change in those lists can cause changes in the upward
>> linkages and thus lists of any higher level joins.
>
> Maybe I'm missing something, but I don't understand why such a change in
> those lists happens. Could you explain about that in more detail?
>
> Best regards,
> Etsuro Fujita

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
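To make the hazard Ashutosh describes concrete: list_concat() reuses the cells of its second argument, so splicing a child relation's remote_conds into one joinrel's list also links those cells into every other list they already belong to. A defensive pattern, shown purely as an illustrative sketch (the variable names are made up; only the remote_conds field follows postgres_fdw's PgFdwRelationInfo), is to copy before concatenating:

/*
 * Illustrative sketch only: copying the child lists before concatenating
 * means no cells are shared between joinrels, so a later list_concat()
 * on one joinrel cannot corrupt the list of another.
 */
fpinfo->remote_conds = list_concat(list_copy(fpinfo_outer->remote_conds),
								   list_copy(fpinfo_inner->remote_conds));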
Re: [HACKERS] Incorrect formula for SysV IPC parameters
On Fri, Feb 12, 2016 at 11:19 PM, Fujii Masao wrote: > On Fri, Feb 5, 2016 at 2:17 PM, Kyotaro HORIGUCHI > wrote: >> At Thu, 4 Feb 2016 21:43:04 +0900, Fujii Masao wrote >> in >>> On Wed, Feb 3, 2016 at 12:51 PM, Kyotaro HORIGUCHI >>> wrote: >>> > Hello, I found that the formulas to calculate SEMMNI and SEMMNS >>> > are incorrect in 9.2 and later. >>> > >>> > http://www.postgresql.org/docs/9.5/static/kernel-resources.html >>> > >>> > But actually the number of semaphores PostgreSQL needs is >>> > calculated as following in 9.4 and later. >> ... >>> > So, the formula for SEMMNI should be >>> > >>> > ceil((max_connections + autovacuum_max_workers + max_worker_processes + >>> > 5) / 16) >>> > >>> > and SEMMNS should have the same fix. >>> > >>> > >>> > In 9.3 and 9.2, the documentation says the same thing but >> ... >>> > ceil((max_connections + autovacuum_max_workers + 5) / 16) >>> > >>> > In 9.1, NUM_AUXILIARY_PROCS is 3 so the documentations is correct. >>> >>> Good catch! >> >> Thanks. >> >>> ISTM that you also need to change the descriptions about SEMMNI and SEMMNS >>> under the Table 17-1. >> >> Oops! Thank you for pointing it out. >> >> The original description doesn't mention the magic-number ('5' in >> the above formulas, which previously was '4') so I haven't added >> anything about it. >> >> Process of which the number is limited by max_worker_processes is >> called 'background process' (not 'backend worker') in the >> documentation so I used the name to mention it in the additional >> description. >> >> The difference in the body text for 9.2, 9.3 is only a literal >> '4' to '5' in the formula. > > Thanks for updating the patches! > > They look good to me except that the formulas don't include the number of > background processes requesting shared memory access, i.e., > GetNumShmemAttachedBgworkers(), in 9.3. Isn't it better to use the following > formula in 9.3? > > ceil((max_connections + autovacuum_max_workers + number of > background proceses + 5) / 16) > > Attached patch uses the above formula for 9.3. I'm thinking to push your > patches to 9.2, 9.4, 9.5, master, also push the attached one to 9.3. > Comments? Pushed. Thanks for the report and patches! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
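As a worked example of the corrected 9.4+ formula, using illustrative settings (and recalling that PostgreSQL allocates semaphores in sets of 16, with one extra semaphore per set used to identify it):

    max_connections = 100, autovacuum_max_workers = 3, max_worker_processes = 8
    semaphores needed : 100 + 3 + 8 + 5 = 116
    SEMMNI            : ceil(116 / 16) = 8 semaphore sets
    SEMMNS            : 8 * 17 = 136 semaphores (for this cluster alone,
                        before leaving room for other applications)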
Re: [HACKERS] planstats.sgml
On 16/02/16 12:46, David G. Johnston wrote:
> On Mon, Feb 15, 2016 at 4:23 PM, Tatsuo Ishii wrote:
>> While reading planstats.sgml, I encountered a sentence which I don't
>> understand.
>>
>> These numbers are current as of the last VACUUM or ANALYZE on the table.
>> The planner then fetches the actual current number of pages in the table
>> (this is a cheap operation, not requiring a table scan). If that is
>> different from relpages then reltuples is scaled accordingly to arrive at
>> a current number-of-rows estimate. In this case the value of relpages is
>> up-to-date so the rows estimate is the same as reltuples.
>>
>> I don't understand the last sentence (In this case...). For me it seems
>> it is talking about the case when relpages is not different from what the
>> planner fetches from the table. If so, why "In this case"? Isn't "In this
>> case" referring to the previous sentence (If that is different from...)?
>> Maybe "In this case" should be "Otherwise" or some such?
>
> The whole sentence is awkward but you are correct in your reading - and
> "otherwise" would be a solid choice.

A long while ago when I wrote that, I was expressing the fact that *in the example* the numbers matched perfectly, but *in general* the planner would scale 'em. It still reads that way to me... but change away if you like :-)

> Though it seems the whole thing could be simplified to:
>
> These numbers are current as of the last VACUUM or ANALYZE on the table.
> To account for subsequent changes the planner obtains the actual page
> count for the table and scales pg_class.reltuples by a factor of "actual
> page count" over pg_class.relpages.
>
> The "cheap operation" comment seems gratuitous though could still be
> injected if desired.

Well, folks interested in performance like to keep an eye on whether these sorts of probes cost a lot...

regards

Mark
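The scaling being discussed can be checked straight from the catalogs. A rough illustration (the table name is only an example, and the division mirrors what the planner does internally rather than being its exact code path):

-- Approximate the planner's current row estimate: reltuples scaled by
-- the actual page count over relpages.
SELECT c.reltuples,
       c.relpages,
       pg_relation_size(c.oid) / current_setting('block_size')::int AS current_pages,
       c.reltuples * (pg_relation_size(c.oid) / current_setting('block_size')::int)
                   / GREATEST(c.relpages, 1) AS estimated_rows
FROM pg_class AS c
WHERE c.relname = 'tenk1';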
Re: [HACKERS] custom function for converting human readable sizes to bytes
Hi

2016-02-15 10:16 GMT+01:00 Dean Rasheed:
>> On 12/02/16 10:19, Dean Rasheed wrote:
>>> This seems like a reasonable first patch for me as a committer, so
>>> I'll take it unless anyone else was planning to do so.
>
> So looking at this, it seems that for the most part pg_size_bytes()
> will parse any output produced by pg_size_pretty(). The exception is
> that there are 2 versions of pg_size_pretty(), one that takes bigint
> and one that takes numeric, whereas pg_size_bytes() returns bigint, so
> it can't handle all inputs. Is there any reason not to make
> pg_size_bytes() return numeric?
> It would still be compatible with the example use cases, but it would
> be a better inverse of both variants of pg_size_pretty() and would be
> more future-proof. It already works internally using numeric, so it's
> a trivial change to make now, but impossible to change in the future
> without introducing a new function with a different name, which is
> messy.
>
> Thoughts?

This is a fair question, and I don't have a strong opinion about it. There is no technical objection - the result will be +/- the same. But you would force numeric into the outer expression evaluation. The result will not usually be combined with pg_size_pretty(), but much more often with functions like pg_relation_size(), and those functions don't return numeric; they return bigint. Bigint is the much more natural type for this purpose. Is there any use case for numeric?

Regards

Pavel

> Regards,
> Dean
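For context, the kind of composition Pavel has in mind - a size threshold plugging directly into comparisons against the existing bigint-returning size functions - looks roughly like this (an illustrative query only; pg_size_bytes() is the function under discussion):

-- Find tables larger than a human-readable threshold.
SELECT c.relname,
       pg_size_pretty(pg_table_size(c.oid)) AS size
FROM pg_class AS c
WHERE c.relkind = 'r'
  AND pg_table_size(c.oid) > pg_size_bytes('10 GB')
ORDER BY pg_table_size(c.oid) DESC;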
Re: [HACKERS] xlc atomics
On 5 July 2015 at 06:54, Andres Freund wrote: > I wrote this entirely blindly, as evidenced here by the changes you > needed. At the time somebody had promised to soon put up an aix animal, > but that apparently didn't work out. > Similarly, I asked IBM for XL/C for a POWER7 Linux VM to improve test cover on the build farm but was told to just use gcc. It was with regards to testing hardware decfloat support and the folks I spoke to said that gcc's support was derived from that in xl/c anyway. They're not desperately keen to get their products out there for testing. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket
On Tue, Feb 16, 2016 at 8:47 AM, Alvaro Herrera wrote: > Michael Paquier wrote: > >> Different issues in the same area: >> 1) In vacuumdb.c, init_slot() does not check for the return value of >> PQsocket(): >> slot->sock = PQsocket(conn); >> 2) In isolationtester.c, try_complete_step() should do the same. >> 3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same problem. >> I guess those ones should be fixed as well, no? > > Hmm, yeah, perhaps it's worth closing these too. Do you think that it would be better starting a new thread? The only difficulty of this patch is to be sure that each error handling is done correctly for each one of those frontend modules. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning
Hi Josh, On 2016/02/16 11:41, Josh berkus wrote: > On 02/15/2016 04:28 PM, Amit Langote wrote: >> Also, you won't see any optimizer and executor changes. Queries will still >> use the same plans as existing inheritance-based partitioned tables, >> although as I mentioned, constraint exclusion won't yet kick in. That will >> be fixed very shortly. > > We're not going to use CE for the new partitioning long-term, are we? This > is just the first version, right? Yes. My approach in previous versions of stuffing major planner changes in with the syntax patch was not quite proper in retrospect. So, I thought I'd propose any major planner (and executor) changes later. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning
On 02/15/2016 04:28 PM, Amit Langote wrote: Also, you won't see any optimizer and executor changes. Queries will still use the same plans as existing inheritance-based partitioned tables, although as I mentioned, constraint exclusion won't yet kick in. That will be fixed very shortly. We're not going to use CE for the new partitioning long-term, are we? This is just the first version, right? -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid
On 02/15/16 20:03, Greg Stark wrote: > On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack wrote: >> If the calling process subsequently waits for its >> children, and the process has no unwaited for children that were >> transformed into zombie processes, it will block until all of its >> children terminate, and wait(), wait3(), waitid() and waitpid() will >> fail and set errno to [ECHILD]. > And actually looking at that documentation it's not clear to me why > it's the case. I would have expected system to immediately call > waitpid after the fork and unless the subprocess was very quick that > should be sufficient to get the exit code. One might even imagine > having system intentionally have some kind interlock to ensure that > the parent has called waitpid before the child execs the shell. Doesn't the wording suggest that even if the parent is fast enough to call waitpid before the child exits, waitpid will only block until the child terminates and then say ECHILD anyway? I wouldn't be surprised if they specified it that way to avoid creating a race condition where you would *sometimes* think it was doing what you wanted. Agree that the language for ECHILD in system(3) doesn't clearly reflect that in the "status ... is no longer available" description it gives for ECHILD. -Chap -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?
On Wed, Feb 10, 2016 at 4:15 PM, Andres Freund wrote: > Hi, > > Several places in our docs have blurbs like >> Note that on many systems, the effective resolution of sleep delays is >> 10 milliseconds; setting wal_writer_delay to a value that >> is not a multiple of 10 might have the same results as setting it to >> the next higher multiple of 10. > Afaik that's not the case on any recent operating system/hardware. So > perhaps we should just remove all of those blurbs, or just replace them > with something like "on some older systems the effective resolution of > sleep delays is limited to multiples of 10 milliseconds"? I guess we should probably explain what is actually happening, namely that the precise sleep duration is delegated to the operating system scheduler which may cause the process to sleep longer than requested. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning
> Also, you won't see any optimizer and executor changes. Queries will still
> use the same plans as existing inheritance-based partitioned tables,
> although as I mentioned, constraint exclusion won't yet kick in. That will
> be fixed very shortly.
>
> And of course, comments on syntax are welcome as well.
>
> Thanks,
> Amit

Good to know the current limitations/expectations.

Our ETL has a great number of workers that do something like this:
1. Grab a file.
2. Based on some metadata of that file, determine the partition that would receive ALL of the rows in that file. It's actually multiple tables, all of which are partitioned, all of which fully expect the file data to fit in exactly one partition.
3. \copy into a temp table.
4. Transform the data and insert the relevant bits into each of the target partitions derived in #2.

So while ATR is a *major* feature of true partitioning, it's not something we'd actually need in our current production environment, but I can certainly code it that way to benchmark ATR vs "know the destination partition ahead of time" vs "insane layered range_partitioning trigger + pg_partman trigger". Currently we don't do anything like table swapping, but I've done that enough in the past that I could probably concoct a test of that too, once it's implemented.

As for the syntax, I'm not quite sure your patch addresses the concerns I voiced earlier: specifically, whether VALUES IN works for RANGE as well as LIST, but I figured that would become clearer once I tried to actually use it. Currently we have partitioning on C-collated text ranges (no, they don't ship with postgres, I had to make a custom type), something like this:

part0: (,BIG_CLIENT)
part1: [BIG_CLIENT,BIG_CLIENT]
part2: (BIG_CLIENT,L)
part3: [L,MONSTROUSLY_BIG_CLIENT)
part4: [MONSTROUSLY_BIG_CLIENT,MONSTROUSLY_BIG_CLIENT]
part5: (MONSTROUSLY_BIG_CLIENT,RANDOM_CLIENT_LATE_IN_ALPHABET]
part6: (RANDOM_CLIENT_LATE_IN_ALPHABET,)

I can't implement that with a simple VALUES LESS THAN clause, unless I happen to know 'x' in 'BIG_CLIENTx', where 'x' is the exact first character in the collation sequence, which has to be something unprintable, and that would make those who later read my code say something unprintable. So yeah, I'm hoping there's some way to cleanly represent such ranges.
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hello, On 2016/02/15 20:21, Kyotaro HORIGUCHI wrote: > At Mon, 8 Feb 2016 11:37:17 +0900, Amit Langote wrote: >> On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote: >>> Please find attached updated patch. [ ... ] >> >> Instead of passing the array of char *'s, why not just pass a single char >> *, because that's what it's doing - updating a single message. So, >> something like: > > As I might have written upthread, transferring the whole string > as a progress message is useless at least in this scenario. Since > they are a set of fixed messages, each of them can be represented > by an identifier, an integer number. I don't see a reason for > sending the whole of a string beyond a backend. This tends to make sense. Perhaps, they could be macros: #define VACUUM_PHASE_SCAN_HEAP 1 #define VACUUM_PHASE_VACUUM_INDEX_HEAP 2 > Next, the function pg_stat_get_command_progress() has a somewhat > generic name, but it seems to reuturn the data only for the > backends with beentry->st_command = COMMAND_LAZY_VACUUM and has > the column names specific for vucuum like process. If the > function is intended to be generic, it might be better to return > a set of integer[] for given type. Otherwise it should have a > name represents its objective. Agreed. > > CREATE FUNCTION > pg_stat_get_command_progress(IN cmdtype integer) > RETURNS SETOF integer[] as $$ > > SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x > x > -_ > {1233, 16233, 1, } > {3244, 16236, 2, } > I am not sure what we would pass as argument to the (SQL) function pg_stat_get_command_progress() in the system view definition for individual commands - what is PROGRESS_COMMAND_VACUUM exactly? Would string literals like "vacuum", "cluster", etc. to represent command names work? > > CREATE VIEW pg_stat_vacuum_progress AS > SELECT S.s[1] as pid, > S.s[2] as relid, > CASE S.s[3] >WHEN 1 THEN 'Scanning Heap' >WHEN 2 THEN 'Vacuuming Index and Heap' >ELSE 'Unknown phase' > END, > > FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S; > > # The name of the function could be other than *_command_progress. > > Any thoughts or opinions? How about pg_stat_get_progress_info()? >> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", >> phase2); >> + pgstat_report_progress_update_message(0, progress_message); >> /* Remove index entries */ >> for (i = 0; i < nindexes; i++) >> + { >> lazy_vacuum_index(Irel[i], >> &indstats[i], >> vacrelstats); >> + scanned_index_pages += >> RelationGetNumberOfBlocks(Irel[i]); >> + /* Update the scanned index pages and number of index >> scan */ >> + pgstat_report_progress_update_counter(3, >> scanned_index_pages); >> + pgstat_report_progress_update_counter(4, >> vacrelstats->num_index_scans >> + 1); >> + } >> /* Remove tuples from heap */ >> lazy_vacuum_heap(onerel, vacrelstats); >> vacrelstats->num_index_scans++; >> + scanned_index_pages = 0; >> >> I guess num_index_scans could better be reported after all the indexes are >> done, that is, after the for loop ends. > > Precise reporting would be valuable if vacuuming indexes takes a > long time. It seems to me to be fine as it is since updating of > stat counters wouldn't add any significant overhead. Sorry, my comment may be a bit unclear. vacrelstats->num_index_scans doesn't count individual indexes vacuumed but rather the number of times "all" the indexes of a table are vacuumed, IOW, the number of times the vacuum phase runs. Purpose of counter #4 there seems to be to report the latter. 
OTOH, reporting scanned_index_pages per index as is done in the patch is alright.

That said, there is discussion upthread about more precise reporting on index vacuuming by utilizing lazy_tid_reaped() (the index bulk-delete callback) as a place where we can report what index block number we are at. I think that would mean the current IndexBulkDeleteCallback signature is insufficient, which is the following:

/* Typedef for callback function to determine if a tuple is bulk-deletable */
typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);

One more parameter would be necessary:

typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, BlockNumber current_index_blkno, void *state);

That would also require changing all the AM-specific vacuum-page routines (like btvacuumpage) to also pass the new argument. Needless to say, some bookkeeping information would also need to be kept in LVRelStats (the "state" in the above signature). Am I missing something?
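To make the phase-identifier idea a bit more concrete, it would look roughly like the following. This is only an illustrative sketch built on the patch's reporting function; the macro names and the counter slot number are made up here, not part of the patch:

/* Hypothetical phase identifiers -- names are illustrative only */
#define PROGRESS_VACUUM_PHASE_SCAN_HEAP      1
#define PROGRESS_VACUUM_PHASE_VACUUM_INDEX   2

/*
 * Instead of formatting and copying a whole message string, the backend
 * reports only the integer phase through the counter API, and the view
 * translates it back to text with a CASE expression, as sketched above.
 */
pgstat_report_progress_update_counter(2, PROGRESS_VACUUM_PHASE_SCAN_HEAP);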
Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid
On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack wrote: > If the calling process subsequently waits for its > children, and the process has no unwaited for children that were > transformed into zombie processes, it will block until all of its > children terminate, and wait(), wait3(), waitid() and waitpid() will > fail and set errno to [ECHILD]. Sure, but I don't see anything saying system() should be expected to not handle this situation. At least there's nothing in the system.3 man page that says it should be expected to always return an error if SIGCHILD is ignored. And actually looking at that documentation it's not clear to me why it's the case. I would have expected system to immediately call waitpid after the fork and unless the subprocess was very quick that should be sufficient to get the exit code. One might even imagine having system intentionally have some kind interlock to ensure that the parent has called waitpid before the child execs the shell. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc
Noah Misch writes: > On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote: >> Oh, crap. I didn't realize that TEMP_CONFIG didn't affect the contrib >> test suites. Is there any reason for that, or is it just kinda where >> we ended up? > To my knowledge, it's just the undesirable place we ended up. Yeah. +1 for fixing that, if it's not unreasonably painful. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Robert Haas writes: > On Sun, Feb 14, 2016 at 5:26 PM, Tom Lane wrote: >> Also, after fixing that it would be good to add a comment explaining why >> it's not fundamentally unsafe for BecomeLockGroupMember() to examine >> leader->pgprocno without having any relevant lock. AFAICS, that's only >> okay because the pgprocno fields are never changed after InitProcGlobal, >> even when a PGPROC is recycled. > I am sort of disinclined to think that this needs a comment. I might not either, except that the entire point of that piece of code is to protect against the possibility that the leader PGPROC vanishes before we can get this lock. Since you've got extensive comments addressing that point, it seems appropriate to explain why this one line doesn't invalidate the whole design ... because it's right on the hairy edge of doing so. If we did something like memset a PGPROC to all zeroes when we free it, which in general would seem like a perfectly reasonable thing to do, this code would be broken (and not in an easy to detect way; it would indeed be indistinguishable from the way in which it's broken right now). > Do we > really have a comment every other place that pgprocno is referenced > without a lock? I suspect strongly that there is no other place that attempts to touch any part of an invalid (freed) PGPROC. If there is, more than likely it's unsafe. I don't have time right now to read the patch in detail, but will look tomorrow. In the meantime, thanks for addressing my concerns. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid
On 02/15/16 13:42, Greg Stark wrote: > (it returns error with errno ECHILD upon successful completion of > commands). > This fix ignores an error from system() if errno == ECHILD. > > It looks like Linux now behaves similarly, It seems to be official, in the Single Unix Specification: http://pubs.opengroup.org/onlinepubs/7908799/xsh/sigaction.html SA_NOCLDWAIT If set, and sig equals SIGCHLD, child processes of the calling processes will not be transformed into zombie processes when they terminate. If the calling process subsequently waits for its children, and the process has no unwaited for children that were transformed into zombie processes, it will block until all of its children terminate, and wait(), wait3(), waitid() and waitpid() will fail and set errno to [ECHILD]. Otherwise, terminating child processes will be transformed into zombie processes, unless SIGCHLD is set to SIG_IGN. > So just in case anyone else wants to use system() in Postgres or > indeed any other Unix application that twiddles with the SIGCHILD > handler this is something to beware of. It's not entirely clear to me > that the mention of SA_NOCLDWAIT is the only way to get this > behaviour, at least one stackoverflow answer implied just setting > SIG_IGN was enough. Yup: • If a process sets the action for the SIGCHLD signal to SIG_IGN, the behaviour is unspecified, except as specified below. If the action for the SIGCHLD signal is set to SIG_IGN, child processes of the calling processes will not be transformed into zombie processes when they terminate. If the calling process subsequently waits for its children, and the process has no unwaited for children that were transformed into zombie processes, it will block until all of its children terminate, and wait(), wait3(), waitid() and waitpid() will fail and set errno to [ECHILD]. -Chap -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
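The old fix Greg dug up amounts to something like the following. This is a reconstruction for illustration, not the exact historical code: when SIGCHLD is ignored, the child is reaped automatically, so the wait() inside system() can fail with ECHILD even though the command ran to completion, and that one errno value is treated as success.

#include <errno.h>
#include <stdlib.h>

/* Sketch of "ignore an error from system() if errno == ECHILD" */
static int
system_ignoring_echild(const char *command)
{
	int		rc;

	errno = 0;
	rc = system(command);
	if (rc == -1 && errno == ECHILD)
		rc = 0;				/* child was auto-reaped; assume the command completed */
	return rc;
}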
Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
On Sun, Feb 14, 2016 at 5:26 PM, Tom Lane wrote: > Robert Haas writes: >> On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane wrote: >>> pgprocno of the specific PGPROC, or of the group leader? If it's >>> the former, I'm pretty suspicious that there are deadlock and/or >>> linked-list-corruption hazards here. If it's the latter, at least >>> the comments around this are misleading. > >> Leader. The other way would be nuts. > > ... and btw, either BecomeLockGroupMember() has got this backwards, or > I'm still fundamentally confused. Yeah, you're right. Attached is a draft patch that tries to clean up that and a bunch of other things that you raised. It hasn't really been tested yet, so it might be buggy; I wrote it during my long plane flight. Fixes include: 1. It removes the groupLeader flag from PGPROC. You're right: we don't need it. 2. It fixes this problem with BecomeLockGroupMember(). You're right: the old way was backwards. 3. It fixes TopoSort() to be less inefficient by checking whether the EDGE is for the correct lock before doing anything else. I realized this while updating comments related to EDGE. 4. It adds some text to the lmgr README. 5. It reflows the existing text in the lmgr README to fit within 80 columns. 6. It adjusts some other comments. Possibly this should be broken up into multiple patches, but I'm just sending it along all together for now. > Also, after fixing that it would be good to add a comment explaining why > it's not fundamentally unsafe for BecomeLockGroupMember() to examine > leader->pgprocno without having any relevant lock. AFAICS, that's only > okay because the pgprocno fields are never changed after InitProcGlobal, > even when a PGPROC is recycled. I am sort of disinclined to think that this needs a comment. Do we really have a comment every other place that pgprocno is referenced without a lock? Or maybe there are none, but it seems like overkill to me to comment on that at every site of use. Better to comment on the pgprocno definition itself and say "never changes". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/storage/lmgr/README b/src/backend/storage/lmgr/README index cb9c7d6..8eaa91c 100644 --- a/src/backend/storage/lmgr/README +++ b/src/backend/storage/lmgr/README @@ -591,26 +591,27 @@ Group Locking As if all of that weren't already complicated enough, PostgreSQL now supports parallelism (see src/backend/access/transam/README.parallel), which means that -we might need to resolve deadlocks that occur between gangs of related processes -rather than individual processes. This doesn't change the basic deadlock -detection algorithm very much, but it makes the bookkeeping more complicated. +we might need to resolve deadlocks that occur between gangs of related +processes rather than individual processes. This doesn't change the basic +deadlock detection algorithm very much, but it makes the bookkeeping more +complicated. We choose to regard locks held by processes in the same parallel group as -non-conflicting. This means that two processes in a parallel group can hold -a self-exclusive lock on the same relation at the same time, or one process -can acquire an AccessShareLock while the other already holds AccessExclusiveLock. +non-conflicting. This means that two processes in a parallel group can hold a +self-exclusive lock on the same relation at the same time, or one process can +acquire an AccessShareLock while the other already holds AccessExclusiveLock. 
This might seem dangerous and could be in some cases (more on that below), but if we didn't do this then parallel query would be extremely prone to self-deadlock. For example, a parallel query against a relation on which the leader had already AccessExclusiveLock would hang, because the workers would -try to lock the same relation and be blocked by the leader; yet the leader can't -finish until it receives completion indications from all workers. An undetected -deadlock results. This is far from the only scenario where such a problem -happens. The same thing will occur if the leader holds only AccessShareLock, -the worker seeks AccessShareLock, but between the time the leader attempts to -acquire the lock and the time the worker attempts to acquire it, some other -process queues up waiting for an AccessExclusiveLock. In this case, too, an -indefinite hang results. +try to lock the same relation and be blocked by the leader; yet the leader +can't finish until it receives completion indications from all workers. An +undetected deadlock results. This is far from the only scenario where such a +problem happens. The same thing will occur if the leader holds only +AccessShareLock, the worker seeks AccessShareLock, but between the time the +leader attempts to acquire the lock and the time the worker attempts to +acquire it, some other process queues up waiting for an AccessExclusiveLock. +In this case, too,
Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc
On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote: > On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch wrote: > > On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote: > >> force_parallel_mode=regress > >> max_parallel_degree=2 > >> > >> And then run this: make check-world > >> TEMP_CONFIG=/path/to/aforementioned/file > > I configured a copy of animal "mandrill" that way and launched a test run. > > The postgres_fdw suite failed as attached. A manual "make -C contrib > > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on > > x86_64 and aarch64. Since contrib test suites don't recognize TEMP_CONFIG, > > check-world passes everywhere. > > Oh, crap. I didn't realize that TEMP_CONFIG didn't affect the contrib > test suites. Is there any reason for that, or is it just kinda where > we ended up? To my knowledge, it's just the undesirable place we ended up. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc
On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch wrote: > On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote: >> Well, what I've done is push into the buildfarm code that will allow >> us to do *the most exhaustive* testing that I know how to do in an >> automated fashion. Which is to create a file that says this: >> >> force_parallel_mode=regress >> max_parallel_degree=2 >> >> And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file >> >> Now, that is not going to find bugs in the deadlock.c portion of the >> group locking patch, but it's been wildly successful in finding bugs >> in other parts of the parallelism code, and there might well be a few >> more that we haven't found yet, which is why I'm hoping that we'll get >> this procedure running regularly either on all buildfarm machines, or >> on some subset of them, or on new animals that just do this. > > I configured a copy of animal "mandrill" that way and launched a test run. > The postgres_fdw suite failed as attached. A manual "make -C contrib > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on > x86_64 and aarch64. Since contrib test suites don't recognize TEMP_CONFIG, > check-world passes everywhere. Oh, crap. I didn't realize that TEMP_CONFIG didn't affect the contrib test suites. Is there any reason for that, or is it just kinda where we ended up? Retrying it the way you did it, I see the same errors here, so I think this isn't a PPC-specific problem, but just a problem in general. I've actually seen these kinds of errors before in earlier versions of the testing code that eventually became force_parallel_mode. I got fooled into believing I'd fixed the problem because of my confusion about how TEMP_CONFIG worked. I think this is more likely to be a bug in force_parallel_mode than a bug in the code that checks whether a normal parallel query is safe, but I'll have to track it down before I can say for sure. Thanks for testing this. It's not delightful to discover that I muffed this, but better to find it now than in 6 months. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning
Hi Corey, On 2016/02/16 5:15, Corey Huinker wrote: >> >> The individual patches have commit messages that describe code changes. >> This is registered in the upcoming CF. Feedback and review is greatly >> welcomed! >> > We have a current system that is currently a mix of tables, each of which > is range partitioned into approximately 15 partitions (using the pgxn range > partitioning extension), and those partitions are themselves date-series > partitioned via pg_partman. The largest table ingests about 100M rows per > day in a single ETL. I will try this patch out and see how well it compares > in handling the workload. Do you have any areas of interest or concern that > I should monitor? Thanks a lot for willing to give it a spin! I would say this patch series is more geared toward usability. For example, you won't have to write a trigger to route tuples to correct partitions. You can try your mentioned ETL load and see if the patch's implementation of tuple routing fares any better than existing trigger-based approach. Maybe, you were expecting something like load into a stand-alone table and then ALTER TABLE INHERIT to instantly load into the partitioned table (roll-in), but that command is not yet implemented (ATTACH PARTITION command will show a "not implemented" error message). Also, you won't see any optimizer and executor changes. Queries will still use the same plans as existing inheritance-based partitioned tables, although as I mentioned, constraint exclusion won't yet kick in. That will be fixed very shortly. And of course, comments on syntax are welcome as well. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extend pgbench expressions with functions
On Mon, Feb 15, 2016 at 4:58 AM, Fabien COELHO wrote: > Indeed! > >> Taking patch 1 as a completely independent thing, there is no need to >> introduce PgBenchValueType yet. Similar remark for setIntValue and >> coerceToInt. They are useful afterwards when introducing double types to be >> able to handle double input parameters for the gaussian and other functions. > > Yes. This is exactly the pain I'm trying to avoid, creating a different > implementation for the first patch, which is just overriden when the second > part is applied... Splitting a patch means breaking it into independently committable sub-patches, not just throwing each line of the diff into a different pile as best you can. I'm with Michael: that part doesn't belong in this patch. If we want to have an infrastructure refactoring patch that just replaces every relevant use of int64 with PgBenchValue, a union supporting only integer values, then we can do that first and have a later patch introduce double as a separate change. But we can't have things that are logically part of patch #2 just tossed in with patch #1. I was in the middle of ripping that out of the patch when I realized that evalFunc() is pretty badly designed. What you've done is made it the job of each particular function to evaluate its arguments. I don't think that's a good plan. I think that when we discover that we've got a function, we should evaluate all of the arguments that were passed to it using common code that is shared across all types of functions and operators. Then, the evaluated arguments should be passed to the function-specific code, which can do as it likes with them. This way, you have less code that is specific to particular operations and more common code, which is generally a good thing. Every expression evaluation engine that I've ever heard of works this way - see, for example, the PostgreSQL backend. I experimented with trying to do this and ran into a problem: where exactly would you store the evaluated arguments when you don't know how many of them there will be? And even if you did know how many of them there will be, wouldn't that mean that evalFunc or evaluateExpr would have to palloc a buffer of the correct size for each invocation? That's far more heavyweight than the current implementation, and minimizing CPU usage inside pgbench is a concern. It would be interesting to do some pgbench runs with this patch, or the final patch, and see what effect it has on the TPS numbers, if any, and I think we should. But the first concern is to minimize any negative impact, so let's talk about how to do that. I think we should limit the number of arguments to a function to, say, 16, so that an array of int64s or PgBenchValues long enough to hold an entire argument list can be stack-allocated. The backend's limit is higher, but the only reason we need a value higher than 2 here is because you've chosen to introduce variable-argument functions; but I think 16 is enough for any practical purpose. Then, I think we should also change the parse tree representation so that transforms the linked-list into an array stored within the PgBenchExpr, so that you can access the expression for argument i as expr->u.arg[i]. Then we can write this is a loop that evaluates each expression in an array of expressions and stores the result in an array of values. That seems like it would be both cleaner and faster than what you've got here right now. 
It's also more similar to what you did with the function name itself: the most trivial thing the parser could do is store a pointer to the function or operator name, but that would be slow, so instead it looks up the function and stores a constant. I also went over your documentation changes. I think you inserted the new table smack dab in the middle of a section in a place that it doesn't really belong. The version attached makes it into its own , puts it in a new section a bit further down so that it doesn't break up the flow, and has a few other tweaks that I think are improvements. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ade1b53..f39f341 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -786,7 +786,7 @@ pgbench options dbname - + \set varname expression @@ -798,8 +798,10 @@ pgbench options dbname The expression may contain integer constants such as 5432, references to variables :variablename, and expressions composed of unary (-) or binary operators - (+, -, *, /, %) - with their usual associativity, and parentheses. + (+, -, *, /, + %) with their usual associativity, + function calls, and + parentheses. @@ -994,6 +996,62 @@ END; + + Built-In Functions + + + The following functions are built into pgbench and + may b
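A rough sketch of the evaluation shape Robert is proposing follows. The cap MAX_FARGS, the function-node field layout (expr->u.function.*) and the evalFunc() signature are assumptions for illustration, not the committed design:

#define MAX_FARGS 16			/* assumed cap on function arguments */

/*
 * Sketch only: evaluate every argument expression of a function node into
 * a stack-allocated array, then hand the whole array to the per-function
 * code, instead of letting each function evaluate its own arguments.
 */
static bool
evalFunction(CState *st, PgBenchExpr *expr, int64 *retval)
{
	int64		args[MAX_FARGS];
	int			nargs = expr->u.function.nargs;
	int			i;

	for (i = 0; i < nargs; i++)
	{
		/* expr->u.function.args[] is the array form suggested above */
		if (!evaluateExpr(st, expr->u.function.args[i], &args[i]))
			return false;
	}

	return evalFunc(expr->u.function.function, nargs, args, retval);
}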
Re: [HACKERS] planstats.sgml
> Tatsuo Ishii writes: >> While reading planstats.sgml, I encounted a sentence which I don't >> understand. > >> These numbers are current as of the last VACUUM or >> ANALYZE on the table. The planner then fetches the >> actual current number of pages in the table (this is a cheap operation, >> not requiring a table scan). If that is different from >> relpages then >> reltuples is scaled accordingly to >> arrive at a current number-of-rows estimate. In this case the value of >> relpages is up-to-date so the rows estimate is >> the same as reltuples. > >> I don't understand the last sentence (In this case...). For me it >> seems it is talking about the case when replages is not different from >> what the planner fetches from the table. If so, why "In this case"? > > I think what it meant is "In the example above, ...". Feel free to > change it if you think that is clearer. Oh, I see. I'm going to change "In this case" to "In the example above". Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] planstats.sgml
Tatsuo Ishii writes: > While reading planstats.sgml, I encounted a sentence which I don't > understand. > These numbers are current as of the last VACUUM or > ANALYZE on the table. The planner then fetches the > actual current number of pages in the table (this is a cheap operation, > not requiring a table scan). If that is different from > relpages then > reltuples is scaled accordingly to > arrive at a current number-of-rows estimate. In this case the value of > relpages is up-to-date so the rows estimate is > the same as reltuples. > I don't understand the last sentence (In this case...). For me it > seems it is talking about the case when replages is not different from > what the planner fetches from the table. If so, why "In this case"? I think what it meant is "In the example above, ...". Feel free to change it if you think that is clearer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket
I noticed that strerror(errno) wasn't the most helpful error context ever, so I changed it to PQerrorMessage(). There may be room for additional improvement there. I also noticed that if one connection dies, we terminate the whole thread, and if the thread is serving multiple connections, the other ones are not PQfinished. It may or may not be worthwhile improving this; if pgbench is used to test the server when connections are randomly dropped that would be helpful, otherwise not so much. I ended up backpatching all the way back. Fabien COELHO wrote: > > Hello Alvaro, > > >Any objections to changing it like this? I'd probably backpatch to 9.5, > >but no further (even though this pattern actually appears all the way > >back.) > > My 0.02€: if the pattern is repeated, maybe a function which incorporates > the check would save lines and improve overall readability? > >... = safePQsocket(...); Hmm, yeah, but that doesn't work too well because we're not invoking exit() in case of an error (which is what the safe pg_malloc etc do), so we'd have to check for what to do after safePQsocket returns -- no improvement in code clarity IMHO. Thanks for the suggestion. Michael Paquier wrote: > Different issues in the same area: > 1) In vacuumdb.c, init_slot() does not check for the return value of > PQsocket(): > slot->sock = PQsocket(conn); > 2) In isolationtester.c, try_complete_step() should do the same. > 3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same problem. > I guess those ones should be fixed as well, no? Hmm, yeah, perhaps it's worth closing these too. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
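[For readers following along, Fabien's suggestion amounts to a small wrapper like the sketch below. This is hypothetical code, not what was committed; the name safePQsocket and the bail-out behaviour are only illustrative. Álvaro's point above is that pgbench's error paths do not simply exit(), which is why the wrapper buys little there.]

/* Hypothetical sketch of the suggested helper: PQsocket() returns -1 for a
 * bad or closed connection, so the result must be checked before it ever
 * reaches FD_SET()/FD_ISSET(). */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static int
safePQsocket(PGconn *conn)
{
	int			sock = PQsocket(conn);

	if (sock < 0)
	{
		fprintf(stderr, "invalid socket: %s", PQerrorMessage(conn));
		exit(1);				/* fine for a one-shot tool, but not for pgbench's error handling */
	}
	return sock;
}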
Re: [HACKERS] planstats.sgml
On Mon, Feb 15, 2016 at 4:23 PM, Tatsuo Ishii wrote: > While reading planstats.sgml, I encounted a sentence which I don't > understand. > > These numbers are current as of the last VACUUM or > ANALYZE on the table. The planner then fetches the > actual current number of pages in the table (this is a cheap operation, > not requiring a table scan). If that is different from > relpages then > reltuples is scaled accordingly to > arrive at a current number-of-rows estimate. In this case the value of > relpages is up-to-date so the rows estimate > is > the same as reltuples. > > I don't understand the last sentence (In this case...). For me it > seems it is talking about the case when replages is not different from > what the planner fetches from the table. If so, why "In this case"? > Isn't "In this case" referrers to the previous sentence (If that is > different from...)? Maybe "In this case" should be "Otherwise" or some > such? > > The whole sentence is awkward but you are correct in your reading - and "otherwise" would be a solid choice. Though it seems the whole thing could be simplified to: These numbers are current as of the last VACUUM or ANALYZE on the table. To account for subsequent changes the planner obtains the actual page count for the table and scales pg_class.reltuples by a factor of "actual page count" over pg_class.relpages. The "cheap operation" comment seems gratuitous though could still be injected if desired. David J.
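[As a concrete illustration of the scaling David describes, the planner's arithmetic can be mimicked from SQL like this. Hypothetical example; "tenk1" is just a stand-in table name.]

-- reltuples/relpages come from the last VACUUM or ANALYZE; the planner
-- scales reltuples by the current page count over relpages.
SELECT relname,
       reltuples,
       relpages,
       reltuples
         * (pg_relation_size(oid) / current_setting('block_size')::int)
         / relpages AS estimated_rows
FROM pg_class
WHERE relname = 'tenk1' AND relpages > 0;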
Re: [HACKERS] Using quicksort for every external sort run
On Mon, Feb 15, 2016 at 8:43 PM, Jim Nasby wrote: > On 2/7/16 8:57 PM, Peter Geoghegan wrote: >> >> It seems less than ideal that DBAs have to be so >> conservative in sizing work_mem. > > > +10 I was thinking about this over the past couple weeks. I'm starting to think the quicksort runs give at least the beginnings of a way forward on this front. Keeping in mind that we know how many tapes we can buffer in memory and the number is likely to be relatively large -- on the order of 100+ is typical, what if we do something like the following rough sketch: Give users two knobs, a lower bound "sort in memory using quicksort" memory size and an upper bound "absolutely never use more than this" which they can set to a substantial fraction of physical memory. Then when we overflow the lower bound we start generating runs, the first one being of that length. Each run we generate we double (or increase by 50% or something) until we hit the maximum. That means that the first few runs may be shorter than necessary but we have enough tapes available that that doesn't hurt much and we'll eventually get to a large enough run size that we won't run out of tapes and can still do a single final (on the fly) merge. In fact what's really attractive about this idea is that it might give us a reasonable spot to do some global system resource management. Each time we want to increase the run size we check some shared memory counter of how much memory is in use and refuse to increase if there's too much in use (or if we're using too large a fraction of it or some other heuristic). The key point is that since we don't need to decide up front at the beginning of the sort and we don't need to track it continuously there is neither too little nor too much contention on this shared memory variable. Also the behaviour would be not too chaotic if there's a user-tunable minimum and the other activity in the system only controls how much more memory it can steal from the global pool on top of that. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
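[To make the escalation concrete, here is a toy, self-contained illustration of the doubling Greg sketches. The bounds and the cap are invented numbers; this is not tuplesort code.]

/* Start runs at the low-bound sort memory, double each run, cap at the
 * high bound; with ~100 tapes available, the early short runs are cheap. */
#include <stdio.h>

int
main(void)
{
	long		low_mb = 64;	/* "sort in memory using quicksort" knob */
	long		high_mb = 4096; /* "absolutely never use more than this" knob */
	long		run_mb = low_mb;
	int			run;

	for (run = 1; run <= 8; run++)
	{
		printf("run %d: %ld MB\n", run, run_mb);
		run_mb = (run_mb * 2 > high_mb) ? high_mb : run_mb * 2;
	}
	return 0;
}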
[HACKERS] planstats.sgml
While reading planstats.sgml, I encountered a sentence which I don't understand. These numbers are current as of the last VACUUM or ANALYZE on the table. The planner then fetches the actual current number of pages in the table (this is a cheap operation, not requiring a table scan). If that is different from relpages then reltuples is scaled accordingly to arrive at a current number-of-rows estimate. In this case the value of relpages is up-to-date so the rows estimate is the same as reltuples. I don't understand the last sentence (In this case...). For me it seems it is talking about the case when relpages is not different from what the planner fetches from the table. If so, why "In this case"? Isn't "In this case" referring to the previous sentence (If that is different from...)? Maybe "In this case" should be "Otherwise" or some such? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc
On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote: > Noah Misch writes: > > I configured a copy of animal "mandrill" that way and launched a test run. > > The postgres_fdw suite failed as attached. A manual "make -C contrib > > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on > > x86_64 and aarch64. Since contrib test suites don't recognize TEMP_CONFIG, > > check-world passes everywhere. > > Hm, is this with or without the ppc-related atomics fix you just found? Without those. The ppc64 GNU/Linux configuration used gcc, though, and the atomics change affects xlC only. Also, the postgres_fdw behavior does not appear probabilistic; it failed twenty times in a row. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc
Noah Misch writes: > I configured a copy of animal "mandrill" that way and launched a test run. > The postgres_fdw suite failed as attached. A manual "make -C contrib > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on > x86_64 and aarch64. Since contrib test suites don't recognize TEMP_CONFIG, > check-world passes everywhere. Hm, is this with or without the ppc-related atomics fix you just found? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock with Adaptive Replacement
Jim Nasby writes: > On 2/12/16 9:55 AM, Robert Haas wrote: >> I think it's important to spend time and energy figuring out exactly >> what the problems with our current algorithm are. We know in general >> terms that usage counts tend to converge to either 5 or 0 and >> therefore sometimes evict buffers both at great cost and almost > Has anyone done testing on the best cap to usage count? IIRC 5 was > pulled out of thin air. My recollection is that there was some testing behind it ... but that was back around 2005 so it seems safe to assume that that testing is no longer terribly relevant. In particular, I'm sure it was tested with shared_buffer counts far smaller than what we now consider sane. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Alvaro Herrera writes: > (FWIW I'm not the "current" CF manager, because the CF I managed is > already over. I'm not sure that we know who the manager for the > upcoming one is.) We had a vict^H^H^H^Hvolunteer a few days ago: http://www.postgresql.org/message-id/56b91a6d.6030...@pgmasters.net regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations
On Wed, Feb 3, 2016 at 5:12 PM, Thomas Munro wrote: > I don't see it as a difficult choice between two reasonable > alternatives. It quacks suspiciously like a bug. That seems a little strong to me; I think it would be an unacceptable change in behavior to back-patch this, for example. On the other hand, we have had multiple reports on these lists asserting that the behavior is a bug (not to mention several off-list communications to me about it), it seems like a POLA violation, it hides the information that users of serializable transactions consider most important in favor of relatively insignificant (to them) details about what table and key were involved, and it causes errors to be presented to end users that the developers would prefer to be handled discretely in the background. The current behavior provides this guarantee: "Any set of successfully committed concurrent serializable transactions will provide a result consistent with running them one at a time in some order." Users of serializable transactions would, in my experience, universally prefer to strengthen that guarantee with: "Should a serializable transaction fail only due to the action of a concurrent serializable transaction, it should fail with a serialization failure error." People have had to resort to weird heuristics like performing a limited number of retries on a duplicate key error in case it happens to be due to a serialization problem, but that wastes resources when it is not a serialization failure, and unnecessarily complicates the retry framework. > The theoretical problem with the current behaviour is that by > reporting a unique constraint violation in this case, we reach an > outcome that is neither a serialization failure nor a result that > could occur in any serial ordering of transactions. Well, not if you only consider successfully committed transactions. ;-) > The overlapping > transactions both observed that the key they planned to insert was not > present before inserting, and therefore they can't be untangled: there > is no serial order of the transactions where the second transaction to > run wouldn't see the key already inserted by the first transaction and > (presumably) take a different course of action. (If it *does* see the > value already present in its snapshot, or doesn't even look first > before inserting and it turns out to be present, then it really > *should* get a unique constraint violation when trying to insert.) > > The practical problem with the current behaviour is that the user has > to work out whether a unique constraint violation indicates: > > 1. A programming error -- something is wrong that retrying probably won't fix > > 2. An unfavourable outcome in the case that you speculatively > inserted without checking whether the value was present so you were > expecting a violation to be possible, in which case you know what > you're doing and you can figure out what to do next, probably retry or > give up > > 3. A serialization failure that has been masked because our coding > happens to check for unique constraint violations without considering > SSI conflicts first -- retrying will eventually succeed. > > It's complicated for a human to work out how to distinguish the third > category errors in each place where they might occur (and even to know > that they are possible, since AFAIK the manual doesn't point it out), > and impossible for an automatic retry-on-40001 framework to handle in > general. 
SERIALIZABLE is supposed to be super easy to use (and > devilishly hard to implement...). This is exactly on the mark, IMO. FWIW, at the conference in Moscow I had people for whom this is their #1 feature request. (Well, actually, they also argued it should be considered a bug fix; but on argument agreed that the current guarantee is useful and operating as designed, so were willing to see it treated as an enhancement.) Another way of stating the impact of this patch is that it changes the guarantee to: "If you write a transaction so that it does the right thing when run alone, it will always do the right thing as part of any mix of serializable transactions or will fail with a serialization failure error." Right now we have to add: "... or, er, maybe a duplicate key error." -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
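[To illustrate what a retry framework has to do today versus what it could do with this patch, here is a minimal libpq-side sketch. The function name and retry policy are invented; the SQLSTATE values are the standard ones.]

/* Decide whether an error is worth retrying.  With the patch, checking
 * 40001 (serialization_failure) alone would suffice; today, a masked SSI
 * conflict can also surface as 23505 (unique_violation), forcing the
 * limited-retry heuristic described above. */
#include <string.h>
#include <stdbool.h>
#include <libpq-fe.h>

static bool
should_retry(const PGresult *res)
{
	const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

	if (sqlstate == NULL)
		return false;
	if (strcmp(sqlstate, "40001") == 0)
		return true;			/* serialization failure: always retry */
	if (strcmp(sqlstate, "23505") == 0)
		return true;			/* maybe a masked serialization failure */
	return false;
}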
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.
Andres Freund writes: > I wonder if we shouldn't just expose a 'which pid is process X waiting > for' API, implemented serverside. That's generally really useful, and > looks like it's actually going to be less complicated than that > query... And it's surely going to be faster. Attached is a draft patch for a new function that reports the set of PIDs directly blocking a given PID (that is, holding or awaiting conflicting locks on the lockable object it's waiting for). I replaced isolationtester's pg_locks query with this, and found that it's about 9X faster in a normal build, and 3X faster with CLOBBER_CACHE_ALWAYS turned on. That would give us some nice headroom for the isolation tests with CLOBBER_CACHE_ALWAYS animals. (Note that in view of http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2016-02-14%2007%3A38%3A37 we still need to do *something* about the speed of the new deadlock-hard test; this patch could avoid the need to dumb down or slow down that test even further.) Not to be neglected also is that (I believe) this gives the right answer, whereas isolationtester's existing query is currently completely broken by parallel queries, and it doesn't understand non-conflicting lock modes either. (It did, at least partially, before commit 38f8bdcac4982215; I am not sure that taking out the mode checks was a good idea. But putting them back would make the query slower yet.) This is WIP, in part because I've written no user docs for the new function, and in part because I think people might want to bikeshed the API. What is here is: "pg_blocker_pids(integer) returns integer[]" returns a possibly-empty array of the PIDs of backend processes that block the backend with specified PID. You get an empty array, not an error, if the argument isn't a valid backend PID or that backend isn't waiting. In parallel query situations, the output includes PIDs that are blocking any PID in the given process's lock group, and what is reported is always the PID of the lock group leader for whichever process is doing the blocking. Also, in parallel query situations, the same PID might appear multiple times in the output; it didn't seem worth trying to eliminate duplicates. Reasonable API change requests might include returning a rowset rather than an array and returning more data per lock conflict. I've not bothered with either thing here because isolationtester doesn't care and it would make the query somewhat slower for isolationtester's usage (though, probably, not enough slower to really matter). It should also be noted that I've not really tested the parallel query aspects of this, because I'm not sure how to create a conflicting lock request in a parallel worker. However, if I'm not still misunderstanding the new semantics of the lock data structures, that aspect of it seems unlikely to be wrong. Comments?
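[For concreteness, usage of the proposed function would look something like the following. Hypothetical session: the PID is made up, and the function is only a draft whose name and signature may yet change.]

-- Which backends directly block the backend with PID 12345?
SELECT pg_blocker_pids(12345);

-- Unnested, e.g. for joining against pg_stat_activity:
SELECT unnest(pg_blocker_pids(12345)) AS blocker_pid;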
regards, tom lane diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 91218d0..97e8962 100644 *** a/src/backend/storage/ipc/procarray.c --- b/src/backend/storage/ipc/procarray.c *** HaveVirtualXIDsDelayingChkpt(VirtualTran *** 2313,2318 --- 2313,2341 PGPROC * BackendPidGetProc(int pid) { + PGPROC *result; + + if (pid == 0)/* never match dummy PGPROCs */ + return NULL; + + LWLockAcquire(ProcArrayLock, LW_SHARED); + + result = BackendPidGetProcWithLock(pid); + + LWLockRelease(ProcArrayLock); + + return result; + } + + /* + * BackendPidGetProcWithLock -- get a backend's PGPROC given its PID + * + * Same as above, except caller must be holding ProcArrayLock. The found + * entry, if any, can be assumed to be valid as long as the lock remains held. + */ + PGPROC * + BackendPidGetProcWithLock(int pid) + { PGPROC *result = NULL; ProcArrayStruct *arrayP = procArray; int index; *** BackendPidGetProc(int pid) *** 2320,2327 if (pid == 0)/* never match dummy PGPROCs */ return NULL; - LWLockAcquire(ProcArrayLock, LW_SHARED); - for (index = 0; index < arrayP->numProcs; index++) { PGPROC *proc = &allProcs[arrayP->pgprocnos[index]]; --- 2343,2348 *** BackendPidGetProc(int pid) *** 2333,2340 } } - LWLockRelease(ProcArrayLock); - return result; } --- 2354,2359 diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index fef59a2..fb32769 100644 *** a/src/backend/storage/lmgr/lock.c --- b/src/backend/storage/lmgr/lock.c *** *** 21,27 * * Interface: * ! * InitLocks(), GetLocksMethodTable(), * LockAcquire(), LockRelease(), LockReleaseAll(), * LockCheckConflicts(), GrantLock() * --- 21,27 * * Interface: * ! * InitLocks(), GetLocksMethodTable(), GetLockTagsMethodTable(), * LockAcquire(), LockRelease(), LockReleaseAll(), * LockCheckConflicts(), GrantLock() * *** *** 41,46 --- 41,47
Re: [HACKERS] Freeze avoidance of very large table.
On Wed, Feb 10, 2016 at 04:39:15PM +0900, Kyotaro HORIGUCHI wrote: > > I still agree with this plugin approach, but I felt it's still > > complicated a bit, and I'm concerned that patch size has been > > increased. > > Please give me feedbacks. > > Yeah, I feel the same. What make it worse, the plugin mechanism > will get further complex if we make it more flexible for possible > usage as I proposed above. It is apparently too complicated for > deciding whether to load *just one*, for now, converter > function. And no additional converter is in sight. > > I incline to pull out all the plugin stuff of pg_upgrade. We are > so prudent to make changes of file formats so this kind of events > will happen with several-years intervals. The plugin mechanism > would be valuable if we are encouraged to change file formats > more frequently and freely by providing it, but such situation > absolutely introduces more untoward things.. I agreed on ripping out the converter plugin ability of pg_upgrade. Remember pg_upgrade was originally written by EnterpriseDB staff, and I think they expected their closed-source fork of Postgres might need a custom page converter someday, but it never needed one, and at this point I think having the code in there is just making things more complex. I see _no_ reason for community Postgres to use a plugin converter because we are going to need that code for every upgrade from pre-9.6 to 9.6+, so why not just hard-code in the functions we need. We can remove it once 9.5 is end-of-life. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
On Mon, Feb 15, 2016 at 12:58 PM, Alvaro Herrera wrote: > (FWIW I'm not the "current" CF manager, because the CF I managed is > already over. I'm not sure that we know who the manager for the > upcoming one is.) It's pretty easy to forget that this was the first time in a long time that the CF wasn't closed because the next CF began. :-) -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock with Adaptive Replacement
On 2/12/16 9:55 AM, Robert Haas wrote: I think it's important to spend time and energy figuring out exactly what the problems with our current algorithm are. We know in general terms that usage counts tend to converge to either 5 or 0 and therefore sometimes evict buffers both at great cost and almost Has anyone done testing on the best cap to usage count? IIRC 5 was pulled out of thin air. Actually, I don't recall ever seeing a clock sweep that supported more than a single bit, though often there are multiple 'pools' a buffer could be in (ie: active vs inactive in most unix VMs). If you have a reasonable amount of 1 or 0 count buffers then this probably doesn't matter too much, but if your working set is significantly higher than shared buffers then you're probably doing a LOT of full sweeps to try and get something decremented down to 0. randomly. But what's a lot less clear is how much that actually hurts us given that we are relying on the OS cache anyway. It may be that we need to fix some other things before or after improving the buffer eviction algorithm before we actually get a performance benefit. I suspect, for example, that a lot of the problems with large shared_buffers settings have to do with the bgwriter and checkpointer behavior rather than with the buffer eviction algorithm; and that others have to do with cache duplication between PostgreSQL and the operating system. So, I would suggest (although of course it's up to It would be nice if there was at least an option to instrument how long an OS read request took, so that you could gauge how many requests were coming from the OS vs disk. (Obviously direct knowledge from the OS is even better, but I don't think those APIs exist.) you) that you might want to focus on experiments that will help you understand where the problems are before you plunge into writing code to fix them. +1 -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
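[For readers unfamiliar with the mechanism under discussion, a stripped-down sketch of the usage-count clock sweep follows. The real logic lives in StrategyGetBuffer() and also has to respect buffer pins; the names here are simplified for illustration.]

/* Advance the clock hand until a buffer with usage_count 0 is found,
 * decrementing counts along the way.  With a cap of 5, a popular buffer
 * needs up to five full sweeps before it becomes evictable, which is the
 * cost Jim is pointing at. */
static int
clocksweep_pick_victim(int *usage_count, int nbuffers, int *hand)
{
	for (;;)
	{
		int			victim = *hand;

		*hand = (*hand + 1) % nbuffers;
		if (usage_count[victim] == 0)
			return victim;		/* evict this buffer */
		usage_count[victim]--;	/* give it another trip around the clock */
	}
}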
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Peter Geoghegan wrote: > Michael (the CF manager at the time) remembered to change the status > to "Ready for Committer" again; you see this entry immediately > afterwards: > > "New status: Ready for Committer" > > but I gather from the CF app history that Alvaro (the current CF > manager) did not remember to do that second step when he later moved > the patch to the "next" (current) CF. Or maybe he just wasn't aware of > this quirk of the CF app. That seems a bug in the CF app to me. (FWIW I'm not the "current" CF manager, because the CF I managed is already over. I'm not sure that we know who the manager for the upcoming one is.) -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On 2/7/16 8:57 PM, Peter Geoghegan wrote: It seems less than ideal that DBAs have to be so conservative in sizing work_mem. +10 -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exposing pg_controldata and pg_config as functions
On 01/20/2016 08:08 PM, Michael Paquier wrote: > On Wed, Jan 20, 2016 at 2:32 AM, Joe Conway wrote: >> The only things I know of still lacking is: >> 1) Documentation Done and included in the attached. >> 2) Decision on REVOKE ... FROM PUBLIC > > Yep, regarding 2) I am the only one actually making noise to protect > this information by default, against at least 2 committers :) I plan to commit this way -- if the decision is made to remove the two REVOKEs it can always be done later, but I see no problem with it. > +typedef struct configdata > +{ > + char *name; > + char *setting; > +} configdata; > For a better analogy to ControlFileData, this could be renamed ConfigData? Well I was already using ConfigData as the variable name, but after some review it seems better your way, so I made the struct ConfigData and the variable configdata. > The point of the move to src/common is to remove the duplication in > src/bin/pg_config/Makefile. check > All the files in src/common should begin their include declarations with that: > #ifndef FRONTEND > #include "postgres.h" > #else > #include "postgres_fe.h" > #endif check > +configdata * > +get_configdata(char *my_exec_path, size_t *configdata_len) > +{ > It may be good to mention that the result is palloc'd and that caller > may need to free it if necessary. It does not matter in the two code > paths of this patch, but it may matter for other users calling that. check I believe this takes care of all open issues with this, so I plan to commit it as attached in a day or two. Thanks for your reviews and comments! Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 412c845..6c50f79 100644 *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** *** 7350,7355 --- 7350,7360 + pg_config + compile-time configuration parameters + + + pg_cursors open cursors *** *** 7609,7614 --- 7614,7667 + + pg_config + + +pg_config + + + +The view pg_config describes the +compile-time configuration parameters of the currently installed +version of PostgreSQL. It is intended, for example, to be used by +software packages that want to interface to PostgreSQL to facilitate +finding the required header files and libraries. It provides the same +basic information as the PostgreSQL Client +Application. There is a System Information Function +() called pg_config +which underlies this view. + + + +pg_config Columns + + + + Name + Type + Description + + + + + + name + text + The parameter name + + + + setting + text + The parameter value + + + + + + + pg_cursors diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index f9eea76..29c36d2 100644 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** SELECT * FROM pg_ls_dir('.') WITH ORDINA *** 15003,15008 --- 15003,15014 +pg_config() +setof record +get list of compile-time configure variable names and their values + + + pg_is_other_temp_schema(oid) boolean is schema another session's temporary schema? *** SET search_path TO schema + pg_config + + + + pg_config returns a set of records describing the + compile-time configuration parameters of the currently installed + version of PostgreSQL. It is intended, for example, to be used by + software packages that want to interface to PostgreSQL to facilitate + finding the required header files and libraries. The + name column contains the parameter name. + The setting column contains the parameter value. 
It + provides the same basic information as the + PostgreSQL Client Application. There + is also a system view. + + + version diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 923fe58..abf9a70 100644 *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** CREATE VIEW pg_timezone_abbrevs AS *** 433,438 --- 433,444 CREATE VIEW pg_timezone_names AS SELECT * FROM pg_timezone_names(); + CREATE VIEW pg_config AS + SELECT * FROM pg_config(); + + REVOKE ALL on pg_config FROM PUBLIC; + REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC; + -- Statistics views CREATE VIEW pg_stat_all_tables AS diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile index 7889101..a0c82c1 100644 *** a/src/backend/utils/misc/Makefile --- b/s
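[Once the patch is in, the new view and function are queried like any other catalog object; for example (output abbreviated, and subject to whatever form is finally committed). Note that the patch revokes access from PUBLIC, so this requires a superuser or an explicit grant.]

SELECT setting FROM pg_config WHERE name = 'INCLUDEDIR';
SELECT * FROM pg_config() ORDER BY name LIMIT 3;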
Re: [HACKERS] Declarative partitioning
> > > The individual patches have commit messages that describe code changes. > This is registered in the upcoming CF. Feedback and review is greatly > welcomed! > > Thanks, > Amit > > We have a system that is currently a mix of tables, each of which is range partitioned into approximately 15 partitions (using the pgxn range partitioning extension), and those partitions are themselves date-series partitioned via pg_partman. The largest table ingests about 100M rows per day in a single ETL. I will try this patch out and see how well it compares in handling the workload. Do you have any areas of interest or concern that I should monitor?
Re: [HACKERS] PoC: Partial sort
On Sun, Jan 31, 2016 at 4:16 AM, Alvaro Herrera wrote: > Great to have a new version -- there seems to be a lot of interest in > this patch. I'm moving this one to the next commitfest, thanks. I am signed up to review this patch. I was very surprised to see it in "Needs Review" state in the CF app (Alexander just rebased the patch, and didn't do anything with the CF app entry). Once again, this seems to have happened just because Alvaro moved the patch to the next CF. I've marked it "Waiting on Author" once more. Hopefully the CF app will be fixed soon, so moving a patch to the next commitfest no longer clobbers its state. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?
On Mon, Feb 15, 2016 at 11:23 AM, Peter Geoghegan wrote: > commit_delay doesn't have any guidance like this, where it could > certainly matter, because optimal settings are rarely greater than 10 > milliseconds. Actually, it does, but it's in "29.4. WAL Configuration", not next to the documentation for commit_delay. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?
On Wed, Feb 10, 2016 at 2:15 PM, Andres Freund wrote: > Afaik that's not the case on any recent operating system/hardware. So > perhaps we should just remove all of those blurbs, or just replace them > with something like "on some older systems the effective resolution of > sleep delays is limited to multiples of 10 milliseconds"? Or just remove it entirely. Really, I can't see that that's likely to matter when it does apply. Also, don't forget to do the same with bgwriter_delay. commit_delay doesn't have any guidance like this, where it could certainly matter, because optimal settings are rarely greater than 10 milliseconds. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Code refactoring related to -fsanitize=use-after-scope
Andres Freund writes: > On 2016-02-15 14:37:28 +0100, Martin Liška wrote: >> I've been currently working on support of -sanitize=use-after-scope in the >> GCC compiler and >> I decided to use postgresql as my test-case. The sanitation poisons every >> stack variable at the >> very beginning of a function, unpoisons a variable at the beginning of scope >> definition and finally >> poisons the variable again at the end of scope. > Generally sounds like a good check. >> Following patch fixes issues seen by the sanitizer. Hope it's acceptable? >> With the patch applied, ASAN (with the new sanitization) works fine. > But I'm not immediately seing why this is necessary? Is this about > battling a false positive? I bet a nickel that this is triggered by the goto leading into those variables' scope ("goto process_inner_tuple" at line 2038 in HEAD). That probably bypasses the "unpoison" step. However, doesn't this represent a bug in the sanitizer rather than anything we should change in Postgres? There is no rule in C that you can't execute such a goto, especially not if there is no initialization of those variables. If you can think of a reasonable refactoring that gets rid of the need for that goto, I'd be for that, because it's certainly unsightly. But I don't think it's wrong, and I don't think that the proposed patch is any improvement from a structured-programming standpoint. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
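[The situation Tom describes can be boiled down to a few lines of standalone C; this is an illustration of the language rule only, not the spgdoinsert.c code.]

/* Jumping into a block scope is legal C as long as no variably modified
 * declaration is bypassed; the variable simply has an indeterminate value
 * until it is assigned.  A sanitizer that unpoisons only at the normal
 * scope entry will miss the unpoison step on this path. */
#include <stdio.h>

static void
demo(int restart)
{
	if (restart)
		goto process;			/* bypasses the declaration of x below */

	{
		int			x;			/* no initializer, so the jump is allowed */

process:
		x = 42;
		printf("%d\n", x);
	}
}

int
main(void)
{
	demo(1);
	demo(0);
	return 0;
}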
Re: [HACKERS] xlc atomics
Noah Misch writes: > That commit (0d32d2e) permitted things to compile and usually pass tests, but > I missed the synchronization bug. Since 2015-10-01, the buildfarm has seen > sixteen duplicate-catalog-OID failures. I'd been wondering about those ... > These suggested OidGenLock wasn't doing its job. I've seen similar symptoms > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73, > 5765-J08)" for ppc64le. The problem is generic-xlc.h > pg_atomic_compare_exchange_u32_impl() issuing __isync() before > __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our > own s_lock.h, its references, and other projects' usage: Nice catch! > This patch's test case would have failed about half the time under today's > generic-xlc.h. Fast machines run it in about 1s. A test that detects the bug > 99% of the time would run far longer, hence this compromise. Sounds like a reasonable compromise to me, although I wonder about the value of it if we stick it into pgbench's TAP tests. How many of the slower buildfarm members are running the TAP tests? Certainly mine are not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
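[In code form, the fix Noah describes amounts to moving the barrier. The following is a simplified sketch patterned on generic-xlc.h, not the exact committed diff.]

/* The isync must act as the acquire fence *after* the stwcx. issued by
 * __compare_and_swap; issuing it before the CAS provides no ordering
 * guarantee for the critical section that follows a successful CAS. */
static inline bool
pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
									uint32 *expected, uint32 newval)
{
	bool		ret;

	ret = __compare_and_swap((volatile int *) &ptr->value,
							 (int *) expected, (int) newval);
	__isync();					/* barrier after the CAS, not before */

	return ret;
}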
[HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid
For reasons, I was trying to compile older versions of Postgres and ran into a strange behaviour where system() worked normally but then returned -1 with errno set to ECHILD. And surprisingly it looks like we've seen this behaviour in the past but on a Solaris: commit 07d4d36aae79cf2ac365e381ed3e7ce62dcfa783 Author: Tatsuo Ishii Date: Thu May 25 06:53:43 2000 + On solaris, createdb/dropdb fails because of strange behavior of system(). (it returns error with errno ECHILD upon successful completion of commands). This fix ignores an error from system() if errno == ECHILD. It looks like Linux now behaves similarly, in fact there's a Red Hat notice about this causing similar headaches in Oracle: https://access.redhat.com/solutions/37218 So just in case anyone else wants to use system() in Postgres or indeed any other Unix application that twiddles with the SIGCHLD handler this is something to beware of. It's not entirely clear to me that the mention of SA_NOCLDWAIT is the only way to get this behaviour, at least one stackoverflow answer implied just setting SIG_IGN was enough. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
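[The workaround from that old commit still translates to a couple of lines; a minimal sketch, not code from the tree, looks like this.]

/* If SIGCHLD is ignored (or SA_NOCLDWAIT is set), the child is reaped
 * automatically and the waitpid() inside system() fails with ECHILD even
 * though the command ran fine, so treat that case as success. */
#include <errno.h>
#include <stdlib.h>

static int
run_command(const char *cmd)
{
	int			rc;

	errno = 0;
	rc = system(cmd);
	if (rc == -1 && errno == ECHILD)
		rc = 0;
	return rc;
}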
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
On Tue, Dec 22, 2015 at 10:49 PM, Peter Geoghegan wrote: > I attach a rebased patch for 9.6 only. I marked the patch -- my own patch -- "Ready for Committer". I'm the third person to have marked the patch "Ready for Committer", even though no committer bounced the patch back for review by the reviewer, Andreas: https://commitfest.postgresql.org/9/300/ First Andreas did so, then Michael, and finally myself. The problem is that you see things like this: "Closed in commitfest 2015-11 with status: Moved to next CF" Michael (the CF manager at the time) remembered to change the status to "Ready for Committer" again; you see this entry immediately afterwards: "New status: Ready for Committer" but I gather from the CF app history that Alvaro (the current CF manager) did not remember to do that second step when he later moved the patch to the "next" (current) CF. Or maybe he just wasn't aware of this quirk of the CF app. I don't have a problem with having to resubmit a patch to the next CF manually, but it's easy to miss that the status has been changed from "Ready for Committer" to "Needs Review". I don't think anyone ever argued for that, and this patch was accidentally in "Needs Review" state for about 5 days. Can we fix it? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xlc atomics
On Mon, Feb 15, 2016 at 06:33:42PM +0100, Andres Freund wrote: > On 2016-02-15 12:11:29 -0500, Noah Misch wrote: > > These suggested OidGenLock wasn't doing its job. I've seen similar symptoms > > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73, > > 5765-J08)" for ppc64le. The problem is generic-xlc.h > > pg_atomic_compare_exchange_u32_impl() issuing __isync() before > > __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our > > own s_lock.h, its references, and other projects' usage: > > Ugh. You're right! It's about not moving code before the stwcx... > > Do you want to add the new test (no objection, curious), or is that more > for testing? The patch's test would join PostgreSQL indefinitely. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH v5] GSSAPI encryption support
David Steele writes: > Hi Robbie, > > On 2/10/16 4:06 PM, Robbie Harwood wrote: >> Hello friends, >> >> For your consideration, here is a new version of GSSAPI encryption >> support. For those who prefer, it's also available on my github: >> https://github.com/frozencemetery/postgres/commit/c92275b6605d7929cda5551de47a4c60aab7179e > > It tried out this patch and ran into a few problems: > > 1) It didn't apply cleanly to HEAD. It did apply cleanly on a455878 > which I figured was recent enough for testing. I didn't bisect to find > the exact commit that broke it. It applied to head of master (57c932475504d63d8f8a68fc6925d7decabc378a) for me (`patch -p1 < v4-GSSAPI-encryption-support.patch`). I rebased it anyway and cut a v5 anyway, just to be sure. It's attached, and available on github as well: https://github.com/frozencemetery/postgres/commit/dc10e3519f0f6c67f79abd157dc8ff1a1c293f53 > 2) While I was able to apply the patch and get it compiled it seemed > pretty flaky - I was only able to logon about 1 in 10 times on average. > Here was my testing methodology: > > a) Build Postgres from a455878 (without your patch), install/configure > Kerberos and get everything working. I was able the set the auth method > to gss in pg_hba.conf and logon successfully every time. > > b) On the same system rebuild Postgres from a455878 including your patch > and attempt authentication. > > The problems arose after step 2b. Sometimes I would try to logon twenty > times without success and sometimes it only take five or six attempts. > I was never able to logon successfully twice in a row. > > When not successful the client always output this incomplete message > (without terminating LF): > > psql: expected authentication request from server, but received > > From the logs I can see the server is reporting EOF from the client, > though the client does not core dump and prints the above message before > exiting. > > I have attached files that contain server logs at DEBUG5 and tcpdump > output for both the success and failure cases. > > Please let me know if there's any more information you would like me to > provide. What I can't tell from looking at your methodology is whether both the client and server were running my patches or no. There's no fallback here (I'd like to talk about how that should work, with example from v1-v3, if people have ideas). This means that both the client and the server need to be running my patches for the moment. Is this your setup? Thanks for taking it for a spin! --Robbie From dc10e3519f0f6c67f79abd157dc8ff1a1c293f53 Mon Sep 17 00:00:00 2001 From: Robbie Harwood Date: Tue, 17 Nov 2015 18:34:14 -0500 Subject: [PATCH] Connect encryption support for GSSAPI Existing GSSAPI authentication code is extended to support connection encryption. Connection begins as soon as possible - that is, immediately after the client and server complete authentication. 
--- configure | 2 + configure.in| 1 + doc/src/sgml/client-auth.sgml | 2 +- doc/src/sgml/runtime.sgml | 20 +- src/Makefile.global.in | 1 + src/backend/libpq/Makefile | 4 + src/backend/libpq/auth.c| 330 +--- src/backend/libpq/be-gssapi.c | 583 src/backend/libpq/be-secure.c | 16 + src/backend/postmaster/postmaster.c | 12 + src/include/libpq/libpq-be.h| 31 ++ src/interfaces/libpq/Makefile | 4 + src/interfaces/libpq/fe-auth.c | 182 --- src/interfaces/libpq/fe-auth.h | 5 + src/interfaces/libpq/fe-connect.c | 10 + src/interfaces/libpq/fe-gssapi.c| 474 + src/interfaces/libpq/fe-secure.c| 16 +- src/interfaces/libpq/libpq-int.h| 16 +- 18 files changed, 1191 insertions(+), 518 deletions(-) create mode 100644 src/backend/libpq/be-gssapi.c create mode 100644 src/interfaces/libpq/fe-gssapi.c diff --git a/configure b/configure index b3f3abe..a5bd629 100755 --- a/configure +++ b/configure @@ -713,6 +713,7 @@ with_systemd with_selinux with_openssl krb_srvtab +with_gssapi with_python with_perl with_tcl @@ -5491,6 +5492,7 @@ $as_echo "$with_gssapi" >&6; } + # # Kerberos configuration parameters # diff --git a/configure.in b/configure.in index 0bd90d7..4fd8f05 100644 --- a/configure.in +++ b/configure.in @@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support], krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab" ]) AC_MSG_RESULT([$with_gssapi]) +AC_SUBST(with_gssapi) AC_SUBST(krb_srvtab) diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index 3b2935c..7d37223 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -915,7 +915,7 @@ omicron bryanh guest1 provides automatic authentication (single sign-on) for systems that support it. The authentication
Re: [HACKERS] [PATCH] Code refactoring related to -fsanitize=use-after-scope
Hi, On 2016-02-15 14:37:28 +0100, Martin Liška wrote: > I've been currently working on support of -sanitize=use-after-scope in the > GCC compiler and > I decided to use postgresql as my test-case. The sanitation poisons every > stack variable at the > very beginning of a function, unpoisons a variable at the beginning of scope > definition and finally > poisons the variable again at the end of scope. Generally sounds like a good check. > Following patch fixes issues seen by the sanitizer. Hope it's acceptable? > With the patch applied, ASAN (with the new sanitization) works fine. > diff --git a/src/backend/access/spgist/spgdoinsert.c > b/src/backend/access/spgist/spgdoinsert.c > index f090ca5..ff986c2 100644 > --- a/src/backend/access/spgist/spgdoinsert.c > +++ b/src/backend/access/spgist/spgdoinsert.c > @@ -1871,6 +1871,10 @@ spgdoinsert(Relation index, SpGistState *state, > SPPageDesc current, > parent; > FmgrInfo *procinfo = NULL; > + SpGistInnerTuple innerTuple; > + spgChooseIn in; > + spgChooseOut out; > + > > /* >* Look up FmgrInfo of the user-defined choose function once, to save > @@ -2044,9 +2048,6 @@ spgdoinsert(Relation index, SpGistState *state, >* Apply the opclass choose function to figure out how > to insert >* the given datum into the current inner tuple. >*/ > - SpGistInnerTuple innerTuple; > - spgChooseIn in; > - spgChooseOut out; But I'm not immediately seing why this is necessary? Is this about battling a false positive? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xlc atomics
On 2016-02-15 12:11:29 -0500, Noah Misch wrote: > These suggested OidGenLock wasn't doing its job. I've seen similar symptoms > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73, > 5765-J08)" for ppc64le. The problem is generic-xlc.h > pg_atomic_compare_exchange_u32_impl() issuing __isync() before > __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our > own s_lock.h, its references, and other projects' usage: Ugh. You're right! It's about not moving code before the stwcx... Do you want to add the new test (no objection, curious), or is that more for testing? Greetings, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: make NOTIFY list de-duplication optional
Small update. I had to add one thing in /contrib/tcn/. diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 7352b29..3a6d4f5 100644 --- a/contrib/tcn/tcn.c +++ b/contrib/tcn/tcn.c @@ -160,7 +160,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) strcpy_quoted(payload, SPI_getvalue(trigtuple, tupdesc, colno), '\''); } -Async_Notify(channel, payload->data); +Async_Notify(channel, payload->data, false); } ReleaseSysCache(indexTuple); break; diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml index 4dd5608..933c76c 100644 --- a/doc/src/sgml/ref/notify.sgml +++ b/doc/src/sgml/ref/notify.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -NOTIFY channel [ , payload ] +NOTIFY [ ALL | DISTINCT ] channel [ , payload ] @@ -105,6 +105,10 @@ NOTIFY channel [ , ALL is specified (contrary to DISTINCT, the + default), the server will deliver all notifications, including duplicates. + Removal of duplicate notifications takes place within transaction block, + finished with COMMIT, END or SAVEPOINT. @@ -190,6 +194,12 @@ NOTIFY channel [ , NOTIFY command if you need to work with non-constant channel names and payloads. + +There is a three-argument version, pg_notify(text, +text, boolean). The third argument acts like +the ALL keyword when set to true, and +DISTINCT when set to false. + @@ -210,6 +220,21 @@ Asynchronous notification "virtual" with payload "This is the payload" received LISTEN foo; SELECT pg_notify('fo' || 'o', 'pay' || 'load'); Asynchronous notification "foo" with payload "payload" received from server process with PID 14728. + +/* Identical messages from same (sub-) transaction can be eliminated - unless you use the ALL keyword */ +LISTEN bar; +BEGIN; +NOTIFY bar, 'Coffee please'; +NOTIFY bar, 'Coffee please'; +NOTIFY bar, 'Milk please'; +NOTIFY ALL bar, 'Milk please'; +SAVEPOINT s; +NOTIFY bar, 'Coffee please'; +COMMIT; +Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517. +Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517. +Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517. +Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517. diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index c39ac3a..38a8246 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -510,6 +510,7 @@ pg_notify(PG_FUNCTION_ARGS) { const char *channel; const char *payload; + bool use_all; if (PG_ARGISNULL(0)) channel = ""; @@ -521,10 +522,15 @@ pg_notify(PG_FUNCTION_ARGS) else payload = text_to_cstring(PG_GETARG_TEXT_PP(1)); + if (PG_NARGS() > 2 && ! PG_ARGISNULL(2)) + use_all = PG_GETARG_BOOL(2); + else + use_all = false; + /* For NOTIFY as a statement, this is checked in ProcessUtility */ PreventCommandDuringRecovery("NOTIFY"); - Async_Notify(channel, payload); + Async_Notify(channel, payload, use_all); PG_RETURN_VOID(); } @@ -540,7 +546,7 @@ pg_notify(PG_FUNCTION_ARGS) * ^^ */ void -Async_Notify(const char *channel, const char *payload) +Async_Notify(const char *channel, const char *payload, bool use_all) { Notification *n; MemoryContext oldcontext; @@ -570,9 +576,10 @@ Async_Notify(const char *channel, const char *payload) errmsg("payload string too long"))); } - /* no point in making duplicate entries in the list ... 
*/ - if (AsyncExistsPendingNotify(channel, payload)) - return; + if (!use_all) + /* remove duplicate entries in the list */ + if (AsyncExistsPendingNotify(channel, payload)) + return; /* * The notification list needs to live until end of transaction, so store diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index b307b48..7203f4a 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -8528,11 +8528,12 @@ DropRuleStmt: * */ -NotifyStmt: NOTIFY ColId notify_payload +NotifyStmt: NOTIFY all_or_distinct ColId notify_payload { NotifyStmt *n = makeNode(NotifyStmt); - n->conditionname = $2; - n->payload = $3; + n->use_all = $2; + n->conditionname = $3; + n->payload = $4; $$ = (Node *)n; } ; diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 045f7f0..0e50561 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -599,7 +599,7 @@ standard_ProcessUtility(Node *parsetree, NotifyStmt *stmt = (NotifyStmt *) parsetree; PreventCommandDuringRecovery("NOTIFY"); -Async_Notify(stmt->conditionname, stmt->payload); +Async_Notify(stmt->conditionname, stmt->payload, stmt->use_all); } break; diff --git a/
Re: [HACKERS] New pg_upgrade data directory inside old one?
On Mon, Feb 15, 2016 at 6:29 PM, Bruce Momjian wrote: > Someone on IRC reported that if they had run the pg_upgrade-created > delete_old_cluster.sh shell script it would have deleted their old _and_ > new data directories. (Fortunately they didn't run it.) > > I was confused how this could have happened, and the user explained that > their old cluster was in /u/pgsql/data, and that they wanted to switch to > a per-major-version directory naming schema, so they put the new data > directory in /u/pgsql/data/9.5. (They could have just moved the > directory while the server was down, but didn't.) > > Unfortunately, there is no check for having the new cluster data > directory inside the old data directory, only a check for tablespace > directories in the old cluster. (I never anticipated someone would do > this.) > Interesting - I definitely wouldn't have expected that either. And it definitely seems like a foot-gun we should protect the users against. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] New pg_upgrade data directory inside old one?
Someone on IRC reported that if they had run the pg_upgrade-created delete_old_cluster.sh shell script it would have deleted their old _and_ new data directories. (Fortunately they didn't run it.) I was confused how this could have happened, and the user explained that their old cluster was in /u/pgsql/data, and that they wanted to switch to a per-major-version directory naming schema, so they put the new data directory in /u/pgsql/data/9.5. (They could have just moved the directory while the server was down, but didn't.) Unfortunately, there is no check for having the new cluster data directory inside the old data directory, only a check for tablespace directories in the old cluster. (I never anticipated someone would do this.) The attached patch adds the proper check. This should be backpatched to all supported Postgres versions. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Roman grave inscription + diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c new file mode 100644 index 8c034bc..e92fc66 *** a/src/bin/pg_upgrade/check.c --- b/src/bin/pg_upgrade/check.c *** output_completion_banner(char *analyze_s *** 195,203 deletion_script_file_name); else pg_log(PG_REPORT, ! "Could not create a script to delete the old cluster's data\n" ! "files because user-defined tablespaces exist in the old cluster\n" ! "directory. The old cluster's contents must be deleted manually.\n"); } --- 195,204 deletion_script_file_name); else pg_log(PG_REPORT, ! "Could not create a script to delete the old cluster's data files\n" ! "because the new cluster's data directory or user-defined tablespaces\n" ! "exist in the old cluster directory. The old cluster's contents must\n" ! "be deleted manually.\n"); } *** create_script_for_old_cluster_deletion(c *** 496,513 { FILE *script = NULL; int tblnum; ! char old_cluster_pgdata[MAXPGPATH]; *deletion_script_file_name = psprintf("%sdelete_old_cluster.%s", SCRIPT_PREFIX, SCRIPT_EXT); /* * Some users (oddly) create tablespaces inside the cluster data * directory. We can't create a proper old cluster delete script in that * case. */ - strlcpy(old_cluster_pgdata, old_cluster.pgdata, MAXPGPATH); - canonicalize_path(old_cluster_pgdata); for (tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++) { char old_tablespace_dir[MAXPGPATH]; --- 497,531 { FILE *script = NULL; int tblnum; ! char old_cluster_pgdata[MAXPGPATH], new_cluster_pgdata[MAXPGPATH]; *deletion_script_file_name = psprintf("%sdelete_old_cluster.%s", SCRIPT_PREFIX, SCRIPT_EXT); + strlcpy(old_cluster_pgdata, old_cluster.pgdata, MAXPGPATH); + canonicalize_path(old_cluster_pgdata); + + strlcpy(new_cluster_pgdata, new_cluster.pgdata, MAXPGPATH); + canonicalize_path(new_cluster_pgdata); + + /* Some people put the new data directory inside the old one. */ + if (path_is_prefix_of_path(old_cluster_pgdata, new_cluster_pgdata)) + { + pg_log(PG_WARNING, + "\nWARNING: new data directory should not be inside the old data directory, e.g. %s\n", old_cluster_pgdata); + + /* Unlink file in case it is left over from a previous run. */ + unlink(*deletion_script_file_name); + pg_free(*deletion_script_file_name); + *deletion_script_file_name = NULL; + return; + } + /* * Some users (oddly) create tablespaces inside the cluster data * directory. We can't create a proper old cluster delete script in that * case. 
*/ for (tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++) { char old_tablespace_dir[MAXPGPATH]; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Code refactoring related to -fsanitize=use-after-scope
Hello. I'm currently working on support of -fsanitize=use-after-scope in the GCC compiler, and I decided to use postgresql as my test case. The sanitization poisons every stack variable at the very beginning of a function, unpoisons a variable at the beginning of scope definition and finally poisons the variable again at the end of scope. The following patch fixes issues seen by the sanitizer. Hope it's acceptable? With the patch applied, ASAN (with the new sanitization) works fine. Thanks, Martin diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c index f090ca5..ff986c2 100644 --- a/src/backend/access/spgist/spgdoinsert.c +++ b/src/backend/access/spgist/spgdoinsert.c @@ -1871,6 +1871,10 @@ spgdoinsert(Relation index, SpGistState *state, SPPageDesc current, parent; FmgrInfo *procinfo = NULL; + SpGistInnerTuple innerTuple; + spgChooseIn in; + spgChooseOut out; + /* * Look up FmgrInfo of the user-defined choose function once, to save @@ -2044,9 +2048,6 @@ spgdoinsert(Relation index, SpGistState *state, * Apply the opclass choose function to figure out how to insert * the given datum into the current inner tuple. */ - SpGistInnerTuple innerTuple; - spgChooseIn in; - spgChooseOut out; /* * spgAddNode and spgSplitTuple cases will loop back to here to -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Failover Slots
Hi, here is my code-level review: 0001: This one looks OK except for broken indentation in the new notes in xlogreader.c and .h. It's maybe slightly overdocumented, but given the complicated way the timeline reading works it's probably warranted. 0002: +/* + * No way to wait for the process since it's not a child + * of ours and there's no latch to set, so poll. + * + * We're checking this without any locks held, but + * we'll recheck when we attempt to drop the slot. + */ +while (slot->in_use && slot->active_pid == active_pid +&& max_sleep_micros > 0) +{ +usleep(micros_per_sleep); +max_sleep_micros -= micros_per_sleep; +} I'm not sure I buy this: what about postmaster crashes, fast shutdown requests, etc.? Also, you usleep for 10s, which is quite long. I'd prefer the classic wait-for-latch with a timeout and a postmaster-death check here. And even if we go with usleep, it should be 1s, not 10s, and pg_usleep instead of usleep. 0003: There are a lot of documentation improvements here that are not related to failover slots or timeline following; it might be a good idea to split those into a separate patch, as they are separately useful IMHO. Other than that it looks good to me. About the other things discussed in this thread: yes, it makes sense in certain situations to handle this outside of WAL, and that does require notions of nodes, etc. That being said, timeline following is needed even if this is handled outside of WAL. And once timeline following is in, the slots can be handled by the replication solution itself, which is good. But I think failover slots are still a good thing to have: they provide an HA solution for anything that uses slots, and that includes physical replication as well. If a specific logical replication solution wants to handle this itself outside of WAL for some reason, it can create a non-failover slot, so in my opinion we ideally need both types of slots (and that's exactly what this patch gives us). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
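For illustration, the latch-based wait being suggested would look roughly like this (a sketch only; the slot fields are the ones from the snippet above, the 100 ms timeout is arbitrary, and real code would also need to re-check the condition under the appropriate lock):

	while (slot->in_use && slot->active_pid == active_pid)
	{
		int			rc;

		/* Sleep at most 100 ms, but wake on latch set or postmaster death. */
		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   100L);

		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);

		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();
	}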
[HACKERS] Remove or weaken hints about "effective resolution of sleep delays is 10 ms"?
Hi, Several places in our docs have blurbs like > Note that on many systems, the effective resolution of sleep delays is > 10 milliseconds; setting wal_writer_delay to a value that > is not a multiple of 10 might have the same results as setting it to > the next higher multiple of 10. Afaik that's not the case on any recent operating system/hardware. So perhaps we should just remove all of those blurbs, or just replace them with something like "on some older systems the effective resolution of sleep delays is limited to multiples of 10 milliseconds"? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Existence check for suitable index in advance when concurrently refreshing.
On Wed, Feb 10, 2016 at 10:35 AM, Michael Paquier wrote: > On Wed, Feb 10, 2016 at 2:23 AM, Fujii Masao wrote: >> On Wed, Feb 10, 2016 at 2:21 AM, Fujii Masao wrote: >>> On Tue, Feb 9, 2016 at 9:11 PM, Michael Paquier >>> wrote: On Tue, Feb 9, 2016 at 4:27 PM, Fujii Masao wrote: > Thanks for updating the patch! > Attached is the updated version of the patch. > I removed unnecessary assertion check and change of source code > that you added, and improved the source comment. > Barring objection, I'll commit this patch. So, this code basically duplicates what is already in refresh_by_match_merge to check if there is a UNIQUE index defined. If we are sure that an error is detected earlier in the code as done in this patch, wouldn't it be better to replace the error message in refresh_by_match_merge() by an assertion? >>> >>> I'm OK with an assertion if we add source comment about why >>> refresh_by_match_merge() can always guarantee that there is >>> a unique index on the matview. Probably it's because the matview >>> is locked with exclusive lock at the start of ExecRefreshMatView(), >>> i.e., it's guaranteed that we cannot drop any indexes on the matview >>> after the first check is passed. Also it might be better to add >>> another comment about that the caller of refresh_by_match_merge() >>> must always check that there is a unique index on the matview before >>> calling that function, just in the case where it's called elsewhere >>> in the future. >>> >>> OTOH, I don't think it's not so bad idea to just emit an error, instead. >> >> Sorry, s/it's not/it's > > Well, refresh_by_match_merge is called only by ExecRefreshMatView() > and it is not exposed externally in matview.h, so it is not like we > are going to break any extension-related code by having an assertion > instead of an error message, and that's less code duplication to > maintain if this extra error message is removed. But feel free to > withdraw my comment if you think that's not worth it, I won't deadly > insist on that either :) Okay, I revived Sawada's original assertion code and pushed the patch. Thanks! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xlc atomics
On Sat, Jul 04, 2015 at 08:07:49PM -0400, Noah Misch wrote: > On Sun, Jul 05, 2015 at 12:54:43AM +0200, Andres Freund wrote: > > On 2015-07-04 18:40:41 -0400, Noah Misch wrote: > > > (1) "IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)". Getting it > > > working > > > required the attached patch. > > > Will you apply? Having the ability to test change seems to put you in a > > much better spot then me. > > I will. That commit (0d32d2e) permitted things to compile and usually pass tests, but I missed the synchronization bug. Since 2015-10-01, the buildfarm has seen sixteen duplicate-catalog-OID failures. The compiler has always been xlC, and the branch has been HEAD or 9.5. Examples: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-15%2004%3A12%3A49 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-08%2001%3A20%3A16 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2015-12-11%2005%3A25%3A54 These suggested OidGenLock wasn't doing its job. I've seen similar symptoms around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73, 5765-J08)" for ppc64le. The problem is generic-xlc.h pg_atomic_compare_exchange_u32_impl() issuing __isync() before __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our own s_lock.h, its references, and other projects' usage: https://github.com/boostorg/smart_ptr/blob/boost-1.60.0/include/boost/smart_ptr/detail/sp_counted_base_vacpp_ppc.hpp#L55 http://redmine.gromacs.org/projects/gromacs/repository/entry/src/external/thread_mpi/include/thread_mpi/atomic/xlc_ppc.h?rev=v5.1.2#L112 https://www-01.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/language_ref/asm_example.html This patch's test case would have failed about half the time under today's generic-xlc.h. Fast machines run it in about 1s. A test that detects the bug 99% of the time would run far longer, hence this compromise. Archaeology enthusiasts may enjoy the bug's changing presentation over time. LWLockAcquire() exposed it after commit ab5194e redesigned lwlock.c in terms of atomics. Assert builds were unaffected for commits in [ab5194e, bbfd7ed); atomics.h asserts fattened code enough to mask the bug. However, removing the AssertPointerAlignment() from pg_atomic_fetch_or_u32() would have sufficed to surface it in assert builds. All builds demonstrated the bug once bbfd7ed introduced xlC pg_attribute_noreturn(), which elicits a simpler instruction sequence around the ExceptionalCondition() call embedded in each Assert. nm diff --git a/src/bin/pgbench/.gitignore b/src/bin/pgbench/.gitignore index aae819e..983a3cd 100644 --- a/src/bin/pgbench/.gitignore +++ b/src/bin/pgbench/.gitignore @@ -1,3 +1,4 @@ /exprparse.c /exprscan.c /pgbench +/tmp_check/ diff --git a/src/bin/pgbench/Makefile b/src/bin/pgbench/Makefile index 18fdf58..e9b1b74 100644 --- a/src/bin/pgbench/Makefile +++ b/src/bin/pgbench/Makefile @@ -40,3 +40,9 @@ clean distclean: maintainer-clean: distclean rm -f exprparse.c exprscan.c + +check: + $(prove_check) + +installcheck: + $(prove_installcheck) diff --git a/src/bin/pgbench/t/001_pgbench.pl b/src/bin/pgbench/t/001_pgbench.pl new file mode 100644 index 000..88e83ab --- /dev/null +++ b/src/bin/pgbench/t/001_pgbench.pl @@ -0,0 +1,25 @@ +use strict; +use warnings; + +use PostgresNode; +use TestLib; +use Test::More tests => 3; + +# Test concurrent insertion into table with UNIQUE oid column. 
DDL expects +# GetNewOidWithIndex() to successfully avoid violating uniqueness for indexes +# like pg_class_oid_index and pg_proc_oid_index. This indirectly exercises +# LWLock and spinlock concurrency. This test makes a 5-MiB table. +my $node = get_new_node('main'); +$node->init; +$node->start; +$node->psql('postgres', + 'CREATE UNLOGGED TABLE oid_tbl () WITH OIDS; ' + . 'ALTER TABLE oid_tbl ADD UNIQUE (oid);'); +my $script = $node->basedir . '/pgbench_script'; +append_to_file($script, + 'INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);'); +$node->command_like( + [ qw(pgbench --no-vacuum --client=5 --protocol=prepared + --transactions=25 --file), $script ], + qr{processed: 125/125}, + 'concurrent OID generation'); diff --git a/src/include/port/atomics/generic-xlc.h b/src/include/port/atomics/generic-xlc.h index 1e8a11e..f24e3af 100644 --- a/src/include/port/atomics/generic-xlc.h +++ b/src/include/port/atomics/generic-xlc.h @@ -41,18 +41,22 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 *expected, uint32 newval) { /* +* XXX: __compare_and_swap is defined to take signed parameters, but that +* shouldn't matter since we don't perform any arithmetic operations. +*/ + boolret = __compare_and_swap((volatile int*)&ptr->value, +
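To make the ordering point explicit, here is a sketch of what the corrected compare-and-swap looks like with the xlC intrinsics (an illustrative helper, not the exact patch text): the __isync() serves as the acquire barrier, so it must come after the __compare_and_swap(), never before it as the old generic-xlc.h had it.

static inline bool
cas_u32_acquire(volatile int *ptr, int *expected, int newval)
{
	bool		ret;

	ret = __compare_and_swap(ptr, expected, newval);
	__isync();					/* acquire fence: order later loads after the CAS */
	return ret;
}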
Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE
Teodor Sigaev writes: >> So basically, a generic CASCADE facility sounds like a lot of work to >> produce something that would seldom be anything but a foot-gun. > DELETE FROM or TRUNCATE could be a foot-gun too, but it's not a reason to > remove tham. I faced with problem when I tried to change owner of datadase > with > all objects inside. Think, this feature could be useful although it should > restricted to superuser obly. That's a pretty weak argument, and I do not think you have thought through all the consequences. It is not hard at all to imagine cases where using this sort of thing could be a security vulnerability. Are you familiar with the reasons why Unix systems don't typically allow users to "give away" ownership of files? The same problem exists here. To be concrete about it: 1. Alice does, say, "CREATE EXTENSION cube". 2. Bob creates a security-definer function owned by himself, using a "cube"-type parameter so that it's dependent on the extension. (It needn't actually do anything with that parameter.) 3. Alice does ALTER EXTENSION cube OWNER TO charlie CASCADE. 4. Bob now has a security-definer function owned by (and therefore executing as) Charlie, whose contents were determined by Bob. Game over for Charlie ... and for everyone else too, if Charlie is a superuser, which is not unlikely for an extension owner. The only way Alice can be sure that the ALTER EXTENSION is safe is if she manually inspects every dependent object, in which case she might as well not use CASCADE. Moreover, the use case you've sketched (ie, change ownership of all objects inside a database) doesn't actually have anything to do with following dependencies. It's a lot closer to REASSIGN OWNED ... in fact, it's not clear to me why REASSIGN OWNED doesn't solve that use-case already. I remain of the opinion that this is a terrible idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small PATCH: check of 2 Perl modules
On 2/15/16 8:57 AM, Teodor Sigaev wrote: >> On 2/12/16 8:20 AM, Eugene Kazakov wrote: >>> TAP-tests need two Perl modules: Test::More and IPC::Run. >> I think those modules are part of a standard Perl installation. > > IPC::Run is not. At least for perl from ports tree in FreeBSD. Right, that's why we added the configure option. But Test::More is standard. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE
TBH, this sounds like a completely terrible idea. There are far too many sorts of dependencies across which one would not expect ownership to propagate; for example, use of a function in a view, or even just a foreign key dependency between two tables. I'm not even clear that there are *any* cases where this behavior is wanted, other than perhaps ALTER OWNER on an extension --- and even there, what you would want is altering the ownership of the member objects, but not everything that depends on the member objects. So basically, a generic CASCADE facility sounds like a lot of work to produce something that would seldom be anything but a foot-gun. DELETE FROM or TRUNCATE could be a foot-gun too, but that's not a reason to remove them. I faced this problem when I tried to change the owner of a database with all the objects inside it. I think this feature could be useful, although it should be restricted to superusers only. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE
> Dmitry Ivanov writes: > > As of now there's no way to transfer the ownership of an object and all > > its > > dependent objects in one step. One has to manually alter the owner of each > > object, be it a table, a schema or something else. > > TBH, this sounds like a completely terrible idea. There are far too many > sorts of dependencies across which one would not expect ownership to > propagate; for example, use of a function in a view, or even just a > foreign key dependency between two tables. Well, actually this is a statement of fact, and I don't propose enabling this option for every possible dependency. This patch includes an experimental feature that anyone can try and discuss, nothing more. Besides, it acts a lot like 'drop ... cascade' (the findDependentObjects() function is used under the hood), so the behavior is totally predictable. It also records every object that has been touched, so no change goes unnoticed. > I'm not even clear that there are *any* cases where this behavior is > wanted, other than perhaps ALTER OWNER on an extension At the very least this might be useful for changing the owner of all tables that inherit from some table. -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GetExistingLocalJoinPath() vs. the docs
Greetings, While getting back into the thread Re: Optimization for updating foreign tables in Postgres FDW, I noticed some issues with the docs and GetExistingLocalJoinPath(): GetExistingLocalJoinPath() exists in src/backend/foreign/foreign.c, but the docs include discussion of it under 54.2 - Foreign Data Wrapper Callback Routines. Shouldn't this be under 54.3. Foreign Data Wrapper Helper Functions? Also, the prototype is incorrect in the docs (it doesn't return a void) and the paragraph about what it's for could do with some wordsmithing. A link from 54.2 to 54.3 which mentions it would be fine, of course. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
It's a great pity, but the author is not able to continue working on this patch, so I would like to raise this flag. I'd like to add some comments about the patches: The traversalValue patch adds an arbitrary value associated with a branch during an SP-GiST tree walk. Unlike reconstructedValue it can be just a pointer; it doesn't have to be a regular pgsql type. Also, it can be used independently of reconstructedValue. This patch is used by both of the following attached patches. The range patch just switches the range opclasses from using reconstructedValue to traversalValue. reconstructedValue was used only because it was an acceptable workaround in the case of the range type. The range opclass stores a full value in the leaves and doesn't need reconstructedValue to return a tuple in an index-only scan. See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us The q4d patch implements an index over boxes using SP-GiST. The basic idea was the observation that the range opclass treats one-dimensional ranges as 2D points. Following this idea, we can treat a 2D box (which is 2 points, or 4 numbers) as a 4D point. We hope that this approach will be much more effective than a traditional R-tree when the boxes in a collection overlap heavily. Performance results are coming shortly. In the near future (say, tomorrow) I hope to propose an opclass over polygons based on this approach. Alexander Lebedev wrote: Hello, Hacker. * [PATCH] add a box index to sp-gist We have extended sp-gist with an index that keeps track of boxes. We have used ideas underlying the sp-gist range implementation to represent 2D boxes as points in 4D space. We use a quad tree analogue, but in 4-dimensional space. We call this tree q4d. Each node of this tree is a box (a point in 4D space) which splits the space into 16 hyperrectangles. Rationale: an r-tree assumes that the boxes we're trying to index don't overlap much. When this assumption fails, the r-tree performs badly, while our proposal to represent a rectangle as a point in 4D space solves this problem. NB: the index uses traversalValue introduced in a separate patch. * [PATCH] add traversalValue in sp-gist During implementation of the box index for sp-gist we saw that we only keep rectangles, but to determine traversal direction we may need to know the boundaries of a hyperrectangle. So we calculate them on the fly and store them in traversalValue, available when traversing child nodes, because otherwise we wouldn't be able to calculate them from inside the inner_consistent function call. This patch was written by Teodor Sigaev. * [PATCH] change reconstructValue -> traversalValue in range_spgist.c During traversal, local context is insufficient to pick the traversal direction. As of now, this is worked around with the help of reconstructValue. reconstructValue only stores data of the same type as a tree node, that is, a range. We have written a patch that calculates auxiliary values and makes them accessible during traversal. We then use traversalValue in a new box index and change range_spgist.c to use it in place of reconstructValue. NB: apply this patch after the traversalValue patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ traversalValue-1.patch.gz Description: GNU Zip compressed data range-1.patch.gz Description: GNU Zip compressed data q4d-1.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
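To make the 4D-point idea concrete, here is a small illustrative sketch (not the opclass code itself): a box's two corners give four coordinates, and an inner node's centroid box splits that 4D space into 2^4 = 16 quadrants, exactly as the range opclass splits its 2D space into 4.

/* A 2D box seen as a point in 4D space (illustrative only). */
typedef struct Box4D
{
	double		xlo;
	double		ylo;
	double		xhi;
	double		yhi;
} Box4D;

/* Which of the 16 quadrants "p" falls into, relative to the centroid box. */
static int
quadrant4D(const Box4D *centroid, const Box4D *p)
{
	int			quadrant = 0;

	if (p->xlo > centroid->xlo)
		quadrant |= 8;
	if (p->ylo > centroid->ylo)
		quadrant |= 4;
	if (p->xhi > centroid->xhi)
		quadrant |= 2;
	if (p->yhi > centroid->yhi)
		quadrant |= 1;

	return quadrant;			/* 0 .. 15 */
}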
Re: [HACKERS] Small PATCH: check of 2 Perl modules
Teodor Sigaev writes: >> On 2/12/16 8:20 AM, Eugene Kazakov wrote: >>> TAP-tests need two Perl modules: Test::More and IPC::Run. >> I think those modules are part of a standard Perl installation. > IPC::Run is not. At least for perl from ports tree in FreeBSD. Yeah, I remember having had to install IPC::Run from CPAN, and a couple of other things too (but I don't remember Test::More specifically), when setting up some of my buildfarm critters. It's likely that a lot of distros bundle these, but I don't think IPC::Run is in a basic built-from-source Perl. The real question though is do we need a configure check at all. Given that investigation into a CMake conversion is actively going on, I'm hesitant to move the goalposts for it by introducing a brand new type of configure check. Maybe we should punt this issue until that patch is either accepted or decisively rejected. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE
Dmitry Ivanov writes: > As of now there's no way to transfer the ownership of an object and all its > dependent objects in one step. One has to manually alter the owner of each > object, be it a table, a schema or something else. TBH, this sounds like a completely terrible idea. There are far too many sorts of dependencies across which one would not expect ownership to propagate; for example, use of a function in a view, or even just a foreign key dependency between two tables. I'm not even clear that there are *any* cases where this behavior is wanted, other than perhaps ALTER OWNER on an extension --- and even there, what you would want is altering the ownership of the member objects, but not everything that depends on the member objects. So basically, a generic CASCADE facility sounds like a lot of work to produce something that would seldom be anything but a foot-gun. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small PATCH: check of 2 Perl modules
On Mon, Feb 15, 2016 at 10:57 PM, Teodor Sigaev wrote: >> On 2/12/16 8:20 AM, Eugene Kazakov wrote: >>> >>> TAP-tests need two Perl modules: Test::More and IPC::Run. >> >> I think those modules are part of a standard Perl installation. > > IPC::Run is not. At least for perl from ports tree in FreeBSD. On OSX and on Windows as well they are now shipped AFAIK. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] Proposal for \crosstabview in psql
Dean Rasheed wrote: > My biggest problem is with the sorting, for all the reasons discussed > above. There is absolutely no reason for \crosstabview to be > re-sorting rows -- they should just be left in the original query > result order We want the option to sort the vertical header in a late additional step when the ORDER BY of the query is already assigned to another purpose. I've submitted this example on the wiki: https://wiki.postgresql.org/wiki/Crosstabview create view v_data as select * from ( values ('v1','h2','foo', '2015-04-01'::date), ('v2','h1','bar', '2015-01-02'), ('v1','h0','baz', '2015-07-12'), ('v0','h4','qux', '2015-07-15') ) as l(v,h,c,d); Normal top-down display: select v,to_char(d,'Mon') as m, c from v_data order by d; v | m | c +-+- v2 | Jan | bar v1 | Apr | foo v1 | Jul | baz v0 | Jul | qux Crosstabview display without any additional sort: \crosstabview v m c v | Jan | Apr | Jul +-+-+- v2 | bar | | v1 | | foo | baz v0 | | | qux "d" is not present in the resultset but it drives the sort so that month names come out in the natural order. \crosstabview does not discard the order of colH nor the order of colV; it follows both, so that we get v2,v1,v0 in this order in the leftmost column (vertical header) just like in the resultset. At this point, it seems to me that it's perfectly reasonable for our user to expect the possibility of sorting additionally by "v", without changing the query and without changing the order of the horizontal header: \crosstabview +v m c v | Jan | Apr | Jul +-+-+- v0 | | | qux v1 | | foo | baz v2 | bar | | Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small PATCH: check of 2 Perl modules
On 2/12/16 8:20 AM, Eugene Kazakov wrote: TAP-tests need two Perl modules: Test::More and IPC::Run. I think those modules are part of a standard Perl installation. IPC::Run is not. At least for perl from ports tree in FreeBSD. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring of receivelog.c
On Mon, Feb 15, 2016 at 7:15 AM, Craig Ringer wrote: > On 15 February 2016 at 04:48, Magnus Hagander wrote: > >> I was working on adding the tar streaming functionality we talked about >> at the developer meeting to pg_basebackup, and rapidly ran across the issue >> that Andres has been complaining about for a while. The code in >> receivelog.c just passes an insane number of parameters around. Adding or >> changing even a small thing ends up touching a huge number of places. >> > > > Other than the lack of comments on the fields in StreamCtl to indicate > their functions I think this looks good. > I copied that lack of comments from the previous implementation :P But yeah, I agree, it's probably a good idea to add them. > I didn't find any mistakes, but I do admit my eyes started glazing over > after a bit. > > I'd rather not have StreamCtl as a typedef of an anonymous struct if it's > exposed in a header though. It should have a name that can be used in > forward declarations etc. > It's not exactly uncommon with anonymous ones either in our code, but I see no problem adding that. Thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
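For reference, the difference being pointed at, with made-up field names (the real StreamCtl members are in the patch): only the named form can be referenced with a forward declaration such as "struct StreamCtl;" from another header.

/*
 * Anonymous form (not forward-declarable):
 *
 *     typedef struct { ... } StreamCtl;
 *
 * Named form, which allows "struct StreamCtl;" elsewhere:
 */
typedef struct StreamCtl
{
	XLogRecPtr	startpos;		/* illustrative field only */
	uint32		timeline;		/* illustrative field only */
} StreamCtl;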
Re: [HACKERS] [patch] Proposal for \crosstabview in psql
Alvaro Herrera wrote: > So please can we have that wiki page so that the syntax can be hammered > out a bit more. I've added a wiki page with explanation and examples here: https://wiki.postgresql.org/wiki/Crosstabview Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small PATCH: check of 2 Perl modules
On 2/12/16 8:20 AM, Eugene Kazakov wrote: > TAP-tests need two Perl modules: Test::More and IPC::Run. > > The patch adds checking of modules in configure.in and configure. I think those modules are part of a standard Perl installation. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] custom function for converting human readable sizes to bytes
On 2/15/16, Vitaly Burovoy wrote: > P.S.: "bytes" size unit was added just for consistency: each group > should have a name, even with an exponent of 1. Oops... Of course, "even with an exponent of 0". -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] custom function for converting human readable sizes to bytes
On 2/15/16, Dean Rasheed wrote: >> On 12/02/16 10:19, Dean Rasheed wrote: >>> This seems like a reasonable first patch for me as a committer, so >>> I'll take it unless anyone else was planning to do so. >> > > So looking at this, it seems that for the most part pg_size_bytes() > will parse any output produced by pg_size_pretty(). The exception is > that there are 2 versions of pg_size_pretty(), one that takes bigint > and one that takes numeric, whereas pg_size_bytes() returns bigint, so > it can't handle all inputs. Is there any reason not to make > pg_size_bytes() return numeric? I guess the only reason is a comparison purpose. The example from the first letter[1]: SELECT * FROM pg_class WHERE pg_table_size(oid) > pg_human_size('2GB'); ... but not for something like: WITH sel AS ( SELECT pg_size_pretty(oid) AS _size, * FROM pg_class WHERE ... ) SELECT pg_size_bytes(_size), * FROM sel; So the use case of the "pg_size_bytes" is getting a value to compare with a result from pg_table_size, ..., even with pg_database_size, _each_ of whom returns bigint (or just int), but not numeric: postgres=# \df pg_*_size List of functions Schema | Name | Result data type | Argument data types | Type ++--+-+ pg_catalog | pg_column_size | integer | "any" | normal pg_catalog | pg_database_size | bigint | name | normal pg_catalog | pg_database_size | bigint | oid | normal pg_catalog | pg_indexes_size| bigint | regclass | normal pg_catalog | pg_relation_size | bigint | regclass | normal pg_catalog | pg_relation_size | bigint | regclass, text | normal pg_catalog | pg_table_size | bigint | regclass | normal pg_catalog | pg_tablespace_size | bigint | name | normal pg_catalog | pg_tablespace_size | bigint | oid | normal pg_catalog | pg_total_relation_size | bigint | regclass | normal (10 rows) The commit message for "pg_size_pretty(numeric)" explains[2] it was introduced for using with "pg_xlog_location_diff"[3] only. Since "pg_size_bytes" supports up to (4 EiB-1B)[4] I hope nobody will want to send a query like "tell me where difference between two transaction log locations is bigger or equal than 4EiB"... But comparison between int8 and numeric is OK, so usage of rational input will not break queries: postgres=# select '10e200'::numeric > 10::int8 as check; check --- t (1 row) P.S.: "bytes" size unit was added just for consistency: each group should have a name, even with an exponent of 1. > It would still be compatible with the example use cases, but it would > be a better inverse of both variants of pg_size_pretty() and would be > more future-proof. It already works internally using numeric, so it's > a trivial change to make now, but impossible to change in the future > without introducing a new function with a different name, which is > messy. > > Thoughts? > > Regards, > Dean [1]http://www.postgresql.org/message-id/cafj8prd-tgodknxdygecza4on01_urqprwf-8ldkse-6bdh...@mail.gmail.com [2]http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4a2d7ad76f5f275ef2d6a57e1a61d5bf756349e8 [3]http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE [4]https://en.wikipedia.org/wiki/Binary_prefix#Adoption_by_IEC.2C_NIST_and_ISO -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small PATCH: check of 2 Perl modules
13.02.2016 16:04, Michael Paquier : On Sat, Feb 13, 2016 at 1:47 PM, Robert Haas wrote: On Fri, Feb 12, 2016 at 8:20 AM, Eugene Kazakov wrote: TAP-tests need two Perl modules: Test::More and IPC::Run. The patch adds checking of modules in configure.in and configure. Why would we want that? I was doubtful at the beginning, but it doesn't hurt to have those sanity checks in configure actually. The idea is that when --enable-tap-tests is used now we simply error with "Can't locate IPC/Run.pm in @INC" when kicking the tests, this check would allow one to know if his environment is adapted to run the tests or not before compiling anything. And with this patch, we would fail now with that: configure: error: Need Perl IPC::Run module By the way, the patch given upthread by Eugene is incorrect. To begin with, AX_PROG_PERL_MODULES has not been compiled by autoconf and I can't believe that it is available on all platforms, for example on OSX 10.8 I could not see it. And it is actually here: https://www.gnu.org/software/autoconf-archive/ax_prog_perl_modules.html I would recommend grabbing a copy of this file, and change the error message as follows: Perl module IPC::Run is required to run TAP tests See the patch attached as reference, we could simplify the macro of this m4 file and remove the check for perl, which is here: +# Make sure we have perl +if test -z "$PERL"; then +AC_CHECK_PROG(PERL,perl,perl) +fi Though I kept the original script as-is in the patch attached. Regards, Michael, Thank you. You are right, of course. I missed the m4_ax_prog_perl_modules. Please, see the fixed version of patch in the attach. I added m4_ax_prog_perl_modules and change the error messages. The best regards, Eugene Kazakov, Postgres Professional diff --git a/aclocal.m4 b/aclocal.m4 index 6f930b6..e6b2aaf 100644 --- a/aclocal.m4 +++ b/aclocal.m4 @@ -10,3 +10,4 @@ m4_include([config/perl.m4]) m4_include([config/programs.m4]) m4_include([config/python.m4]) m4_include([config/tcl.m4]) +m4_include([config/m4_ax_prog_perl_modules.m4]) diff --git a/config/m4_ax_prog_perl_modules.m4 b/config/m4_ax_prog_perl_modules.m4 new file mode 100644 index 000..ed5dd30 --- /dev/null +++ b/config/m4_ax_prog_perl_modules.m4 @@ -0,0 +1,72 @@ +# === +# http://www.gnu.org/software/autoconf-archive/ax_prog_perl_modules.html +# === +# +# SYNOPSIS +# +# AX_PROG_PERL_MODULES([MODULES], [ACTION-IF-TRUE], [ACTION-IF-FALSE]) +# +# DESCRIPTION +# +# Checks to see if the given perl modules are available. If true the shell +# commands in ACTION-IF-TRUE are executed. If not the shell commands in +# ACTION-IF-FALSE are run. Note if $PERL is not set (for example by +# calling AC_CHECK_PROG, or AC_PATH_PROG), AC_CHECK_PROG(PERL, perl, perl) +# will be run. +# +# MODULES is a space separated list of module names. To check for a +# minimum version of a module, append the version number to the module +# name, separated by an equals sign. +# +# Example: +# +# AX_PROG_PERL_MODULES( Text::Wrap Net::LDAP=1.0.3, , +# AC_MSG_WARN(Need some Perl modules) +# +# LICENSE +# +# Copyright (c) 2009 Dean Povey +# +# Copying and distribution of this file, with or without modification, are +# permitted in any medium without royalty provided the copyright notice +# and this notice are preserved. This file is offered as-is, without any +# warranty. 
+ +#serial 7 + +AU_ALIAS([AC_PROG_PERL_MODULES], [AX_PROG_PERL_MODULES]) +AC_DEFUN([AX_PROG_PERL_MODULES],[dnl + +m4_define([ax_perl_modules]) +m4_foreach([ax_perl_module], m4_split(m4_normalize([$1])), + [ + m4_append([ax_perl_modules], +[']m4_bpatsubst(ax_perl_module,=,[ ])[' ]) + ]) + +if test "x$PERL" != x; then + ax_perl_modules_failed=0 + for ax_perl_module in ax_perl_modules; do +AC_MSG_CHECKING(for perl module $ax_perl_module) + +# Would be nice to log result here, but can't rely on autoconf internals +$PERL -e "use $ax_perl_module; exit" > /dev/null 2>&1 +if test $? -ne 0; then + AC_MSG_RESULT(no); + ax_perl_modules_failed=1 + else + AC_MSG_RESULT(ok); +fi + done + + # Run optional shell commands + if test "$ax_perl_modules_failed" = 0; then +: +$2 + else +: +$3 + fi +else + AC_MSG_WARN(could not find perl) +fi])dnl diff --git a/configure b/configure index 1903815..2d80202 100755 --- a/configure +++ b/configure @@ -15554,6 +15554,79 @@ done if test -z "$PERL"; then as_fn_error $? "Perl not found" "$LINENO" 5 fi + + + + + + + +if test "x$PERL" != x; then + ax_perl_modules_failed=0 + for ax_perl_module in 'Test::More' ; do +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for perl module $ax_perl_module" >&5 +$as_echo_n "checking for perl module $ax_perl_module... " >&6; } + +# Woul
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
On 05.02.2016 11:09, Kyotaro HORIGUCHI wrote: Hello, I considered how to make tab-completion robust for syntactical noises, in other words, optional words in syntax. Typically "IF (NOT) EXISTS", UNIQUE and TEMPORARY are words that don't affect further completion. However, the current delimit-matching mechanism is not so capable (or is complexty-prone) to live with such noises. I have proposed to use regular expressions or simplified one for the robustness but it was too complex to be applied. This is another answer for the problem. Removal of such words on-the-fly makes further matching more robust. Next, currently some CREATE xxx subsyntaxes of CREATE SCHEMA are matched using TailMatching but it makes difficult the options-removal operations, which needs forward matching. So I introduced two things to resolve them by this patch. I did some tests with your patch, but I am not very confident about tab-complete.c. I have some notes: 1 - I executed the git apply command and got the following warning: ../0001-Suggest-IF-NOT-EXISTS-for-tab-completion-of-psql.patch:302: trailing whitespace. /* warning: 1 line adds whitespace errors. This is because of superfluous whitespace, I think. 2 - In psql I type "create table if" and press TAB. psql adds the following: create table IF NOT EXISTS I think psql should continue in lower case if the user wrote the query in lower-case text: create table if not exists 3 - Same with "IF EXISTS". If I write "alter view if" and press TAB, psql writes: alter view IF EXISTS -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Thanks Fujita-san for bug report and the fix. Sorry for bug. Here's patch with better way to fix it. I think while concatenating the lists, we need to copy the lists being appended and in all the cases. If we don't copy, a change in those lists can cause changes in the upward linkages and thus lists of any higher level joins. On Mon, Feb 15, 2016 at 1:10 PM, Etsuro Fujita wrote: > On 2016/02/10 4:16, Robert Haas wrote: > >> On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat >> wrote: >> >>> Thanks Jeevan for your review and comments. PFA the patch which fixes >>> those. >>> >> > Committed with a couple more small adjustments. >> > > Thanks for working on this, Robert, Ashutosh, and everyone involved! > > I happened to notice that this code in foreign_join_ok(): > > switch (jointype) > { > case JOIN_INNER: > fpinfo->remote_conds = list_concat(fpinfo->remote_conds, >fpinfo_i->remote_conds); > fpinfo->remote_conds = list_concat(fpinfo->remote_conds, >fpinfo_o->remote_conds); > break; > > case JOIN_LEFT: > fpinfo->joinclauses = list_concat(fpinfo->joinclauses, > fpinfo_i->remote_conds); > fpinfo->remote_conds = list_concat(fpinfo->remote_conds, >fpinfo_o->remote_conds); > break; > > case JOIN_RIGHT: > fpinfo->joinclauses = list_concat(fpinfo->joinclauses, > fpinfo_o->remote_conds); > fpinfo->remote_conds = list_concat(fpinfo->remote_conds, >fpinfo_i->remote_conds); > break; > > case JOIN_FULL: > fpinfo->joinclauses = list_concat(fpinfo->joinclauses, > fpinfo_i->remote_conds); > fpinfo->joinclauses = list_concat(fpinfo->joinclauses, > fpinfo_o->remote_conds); > break; > > default: > /* Should not happen, we have just check this above */ > elog(ERROR, "unsupported join type %d", jointype); > } > > would break the list fpinfo_i->remote_conds in the case of INNER JOIN or > FULL JOIN. You can see the list breakage from e.g., the following queries > on an Assert-enabled build: > > postgres=# create extension postgres_fdw; > CREATE EXTENSION > postgres=# create server myserver foreign data wrapper postgres_fdw > options (dbname 'mydatabase'); > CREATE SERVER > postgres=# create user mapping for current_user server myserver; > CREATE USER MAPPING > postgres=# create foreign table foo (a int) server myserver options > (table_name 'foo'); > CREATE FOREIGN TABLE > postgres=# create foreign table bar (a int) server myserver options > (table_name 'bar'); > CREATE FOREIGN TABLE > postgres=# create foreign table baz (a int) server myserver options > (table_name 'baz'); > CREATE FOREIGN TABLE > postgres=# select * from foo, bar, baz where foo.a = bar.a and bar.a = > baz.a and foo.a < 10 and bar.a < 10 and baz.a < 10; > > Attached is a patch to avoid the breakage. > > Best regards, > Etsuro Fujita > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company foreign_join_ok_v2.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
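Presumably the fix boils down to concatenating copies of the lower relations' lists, something along these lines (a sketch, not the attached foreign_join_ok_v2.patch itself); list_concat() reuses the second argument's cells, so appending the child fpinfo lists directly relinks them and breaks other join combinations that still reference them:

		case JOIN_INNER:
			/* concatenate copies so the child lists keep their own linkage */
			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
											   list_copy(fpinfo_i->remote_conds));
			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
											   list_copy(fpinfo_o->remote_conds));
			break;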
Re: [HACKERS] Incorrect formula for SysV IPC parameters
On Mon, Feb 15, 2016 at 2:31 PM, Kyotaro HORIGUCHI wrote: > Thanks for looking at this. > > At Fri, 12 Feb 2016 23:19:45 +0900, Fujii Masao wrote > in >> >> ISTM that you also need to change the descriptions about SEMMNI and SEMMNS >> >> under the Table 17-1. >> > >> > Oops! Thank you for pointing it out. >> > >> > The original description doesn't mention the magic-number ('5' in >> > the above formulas, which previously was '4') so I haven't added >> > anything about it. >> > >> > Process of which the number is limited by max_worker_processes is >> > called 'background process' (not 'backend worker') in the >> > documentation so I used the name to mention it in the additional >> > description. >> > >> > The difference in the body text for 9.2, 9.3 is only a literal >> > '4' to '5' in the formula. >> >> Thanks for updating the patches! >> >> They look good to me except that the formulas don't include the number of >> background processes requesting shared memory access, i.e., >> GetNumShmemAttachedBgworkers(), in 9.3. Isn't it better to use the following >> formula in 9.3? >> >> ceil((max_connections + autovacuum_max_workers + number of >> background proceses + 5) / 16) >> >> Attached patch uses the above formula for 9.3. I'm thinking to push your >> patches to 9.2, 9.4, 9.5, master, also push the attached one to 9.3. >> Comments? > > One concern is that users don't have any simple way to know how > many bg-workers a server runs in their current configuration. Users need to read the document of the extensions they want to load, to see the number of background worker processes which will be running. > The formula donsn't make sense without it. IMO, documenting "incorrect" formula can cause more troubles. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hello, At Mon, 8 Feb 2016 11:37:17 +0900, Amit Langote wrote in <56b7ff5d.7030...@lab.ntt.co.jp> > > Hi Vinayak, > > Thanks for updating the patch, a couple of comments: > > On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote: > > Hello, > > > > Please find attached updated patch. > >> The point of having pgstat_report_progress_update_counter() is so that > >> you can efficiently update a single counter without having to update > >> everything, when only one counter has changed. But here you are > >> calling this function a whole bunch of times in a row, which > >> completely misses the point - if you are updating all the counters, > >> it's more efficient to use an interface that does them all at once > >> instead of one at a time. > > > > The pgstat_report_progress_update_counter() is called at appropriate places > > in the attached patch. > > + charprogress_message[N_PROGRESS_PARAM][PROGRESS_MESSAGE_LENGTH]; > > [ ... ] > > + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", phase1); > + pgstat_report_progress_update_message(0, progress_message); > > [ ... ] > > + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, > "%s", phase2); > + pgstat_report_progress_update_message(0, > progress_message); > > Instead of passing the array of char *'s, why not just pass a single char > *, because that's what it's doing - updating a single message. So, > something like: > > + char progress_message[PROGRESS_MESSAGE_LENGTH]; > > [ ... ] > > + snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase1); > + pgstat_report_progress_update_message(0, progress_message); > > [ ... ] > > + snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase2); > + pgstat_report_progress_update_message(0, progress_message); > > And also: > > +/*--- > + * pgstat_report_progress_update_message()- > + * > + *Called to update phase of VACUUM progress > + *--- > + */ > +void > +pgstat_report_progress_update_message(int index, char *msg) > +{ > > [ ... ] > > + pgstat_increment_changecount_before(beentry); > + strncpy((char *)beentry->st_progress_message[index], msg, > PROGRESS_MESSAGE_LENGTH); > + pgstat_increment_changecount_after(beentry); As I might have written upthread, transferring the whole string as a progress message is useless at least in this scenario. Since they are a set of fixed messages, each of them can be represented by an identifier, an integer number. I don't see a reason for sending the whole of a string beyond a backend. Next, the function pg_stat_get_command_progress() has a somewhat generic name, but it seems to reuturn the data only for the backends with beentry->st_command = COMMAND_LAZY_VACUUM and has the column names specific for vucuum like process. If the function is intended to be generic, it might be better to return a set of integer[] for given type. Otherwise it should have a name represents its objective. CREATE FUNCTION pg_stat_get_command_progress(IN cmdtype integer) RETURNS SETOF integer[] as $$ SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x x -_ {1233, 16233, 1, } {3244, 16236, 2, } CREATE VIEW pg_stat_vacuum_progress AS SELECT S.s[1] as pid, S.s[2] as relid, CASE S.s[3] WHEN 1 THEN 'Scanning Heap' WHEN 2 THEN 'Vacuuming Index and Heap' ELSE 'Unknown phase' END, FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S; # The name of the function could be other than *_command_progress. Any thoughts or opinions? 
> One more comment: > > @@ -1120,14 +1157,23 @@ lazy_scan_heap(Relation onerel, LVRelStats > *vacrelstats, > /* Log cleanup info before we touch indexes */ > vacuum_log_cleanup_info(onerel, vacrelstats); > > + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", > phase2); > + pgstat_report_progress_update_message(0, progress_message); > /* Remove index entries */ > for (i = 0; i < nindexes; i++) > + { > lazy_vacuum_index(Irel[i], > &indstats[i], > vacrelstats); > + scanned_index_pages += > RelationGetNumberOfBlocks(Irel[i]); > + /* Update the scanned index pages and number of index > scan */ > + pgstat_report_progress_update_counter(3, > scanned_index_pages); > + pgstat_report_progress_update_counter(4, > vacrelstats->num_index_scans > + 1); > + } > /* Remove tuples from heap */ > lazy_vacuum_heap(onerel, vacrelstats); > vacrelstats->num_index_scans++; > + scanned_index_pages = 0; > > I guess num_index_scans could b
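A sketch of the integer-phase idea (the enum name and values are illustrative; pgstat_report_progress_update_counter() is the function from the patch under review): the backend only ever publishes a small integer, and the view maps it back to a label with a CASE expression, as in the pg_stat_vacuum_progress sketch above.

/* Phase identifiers instead of free-form message strings (illustrative). */
typedef enum VacuumProgressPhase
{
	VACUUM_PHASE_SCAN_HEAP = 1,
	VACUUM_PHASE_VACUUM_INDEX_AND_HEAP = 2
} VacuumProgressPhase;

static void
report_vacuum_phase(VacuumProgressPhase phase)
{
	/* Publish the phase in one of the ordinary counter slots. */
	pgstat_report_progress_update_counter(0, phase);
}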
Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket
Hello Alvaro, Any objections to changing it like this? I'd probably backpatch to 9.5, but no further (even though this pattern actually appears all the way back.) My 0.02€: if the pattern is repeated, maybe a function which incorporates the check would save lines and improve overall readability? ... = safePQsocket(...); -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
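Something like the following is presumably what is meant (the name safePQsocket comes from the message above; the error handling is illustrative):

/* Return the connection's socket, aborting if it is no longer valid. */
static int
safePQsocket(PGconn *con)
{
	int			fd = PQsocket(con);

	if (fd < 0)
	{
		fprintf(stderr, "invalid socket: %s", PQerrorMessage(con));
		exit(1);
	}
	return fd;
}

Callers could then write FD_SET(safePQsocket(con), &input_mask) without repeating the validity check before every FD_SET/FD_ISSET.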
[HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE
Hi hackers, Recently I've been working on a CASCADE option for ALTER ... OWNER TO statement. Although it's still a working prototype, I think it's time to share my work. Introduction As of now there's no way to transfer the ownership of an object and all its dependent objects in one step. One has to manually alter the owner of each object, be it a table, a schema or something else. This patch adds the 'CASCADE' option to every 'ALTER X OWNER TO' statement, including the 'ALTER DATABASE db OWNER TO user CASCADE' which turns out to be a delicate matter. Implementation == There are two functions that process 'ALTER ... OWNER' statement: ExecAlterOwnerStmt() and ATExecCmd(). The latter function deals with the tasks that refer to all kinds of relations, while the first one handles the remaining object types. Basically, all I had to do is to add 'cascade' flag to the corresponding parsenodes and to make these functions call the dependency tree walker function (which would change the ownership of the dependent objects if needed). Of course, there are various corner cases for each kind of objects that require special treatment, but the code speaks for itself. The aforementioned 'ALTER DATABASE db ...' is handled in a special way. Since objects that don't belong to the 'current database' are hidden, it is impossible to change their owner directly, so we have to do the job in the background worker that is connected to the 'db'. I'm not sure if this is the best solution available, but anyway. What works == Actually, it seems to work in simple cases like 'a table with its inheritors' or 'a schema full of tables', but of course there might be things I've overlooked. There are some regression tests, though, and I'll continue to write some more. What's dubious == It is unclear what kinds of objects should be transferred in case of database ownership change, since there's no way to get the full list of objects that depend on a given database. Currently the code changes ownership of all schemas (excluding the 'information_schema' and some others) and their contents, but this is a temporary limitation. Feedback is welcome! 
-- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Companydiff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index c48e37b..54be671 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -21,6 +21,7 @@ #include "catalog/index.h" #include "catalog/objectaccess.h" #include "catalog/pg_amop.h" +#include "catalog/pg_aggregate.h" #include "catalog/pg_amproc.h" #include "catalog/pg_attrdef.h" #include "catalog/pg_authid.h" @@ -40,6 +41,7 @@ #include "catalog/pg_foreign_server.h" #include "catalog/pg_language.h" #include "catalog/pg_largeobject.h" +#include "catalog/pg_largeobject_metadata.h" #include "catalog/pg_namespace.h" #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" @@ -56,6 +58,7 @@ #include "catalog/pg_ts_template.h" #include "catalog/pg_type.h" #include "catalog/pg_user_mapping.h" +#include "commands/alter.h" #include "commands/comment.h" #include "commands/defrem.h" #include "commands/event_trigger.h" @@ -66,6 +69,7 @@ #include "commands/seclabel.h" #include "commands/trigger.h" #include "commands/typecmds.h" +#include "commands/tablecmds.h" #include "nodes/nodeFuncs.h" #include "parser/parsetree.h" #include "rewrite/rewriteRemove.h" @@ -77,17 +81,6 @@ #include "utils/tqual.h" -/* - * Deletion processing requires additional state for each ObjectAddress that - * it's planning to delete. For simplicity and code-sharing we make the - * ObjectAddresses code support arrays with or without this extra state. - */ -typedef struct -{ - int flags; /* bitmask, see bit definitions below */ - ObjectAddress dependee; /* object whose deletion forced this one */ -} ObjectAddressExtra; - /* ObjectAddressExtra flag bits */ #define DEPFLAG_ORIGINAL 0x0001 /* an original deletion target */ #define DEPFLAG_NORMAL 0x0002 /* reached via normal dependency */ @@ -97,17 +90,6 @@ typedef struct #define DEPFLAG_REVERSE 0x0020 /* reverse internal/extension link */ -/* expansible list of ObjectAddresses */ -struct ObjectAddresses -{ - ObjectAddress *refs; /* => palloc'd array */ - ObjectAddressExtra *extras; /* => palloc'd array, or NULL if not used */ - int numrefs; /* current number of references */ - int maxrefs; /* current size of palloc'd array(s) */ -}; - -/* typedef ObjectAddresses appears in dependency.h */ - /* threaded list of ObjectAddresses, for recursion detection */ typedef struct ObjectAddressStack { @@ -164,12 +146,21 @@ static const Oid object_classes[] = { }; +/* + * We limit the number of dependencies reported to the client to + * MAX_REPORTED_DEPS, since client software may not deal well with + * enormous error strings. The server log always gets a ful
Re: [HACKERS] extend pgbench expressions with functions
Hello Michaël, + * Recursive evaluation of int or double expressions + * + * Note that currently only integer variables are available, with values + * stored as text. This comment is incorrect, we only care about integers in this patch. Indeed! Taking patch 1 as a completely independent thing, there is no need to introduce PgBenchValueType yet. Similar remark for setIntValue and coerceToInt. They are useful afterwards when introducing double types to be able to handle double input parameters for the gaussian and other functions. Yes. This is exactly the pain I'm trying to avoid, creating a different implementation for the first patch, which is just overriden when the second part is applied... So I'm trying to compromise, having a several part patch *but* having the infrastructure ready for the second patch which adds the double type. Note that the first patch without the second is a loss of time for everyone, as the nearly only useful functions are the randoms, which require a double argument, so it does not make sense to apply the first one if the second one is not to be applied, I think. [...] (INT64_MIN / -1) should error. (INT64_MIN % -1) should result in 0. This is missing the division handling. Oops, indeed I totally messed up when merging the handling of / and %:-( I have found another issue in the (a) patch: the internal scripts were using the future random function which do not yet exist, as they are in patch (b). Here is a three part v30, which still includes the infrastructure for future types in the a patch, see my argumentation above. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ade1b53..9d5eb32 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -798,8 +798,11 @@ pgbench options dbname The expression may contain integer constants such as 5432, references to variables :variablename, and expressions composed of unary (-) or binary operators - (+, -, *, /, %) - with their usual associativity, and parentheses. + (+, -, *, /, + %) with their usual associativity, function calls and + parentheses. + shows the available + functions. @@ -965,6 +968,52 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter)) + + +PgBench Functions + + + + Function + Return Type + Description + Example + Result + + + + + abs(a) + same as a + integer value + abs(-17) + 17 + + + debug(a) + same asa + print to stderr the given argument + debug(5432) + 5432 + + + max(i [, ... ] ) + integer + maximum value + max(5, 4, 3, 2) + 5 + + + min(i [, ... 
] ) + integer + minimum value + min(5, 4, 3, 2) + 2 + + + + + As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index 06ee04b..93c6173 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,10 +16,13 @@ PgBenchExpr *expr_parse_result; +static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); static PgBenchExpr *make_integer_constant(int64 ival); static PgBenchExpr *make_variable(char *varname); -static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, +static PgBenchExpr *make_op(const char *operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr); +static int find_func(const char *fname); +static PgBenchExpr *make_func(const int fnumber, PgBenchExprList *args); %} @@ -31,13 +34,15 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, int64 ival; char *str; PgBenchExpr *expr; + PgBenchExprList *elist; } +%type elist %type expr -%type INTEGER -%type VARIABLE +%type INTEGER function +%type VARIABLE FUNCTION -%token INTEGER VARIABLE +%token INTEGER VARIABLE FUNCTION %token CHAR_ERROR /* never used, will raise a syntax error */ /* Precedence: lowest to highest */ @@ -49,16 +54,25 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, result: expr{ expr_parse_result = $1; } +elist: { $$ = NULL; } + | expr { $$ = make_elist($1, NULL); } + | elist ',' expr { $$ = make_elist($3, $1); } + ; + expr: '(' expr ')' { $$ = $2; } | '+' expr %prec UMINUS { $$ = $2; } - | '-' expr %prec UMINUS { $$ = make_op('-', make_integer_constant(0), $2); } - | expr '+' expr { $$ = make_op('+', $1, $3); } - | expr '-' expr { $$ = make_op('-', $1, $3); } - | expr '*' expr { $$ = make_op('*', $1, $3); } - | expr '/' expr { $$ = make_op('/', $1, $3); } - | expr '%' expr { $$ = make_op('%', $1, $3); } + | '-' expr %prec UMINUS { $$ = make_op("-", make_integer_constant(0), $2); } + | e
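For reference, the overflow-safe division/modulo handling being discussed looks roughly like this (a sketch; PG_INT64_MIN is assumed to be available from c.h, and the message wording is illustrative):

static bool
int64_div_mod(char op, int64 lval, int64 rval, int64 *retval)
{
	if (rval == 0)
	{
		fprintf(stderr, "division by zero\n");
		return false;
	}

	/*
	 * Special-case a divisor of -1: INT64_MIN / -1 overflows int64, and
	 * INT64_MIN % -1 can trap at runtime even though the mathematical
	 * result is simply 0.
	 */
	if (rval == -1)
	{
		if (op == '/')
		{
			if (lval == PG_INT64_MIN)
			{
				fprintf(stderr, "bigint out of range\n");
				return false;
			}
			*retval = -lval;
		}
		else
			*retval = 0;
		return true;
	}

	*retval = (op == '/') ? lval / rval : lval % rval;
	return true;
}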
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2016/02/15 15:20, Rushabh Lathia wrote: On Fri, Feb 12, 2016 at 5:40 PM, Etsuro Fujita mailto:fujita.ets...@lab.ntt.co.jp>> wrote: As a result of our discussions, we reached a conclusion that the DML pushdown APIs should be provided together with existing APIs such as ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete, IIUC. So, how about (1) leaving the description for the existing APIs as-is and (2) adding a new description for the DML pushdown APIs in parentheses, like this?: If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are assumed to be insertable, updatable, or deletable if the FDW provides ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete respectively. (If the FDW attempts to optimize a foreign table update, it still needs to provide PlanDMLPushdown, BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown.) Actually, if the FDW provides the DML pushdown APIs, (pushdown-able) foreign table updates can be done without ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete. So, the above docs are not necessarily correct. But we don't recommend doing that without the existing APIs, so I'm not sure it's worth complicating the docs. Adding a new description for the DML pushdown APIs seems like a good idea. I would suggest adding that as a separate paragraph rather than in brackets. OK, will do. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] custom function for converting human readable sizes to bytes
> On 12/02/16 10:19, Dean Rasheed wrote: >> This seems like a reasonable first patch for me as a committer, so >> I'll take it unless anyone else was planning to do so. > So looking at this, it seems that for the most part pg_size_bytes() will parse any output produced by pg_size_pretty(). The exception is that there are 2 versions of pg_size_pretty(), one that takes bigint and one that takes numeric, whereas pg_size_bytes() returns bigint, so it can't handle all inputs. Is there any reason not to make pg_size_bytes() return numeric? It would still be compatible with the example use cases, but it would be a better inverse of both variants of pg_size_pretty() and would be more future-proof. It already works internally using numeric, so it's a trivial change to make now, but impossible to change in the future without introducing a new function with a different name, which is messy. Thoughts? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers