Re: [HACKERS] proposal: plpgsql - Assert statement
2015-03-25 0:17 GMT+01:00 Tom Lane t...@sss.pgh.pa.us:

> Pavel Stehule pavel.steh...@gmail.com writes:
>> updated version with Jim Nasby's doc and rebase against last changes
>> in plpgsql.
>
> I started looking at this patch.  ISTM there are some pretty
> questionable design decisions in it:
>
> 1. Why create a core GUC to control a behavior that's plpgsql-only?
> I think it'd make more sense for this to be a plpgsql custom GUC
> (ie, "plpgsql.enable_asserts" or some such name).

This type of assertion can be implemented in any PL language, so I prefer
a global setting.  But I don't have a strong opinion in this case - it is
a question of granularity, and both ways are valid.

> 2. I find the use of errdetail to report whether the assertion condition
> evaluated to FALSE or to NULL to be pretty useless.  (BTW, is considering
> NULL to be a failure the right thing?  SQL CHECK conditions consider NULL
> to be allowed ...)

This is a question - I am happy with SQL CHECK for data, but I am not sure
the same behavior is safe for a plpgsql (procedural) assert.  Stricter
behavior is safer - and some bugs in procedures are based on unhandled
NULLs in variables.  So on this topic I prefer the implemented behavior.
It is something like:

    IF expression THEN
      -- I am able to do all ...
    ELSE
      RAISE EXCEPTION 'something is wrong';
    END IF;

> I also don't care at all for reporting the internal text of the assertion
> expression in the errdetail: that will expose implementation details (ie,
> exactly what does plpgsql convert the user expression to, does it remove
> comments, etc) which we will then be constrained from changing for fear
> of breaking client code that expects a particular spelling of the
> condition.  I think we should just drop that whole business.  The user
> can report the condition in her message, if she feels the need to.

+1

> 3. If we drop the errdetail as per #2, then reporting the optional
> user-supplied string as a HINT would be just plain bizarre; not that it
> wasn't bizarre already, because there's no good reason to suppose that
> whatever the programmer has to say about the assertion is merely a hint.
> I also find the behavior for string-evaluates-to-NULL bizarre; it'd be
> saner just to leave out the message field, same as if the argument
> weren't there.  I would suggest that we adopt one of these two
> definitions for the optional string:
>
> 3a. If string is present and not null, use it as the primary message
> text (otherwise use "assertion failed").
>
> 3b. If string is present and not null, use it as errdetail, with the
> primary message text always being "assertion failed".
>
> I mildly prefer #3a, but could be talked into #3b.

I prefer #3b - there is more information.

Regards

Pavel

> Comments?
>
> regards, tom lane
Re: [HACKERS] Custom/Foreign-Join-APIs
Hello, I had a look at this.

At Wed, 25 Mar 2015 03:59:28 +0000, Kouhei Kaigai kai...@ak.jp.nec.com wrote in 9a28c8860f777e439aa12e8aea7694f8010c6...@bpxm15gp.gisp.nec.co.jp:
>>> At this moment, I'm not 100% certain about its logic.  Especially, I
>>> didn't test SEMI- and ANTI-join cases yet.  However, time is money - I
>>> want people to check the overall design first, rather than do detailed
>>> debugging.  Please tell me if I misunderstood the logic to break down
>>> join relations.
>>
>> With your patch applied, the regression tests for "updatable view"
>> fail.  regression.diff contains some errors like this:
>>     ! ERROR:  could not find RelOptInfo for given relids
>> Could you check that?
>
> It is a bug around the logic to find out the two RelOptInfo that can
> construct another RelOptInfo of the joinrel.

It is caused by a split (or multilevel) joinlist.  Setting
join_collapse_limit to 10 makes the query go well.  I suppose that
get_joinrel_broken_down should give up returning a result when the given
joinrel spans multiple join subproblems, because they cannot be merged by
the FDW anyway, even if they conform to the basic requirements for
merging.

> Even though I'm now working to correct the logic, it is not obvious to
> identify two relids that satisfy joinrel->relids.  (Yep, law of entropy
> enhancement...)
>
> On the other hand, we may have a solution that does not need a
> complicated reconstruction process.  The original concern was that an
> FDW driver may add paths that replace an entire join subtree by a
> foreign scan on a remote join multiple times, repeatedly, but these
> paths shall be identical.  If we put a hook for FDW/CSP at the bottom
> of build_join_rel(), we may be able to solve the problem in a more
> straightforward and simple way.  Because build_join_rel() finds a cache
> on root->join_rel_hash and returns immediately if the required
> joinrelids already has its RelOptInfo, the bottom of this function is
> never called twice for a particular set of joinrelids.  Once FDW/CSP
> constructs a path that replaces an entire join subtree towards the
> joinrel just after construction, it shall be kept until cheaper
> built-in paths are added (if any).  This idea has one other positive
> side-effect.  We expect a remote join to be cheaper than a local join
> with two remote scans in most cases.  Once a much cheaper path is added
> prior to local join consideration, add_path_precheck() breaks off path
> consideration earlier.

+1 as a whole.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center, Nippon Telegraph and Telephone Corporation
Phone: 03-5860-5115 / Fax: 03-5463-5490

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] printing table in asciidoc with psql
On Wed, Mar 25, 2015 at 02:18:58PM +0900, Michael Paquier wrote:
>> [options=header,cols=l,l,frame=none] | |5 2.2+^.^ |4 2.2+^.^ |2 2.2+^.^ |3 2.2+^.^ |
>
> Hm.  This is still incorrect.  You should remove options=header here, or
> the first tuple is treated as a header in the non-expanded/tuple-only
> case.  Your patch does correctly remove the header for the
> expanded/tuple-only case, though.

OK, fixed.  Thanks for the testing.  Patch attached.  New output:

    test=> \pset format asciidoc
    Output format is asciidoc.
    test=> \t
    Tuples only is on.
    test=> table "5 2.2+^.^";
    [cols=l,l,frame=none] | |5 2.2+^.^ |4 2.2+^.^ |2 2.2+^.^ |3 2.2+^.^ |

--
Bruce Momjian  br...@momjian.us        http://momjian.us
EnterpriseDB                           http://enterprisedb.com

+ Everyone has their own god. +

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index a637001..82a91ec
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** lo_import 152801
*** 2092,2099 ****
        <literal>aligned</literal>, <literal>wrapped</literal>,
        <literal>html</literal>,
        <literal>latex</literal> (uses <literal>tabular</literal>),
!       <literal>latex-longtable</literal>, or
!       <literal>troff-ms</literal>.
        Unique abbreviations are allowed.  (That would mean one letter
        is enough.)
        </para>
--- 2092,2099 ----
        <literal>aligned</literal>, <literal>wrapped</literal>,
        <literal>html</literal>,
        <literal>latex</literal> (uses <literal>tabular</literal>),
!       <literal>latex-longtable</literal>,
!       <literal>troff-ms</literal>, or <literal>asciidoc</literal>.
        Unique abbreviations are allowed.  (That would mean one letter
        is enough.)
        </para>
*** lo_import 152801
*** 2120,2126 ****
        <para>
        The <literal>html</>, <literal>latex</>,
!       <literal>latex-longtable</literal>, and <literal>troff-ms</>
        formats put out tables that are intended to be included in documents
        using the respective mark-up language.  They are not complete
        documents!  This might not be
--- 2120,2127 ----
        <para>
        The <literal>html</>, <literal>latex</>,
!       <literal>latex-longtable</literal>, <literal>troff-ms</>,
!       and <literal>asciidoc</>
        formats put out tables that are intended to be included in documents
        using the respective mark-up language.  They are not complete
        documents!  This might not be

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 7c9f28d..a96f0ef
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** _align2string(enum printFormat in)
*** 2257,2262 ****
--- 2257,2265 ----
  		case PRINT_TROFF_MS:
  			return "troff-ms";
  			break;
+ 		case PRINT_ASCIIDOC:
+ 			return "asciidoc";
+ 			break;
  	}
  	return "unknown";
  }
*** do_pset(const char *param, const char *v
*** 2330,2338 ****
  		popt->topt.format = PRINT_LATEX_LONGTABLE;
  	else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
  		popt->topt.format = PRINT_TROFF_MS;
  	else
  	{
! 		psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms\n");
  		return false;
  	}
--- 2333,2343 ----
  		popt->topt.format = PRINT_LATEX_LONGTABLE;
  	else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
  		popt->topt.format = PRINT_TROFF_MS;
+ 	else if (pg_strncasecmp("asciidoc", value, vallen) == 0)
+ 		popt->topt.format = PRINT_ASCIIDOC;
  	else
  	{
! 		psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms, asciidoc\n");
  		return false;
  	}

diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
new file mode 100644
index ac0dc27..93a517e
*** a/src/bin/psql/help.c
--- b/src/bin/psql/help.c
*** helpVariables(unsigned short int pager)
*** 351,357 ****
  	fprintf(output, _("  expanded (or x)    toggle expanded output\n"));
  	fprintf(output, _("  fieldsep           field separator for unaligned output (default '|')\n"));
  	fprintf(output, _("  fieldsep_zero      set field separator in unaligned mode to zero\n"));
! 	fprintf(output, _("  format             set output format [unaligned, aligned, wrapped, html, latex, ..]\n"));
  	fprintf(output, _("  footer             enable or disable display of the table footer [on, off]\n"));
  	fprintf(output, _("  linestyle          set the border line drawing style [ascii, old-ascii, unicode]\n"));
  	fprintf(output, _("  null               set the string to be printed in place of a null value\n"));
--- 351,357 ----
  	fprintf(output, _("  expanded (or x)    toggle expanded output\n"));
  	fprintf(output, _("  fieldsep           field separator for unaligned output (default '|')\n"));
  	fprintf(output, _("  fieldsep_zero      set field
Re: [HACKERS] recovery_target_time ignored ?
On Wed, Mar 25, 2015 at 1:28 AM, David Steele da...@pgmasters.net wrote:
> On 3/24/15 6:12 AM, Venkata Balaji N wrote:
>> On Tue, Mar 24, 2015 at 9:54 AM, David Steele da...@pgmasters.net wrote:
>>> On 3/23/15 12:42 AM, Venkata Balaji N wrote:
>>>> Hi,
>>>>
>>>> Assuming that this might require a patch, I am posting this in
>>>> pgsql-hackers.  Apologies if this is not the appropriate mailing list
>>>> to start this discussion.
>>>>
>>>> I performed a PITR, and the message below in the log file is a bit
>>>> confusing:
>>>>
>>>> 2015-03-23 13:49:09.816 GMT-10 DB= PID=4707 LOG: database system was interrupted; last known up at 2015-03-23 10:30:26 GMT-10
>>>> 2015-03-23 13:49:09.817 GMT-10 DB= PID=4707 LOG: starting point-in-time recovery to 2015-03-23 10:00:26+10
>>>> 2015-03-23 13:49:09.827 GMT-10 DB= PID=4707 LOG: restored log file 0001000B0020 from archive
>>>> 2015-03-23 13:49:09.888 GMT-10 DB= PID=4707 LOG: redo starts at B/2090
>>>> 2015-03-23 13:49:09.937 GMT-10 DB= PID=4707 LOG: consistent recovery state reached at B/20B8
>>>> 2015-03-23 13:49:09.947 GMT-10 DB= PID=4707 LOG: restored log file 0001000B0021 from archive
>>>> 2015-03-23 13:49:09.950 GMT-10 DB= PID=4707 LOG: recovery stopping before commit of transaction 16267, time 2015-03-23 13:22:37.53007+10
>>>>
>>>> By mistake I gave a recovery_target_time of 10:00 GMT, which is 25/30
>>>> minutes behind the backup start/end time registered in the
>>>> backup_label.  The parameter recovery_target_time is ignored, and
>>>> recovery proceeds further, applying all the available WAL archive
>>>> files and finally bringing up the database.
>>>>
>>>> I think it would make sense if the recovery did not proceed any
>>>> further and errored out with a message like "recovery_target_time is
>>>> behind the backup time; please consider using a backup taken prior to
>>>> the recovery_target_time".
>>>
>>> I just tried it with 9.3.5 and I do get an error:
>>>
>>> LOG: starting point-in-time recovery to 2015-03-23 17:26:02.721307-04
>>> LOG: restored log file 00010003 from archive
>>> LOG: redo starts at 0/3C8
>>> LOG: recovery stopping before commit of transaction 1001, time 2015-03-23 18:26:01.012593-04
>>> LOG: redo done at 0/3000228
>>> FATAL: requested recovery stop point is before consistent recovery point
>>
>> That makes more sense.  This is what I was expecting to happen.  Then I
>> think it is something to do with the timestamp format.
>>
>>> Here's my recovery.conf file:
>>>
>>> restore_command = '/usr/bin/pg_backrest.pl --stanza=db archive-get %f %p'
>>> recovery_target_time = '2015-03-23 17:26:02.721307 EDT'
>>
>> My recovery.conf file is as follows:
>>
>> restore_command='cp /data/pgdata9400backup/pgwalarch9400backup/%f %p'
>> recovery_target_time='2015-03-23 10:00:26 GMT-10'
>> recovery_target_inclusive='true'
>
> You have '2015-03-23 10:00:26 GMT-10' in recovery.conf, but the log says
> 'starting point-in-time recovery to 2015-03-23 10:00:26+10'.  Note the
> - vs +.

This is my confusion too.  I picked up the time format from the backup
label.  I am not sure why the timestamp is taken as '2015-03-23
10:00:26+10' for '2015-03-23 10:00:26 GMT-10'.  My other system's
timestamp format is also AEDT.

> Could you check your log and recovery.conf and make sure the timezone
> offsets are actually different?

I did another test and I get the same problem.  Below is the text from the
log file.  I gave a recovery_target_time almost a day behind the
consistent recovery point.  Still, the recovery_target_time is taken as
2015-03-23 11:07:10+11 for 2015-03-23 11:07:10 AEDT:

2015-03-24 18:42:44.608 AEDT LOG: database system was interrupted; last known up at 2015-03-24 18:20:53 AEDT
2015-03-24 18:42:44.608 AEDT LOG: starting point-in-time recovery to 2015-03-23 11:07:10+11
cp: /disk3/pgwalarch9401/0001000300FE: No such file or directory
2015-03-24 18:42:44.626 AEDT LOG: record with zero length at 3/FE90

Below is my recovery.conf file:

restore_command='cp /disk3/pgwalarch9401/%f %p'
recovery_target_time='2015-03-23 11:07:10 AEDT'
recovery_target_inclusive=true

I am checking if this has something to do with my system timestamp format.
Not sure what I am missing.  Do I need to give any special time format?

This is a weird one.  I can reproduce the problem by setting
timezone='Australia/Sydney'.  I also tried setting
log_timezone='Australia/Sydney' and my system clock to
'Australia/Sydney', but I still saw the same issue.  I'm testing this
using unit tests for some backup software I'm
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
2015/03/25 12:59, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
>>> At this moment, I'm not 100% certain about its logic.  Especially, I
>>> didn't test SEMI- and ANTI-join cases yet.  However, time is money - I
>>> want people to check the overall design first, rather than do detailed
>>> debugging.  Please tell me if I misunderstood the logic to break down
>>> join relations.
>>
>> With your patch applied, the regression tests for "updatable view"
>> fail.  regression.diff contains some errors like this:
>>     ! ERROR:  could not find RelOptInfo for given relids
>> Could you check that?
>
> It is a bug around the logic to find out the two RelOptInfo that can
> construct another RelOptInfo of the joinrel.  Even though I'm now
> working to correct the logic, it is not obvious to identify two relids
> that satisfy joinrel->relids.  (Yep, law of entropy enhancement...)

IIUC, this problem exists only for non-INNER JOINs, because relations
joined with only INNER JOIN can be treated in arbitrary order.  But
supporting OUTER JOINs would be necessary even for the first cut.

> On the other hand, we may have a solution that does not need a
> complicated reconstruction process.  The original concern was that an
> FDW driver may add paths that replace an entire join subtree by a
> foreign scan on a remote join multiple times, repeatedly, but these
> paths shall be identical.  If we put a hook for FDW/CSP at the bottom
> of build_join_rel(), we may be able to solve the problem in a more
> straightforward and simple way.  Because build_join_rel() finds a cache
> on root->join_rel_hash and returns immediately if the required
> joinrelids already has its RelOptInfo, the bottom of this function is
> never called twice for a particular set of joinrelids.  Once FDW/CSP
> constructs a path that replaces an entire join subtree towards the
> joinrel just after construction, it shall be kept until cheaper
> built-in paths are added (if any).  This idea has one other positive
> side-effect.  We expect a remote join to be cheaper than a local join
> with two remote scans in most cases.  Once a much cheaper path is added
> prior to local join consideration, add_path_precheck() breaks off path
> consideration earlier.  Please comment on this.

Or the bottom of make_join_rel().  IMO build_join_rel() is responsible
for just building (or searching from a list) a RelOptInfo for the given
relids.  After that, make_join_rel() calls add_paths_to_joinrel() with
appropriate arguments per join type to generate the actual Paths that
implement the join.  make_join_rel() is called only once for a particular
relid combination, and it has the SpecialJoinInfo and the restrictlist
(conditions specified in JOIN ON and WHERE) at hand, so it seems
promising for the FDW case.  Though I'm not sure that it also fits the
custom join providers' requirements.

--
Shigeru HANADA
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA shigeru.han...@gmail.com wrote:
> Or the bottom of make_join_rel().  IMO build_join_rel() is responsible
> for just building (or searching from a list) a RelOptInfo for the given
> relids.  After that, make_join_rel() calls add_paths_to_joinrel() with
> appropriate arguments per join type to generate the actual Paths that
> implement the join.  make_join_rel() is called only once for a
> particular relid combination, and it has the SpecialJoinInfo and the
> restrictlist (conditions specified in JOIN ON and WHERE) at hand, so it
> seems promising for the FDW case.

I like that idea, but I think we will have a complex hook signature; it
won't remain as simple as hook(root, joinrel).

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA shigeru.han...@gmail.com wrote:
>> Or the bottom of make_join_rel().  IMO build_join_rel() is responsible
>> for just building (or searching from a list) a RelOptInfo for the
>> given relids.  After that, make_join_rel() calls
>> add_paths_to_joinrel() with appropriate arguments per join type to
>> generate the actual Paths that implement the join.  make_join_rel() is
>> called only once for a particular relid combination, and it has the
>> SpecialJoinInfo and the restrictlist (conditions specified in JOIN ON
>> and WHERE) at hand, so it seems promising for the FDW case.
>
> I like that idea, but I think we will have a complex hook signature; it
> won't remain as simple as hook(root, joinrel).

In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2,
sjinfo and restrictlist.  It is not too simple, but not a complicated
signature either.  Even if we reconstruct rel1 and rel2 using sjinfo, we
also need to compute the restrictlist using build_joinrel_restrictlist()
again, which is a static function in relnode.c.  So I don't think either
of them has a definitive advantage from the standpoint of simplicity.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Parallel Seq Scan
On 20 March 2015 17:37, Amit Kapila wrote:
> So the patches have to be applied in below sequence:
> HEAD Commit-id : 8d1f2390
> parallel-mode-v8.1.patch [2]
> assess-parallel-safety-v4.patch [1]
> parallel-heap-scan.patch [3]
> parallel_seqscan_v11.patch (attached with this mail)

While I was going through this patch, I observed one invalid Assert in
the function ExecInitFunnel, i.e.

    Assert(outerPlan(node) == NULL);

The outer node of a Funnel node is always non-NULL, and currently it will
be a PartialSeqScan node.  Maybe asserts were disabled while building the
code, which is why this issue has not yet been observed.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Parallel Seq Scan
On Wed, Mar 25, 2015 at 3:47 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
>> On 20 March 2015 17:37, Amit Kapila wrote:
>>> So the patches have to be applied in below sequence:
>>> HEAD Commit-id : 8d1f2390
>>> parallel-mode-v8.1.patch [2]
>>> assess-parallel-safety-v4.patch [1]
>>> parallel-heap-scan.patch [3]
>>> parallel_seqscan_v11.patch (attached with this mail)
>
> While I was going through this patch, I observed one invalid Assert in
> the function ExecInitFunnel, i.e.
>
>     Assert(outerPlan(node) == NULL);
>
> The outer node of a Funnel node is always non-NULL, and currently it
> will be a PartialSeqScan node.

Which version of the patch are you looking at?  I see the below code in
ExecInitFunnel() in version 11, to which you have replied:

    + /* Funnel node doesn't have an innerPlan node. */
    + Assert(innerPlan(node) == NULL);

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] possible dsm bug in dsm_attach()
On Tue, May 6, 2014 at 11:15 PM, Robert Haas robertmh...@gmail.com wrote:
> On Tue, May 6, 2014 at 1:14 PM, Andres Freund and...@2ndquadrant.com wrote:
>> On 2014-05-06 08:48:57 -0400, Robert Haas wrote:
>>> On Tue, May 6, 2014 at 8:43 AM, Andres Freund and...@2ndquadrant.com wrote:
>>>> The break because of refcnt == 1 doesn't generally seem to be a good
>>>> idea.  Why are we bailing if there's *any* segment that's in the
>>>> process of being removed?  I think the check should be there *after*
>>>> the dsm_control->item[i].handle == seg->handle check?
>>>
>>> You are correct.  Good catch.  Fix attached.
>
> Committed, thanks.

In dsm_create():

    dsm_create(Size size, int flags)
    {
        ...
        /* Lock the control segment so we can register the new segment. */
        LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
        ...
        /* Verify that we can support an additional mapping. */
        if (nitems >= dsm_control->maxitems)
        {
            if ((flags & DSM_CREATE_NULL_IF_MAXSEGMENTS) != 0)
            {
                dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
                            &seg->mapped_address, &seg->mapped_size, WARNING);
                if (seg->resowner != NULL)
                    ResourceOwnerForgetDSM(seg->resowner, seg);
                dlist_delete(&seg->node);
                pfree(seg);
                return NULL;
            }
            ...
        }

Is there a reason the lock is not released when we return NULL in the
above code?  I am facing an issue when we need to create many segments
for a large inheritance hierarchy.  The attached patch fixes the problem
for me.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: release_lock_dsm_v1.patch (Description: Binary data)
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
2015/03/25 19:09, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
> In this case, GetForeignJoinPaths() will take root, joinrel, rel1,
> rel2, sjinfo and restrictlist.  It is not too simple, but not a
> complicated signature either.  Even if we reconstruct rel1 and rel2
> using sjinfo, we also need to compute the restrictlist using
> build_joinrel_restrictlist() again, which is a static function in
> relnode.c.  So I don't think either of them has a definitive advantage
> from the standpoint of simplicity.

The bottom of make_join_rel() seems good from the viewpoint of available
information, but it is called multiple times for join combinations which
are essentially identical, for an INNER JOIN case like this:

fdw=# explain select * from pgbench_branches b
          join pgbench_tellers t on t.bid = b.bid
          join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
INFO:  postgresGetForeignJoinPaths() 1x2
INFO:  postgresGetForeignJoinPaths() 1x4
INFO:  postgresGetForeignJoinPaths() 2x4
INFO:  standard_join_search() old hook point
INFO:  standard_join_search() old hook point
INFO:  standard_join_search() old hook point
INFO:  postgresGetForeignJoinPaths() 0x4
INFO:  postgresGetForeignJoinPaths() 0x2
INFO:  postgresGetForeignJoinPaths() 0x1
INFO:  standard_join_search() old hook point
                        QUERY PLAN
-----------------------------------------------------------
 Foreign Scan  (cost=100.00..102.11 rows=211 width=1068)
(1 row)

Here I've put a probe point at the beginning of the GetForeignJoinPaths
handler, and another just before the set_cheapest() call in
standard_join_search() as the "old hook point".  In this example 1, 2,
and 4 are base relations, and at join level 3 the planner calls
GetForeignJoinPaths() three times, for the combinations:

1) (1x2)x4
2) (1x4)x2
3) (2x4)x1

Tom's suggestion is aiming at providing a chance to consider join
push-down at a more abstract level, IIUC.  So it would be good to call
the handler only once in that case, for the flattened combination
(1x2x4).

Hmm, how about skipping the call of the handler (or hook) if the joinrel
was found by find_join_rel()?  At least that suppresses redundant calls
for different join orders, and the handler can determine whether the
combination can be flattened by checking that every RelOptInfo with
RELOPT_JOINREL under the joinrel has JOIN_INNER as its jointype.

--
Shigeru HANADA
Re: [HACKERS] assessing parallel-safety
On Sun, Mar 22, 2015 at 12:00 AM, Thom Brown t...@linux.com wrote:
> On 21 March 2015 at 14:28, Amit Kapila amit.kapil...@gmail.com wrote:
>> On Fri, Mar 20, 2015 at 7:54 PM, Thom Brown t...@linux.com wrote:
>>> createdb pgbench
>>> pgbench -i -s 200 pgbench
>>> CREATE TABLE pgbench_accounts_1 (CHECK (bid = 1)) INHERITS (pgbench_accounts);
>>> ...
>>> CREATE TABLE pgbench_accounts_200 (CHECK (bid = 200)) INHERITS (pgbench_accounts);
>>
>> I managed to reproduce the Assertion reported by you as:
>>
>> #2  0x007a053a in ExceptionalCondition (
>>     conditionName=conditionName@entry=0x813a4b "!(IsInParallelMode())",
>>     errorType=errorType@entry=0x7da1d6 "FailedAssertion",
>>     fileName=fileName@entry=0x81397d "parallel.c",
>>     lineNumber=lineNumber@entry=123) at assert.c:54
>> #3  0x004cd5ba in CreateParallelContext (
>>     entrypoint=entrypoint@entry=0x659d2c <ParallelQueryMain>,
>>     nworkers=nworkers@entry=8) at parallel.c:123
>>
>> The reason is that CreateParallelContext() expects to be called in
>> parallel mode, and we enter parallel mode after InitPlan() in
>> standard_ExecutorStart().  So the probable fix could be to enter
>> parallel mode before initializing the plan.
>>
>> I still could not reproduce the crash you have reported as:
>>
>> #0  0x00770843 in pfree ()
>> #1  0x005a382f in ExecEndFunnel ()
>> #2  0x0059fe75 in ExecEndAppend ()
>> #3  0x005920bd in standard_ExecutorEnd ()
>>
>> Could you let me know which patches you have tried, and on top of which
>> commit?  I am trying on the commit mentioned in the mail [1].
>> Basically, have you tried the versions mentioned in that mail:
>>
>> HEAD Commit-id : 8d1f2390
>> parallel-mode-v8.1.patch [2]
>> assess-parallel-safety-v4.patch [1]
>> parallel-heap-scan.patch [3]
>> parallel_seqscan_v11.patch (attached with this mail)
>>
>> If something else, could you let me know, so that I can try to
>> reproduce the issue reported by you?
>
> Looks like one of the patches I applied is newer than the one in your
> list:

Okay, then that was the reason you were seeing the crash whereas I could
not reproduce it.  However, I have integrated the patch with v9 of the
parallel-mode patch and posted it on the appropriate thread.  One point
to note is that I think there is one small issue in one of the latest
commits [1].  After that is fixed, you can verify once more with the
latest patch.

[1] - http://www.postgresql.org/message-id/caa4ek1+nwuj9ik61ygfzbcn85dqunevd38_h1zngcdzrglg...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
2015/03/25 12:59, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
>>>> At this moment, I'm not 100% certain about its logic.  Especially, I
>>>> didn't test SEMI- and ANTI-join cases yet.  However, time is money -
>>>> I want people to check the overall design first, rather than do
>>>> detailed debugging.  Please tell me if I misunderstood the logic to
>>>> break down join relations.
>>>
>>> With your patch applied, the regression tests for "updatable view"
>>> fail.  regression.diff contains some errors like this:
>>>     ! ERROR:  could not find RelOptInfo for given relids
>>> Could you check that?
>>
>> It is a bug around the logic to find out the two RelOptInfo that can
>> construct another RelOptInfo of the joinrel.  Even though I'm now
>> working to correct the logic, it is not obvious to identify two relids
>> that satisfy joinrel->relids.  (Yep, law of entropy enhancement...)
>
> IIUC, this problem exists only for non-INNER JOINs, because relations
> joined with only INNER JOIN can be treated in arbitrary order.  But
> supporting OUTER JOINs would be necessary even for the first cut.

Yep.  In the case when the joinrel contains only inner-joined relations
managed by the same FDW driver, the job of get_joinrel_broken_down() is
quite simple.  However, people want outer joins to be supported also,
don't they?

>> On the other hand, we may have a solution that does not need a
>> complicated reconstruction process.  The original concern was that an
>> FDW driver may add paths that replace an entire join subtree by a
>> foreign scan on a remote join multiple times, repeatedly, but these
>> paths shall be identical.  If we put a hook for FDW/CSP at the bottom
>> of build_join_rel(), we may be able to solve the problem in a more
>> straightforward and simple way.  Because build_join_rel() finds a
>> cache on root->join_rel_hash and returns immediately if the required
>> joinrelids already has its RelOptInfo, the bottom of this function is
>> never called twice for a particular set of joinrelids.  Once FDW/CSP
>> constructs a path that replaces an entire join subtree towards the
>> joinrel just after construction, it shall be kept until cheaper
>> built-in paths are added (if any).  This idea has one other positive
>> side-effect.  We expect a remote join to be cheaper than a local join
>> with two remote scans in most cases.  Once a much cheaper path is
>> added prior to local join consideration, add_path_precheck() breaks
>> off path consideration earlier.  Please comment on this.
>
> Or the bottom of make_join_rel().  IMO build_join_rel() is responsible
> for just building (or searching from a list) a RelOptInfo for the given
> relids.  After that, make_join_rel() calls add_paths_to_joinrel() with
> appropriate arguments per join type to generate the actual Paths that
> implement the join.  make_join_rel() is called only once for a
> particular relid combination, and it has the SpecialJoinInfo and the
> restrictlist (conditions specified in JOIN ON and WHERE) at hand, so it
> seems promising for the FDW case.

As long as the caller can know whether build_join_rel() actually
constructed a new RelOptInfo object or not, I think it makes more sense
than putting a hook within make_join_rel().

> Though I'm not sure that it also fits the custom join providers'
> requirements.

A join replaced by a CSP has two scenarios.  The first one implements
just an alternative logic for a built-in join and takes the underlying
inner/outer nodes, so its hook is located in add_paths_to_joinrel(),
like the built-in join logic.  The second one tries to replace an entire
join subtree by a materialized view (for example), like the FDW remote
join case, so it has to be hooked near the location of
GetForeignJoinPaths().  In the second scenario, the CSP does not have a
private field in RelOptInfo, so it may not be obvious how to check
whether the given joinrel exactly matches a particular materialized view
or other cache.  At this moment, what I'm interested in is the first
scenario, so the priority of the second case is not significant for me,
at least.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] assessing parallel-safety
On Fri, Mar 20, 2015 at 11:37 PM, Robert Haas robertmh...@gmail.com wrote:
> That might be a different crash than the first one you showed.  But it
> looks like the problem here is that the parallel sequential scan patch
> is calling CreateParallelContext even though this is just an EXPLAIN
> and we're not actually running the query.  It shouldn't do that.  (This
> might be an argument for postponing CreateParallelContext() until run
> time, as I've suggested before.)

Okay, I have postponed the CreateParallelContext() call until runtime in
the latest patch posted on the Parallel Seq Scan thread.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Error with index on unlogged table
On Wednesday, March 25, 2015, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Mar 24, 2015 at 8:46 PM, Thom Brown wrote: The index is unlogged until reindexing... [...] Which I think also raises the question: why are unlogged indexes made persistent by a reindex? That's a bug of HEAD; ~9.4 keeps the index unlogged even after REINDEX INDEX. What happens is that ReindexIndex relies on the relpersistence provided by makeRangeVar at parse time, which is just incorrect, as it uses RELPERSISTENCE_PERMANENT all the time. The attached patch fixes that... How about VACUUM FULL and CLUSTER, as the problem seems to have been reported to be there too? Amit
Re: [HACKERS] Abbreviated keys for Numeric
On Tue, Mar 24, 2015 at 12:03 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: So here's the latest (and, hopefully, last) version: - adds diagnostic output from numeric_abbrev_abort using the trace_sort GUC - fixed Datum vs. uint32 issues in hash_uint32 - added a short comment about excess-K representation - tweaked the indenting and comments a bit You still pointlessly check memtupcount here: + if (memtupcount < 1 || nss->input_count < 1 || !nss->estimating) + return false; I still don't like this comment: + nss = palloc(sizeof(NumericSortSupport)); + + /* + * palloc a buffer for handling unaligned packed values in addition to + * the support struct + */ + nss->buf = palloc(VARATT_SHORT_MAX + VARHDRSZ + 1); This cast to void is unnecessary: +numeric_fast_cmp(Datum x, Datum y, SortSupport ssup) +{ + Numeric nx = DatumGetNumeric(x); + Numeric ny = DatumGetNumeric(y); + int result; + + (void) ssup; Please try to at least consider my feedback. I don't expect you to do exactly what I ask, but I also don't expect you to blithely ignore it. I'm not particularly committed to any specific way of handling the DEC_DIGITS issue. (I moved away from the "transparently skip abbreviations" approach of the original because reducing #ifdef-ism in the code seemed like a desirable feature.) Good. The INT64_MIN/MAX changes should be committed fairly soon. (I haven't posted a patch for TRACE_SORT.) I wouldn't assume that. -- Peter Geoghegan
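For readers unfamiliar with the "excess-K representation" mentioned above: a signed quantity (such as a numeric weight) can be stored biased by a constant K, so that plain unsigned comparison of the stored form agrees with signed comparison of the original value — which is what makes it usable inside an abbreviated sort key. A minimal sketch, illustrative only and not the patch's actual encoding:

```python
# Excess-K (biased) representation: store value + K so that unsigned
# ordering of the stored form matches signed ordering of the value.
K = 128  # bias; stored values fit in one unsigned byte for -128..127

def encode(weight: int) -> int:
    assert -K <= weight < K
    return weight + K

weights = [-5, 0, 3, -128, 127]

# Sorting by the unsigned encoded form reproduces the signed order.
assert sorted(weights) == sorted(weights, key=encode)
print([encode(w) for w in sorted(weights)])  # monotonically increasing
```

The same trick lets a signed weight occupy the high bits of an unsigned abbreviated key without breaking memcmp-style ordering.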
Re: [HACKERS] Error with index on unlogged table
Hi, On 2015-03-25 11:38:30 +0900, Michael Paquier wrote: On Tue, Mar 24, 2015 at 8:46 PM, Thom Brown wrote: The index is unlogged until reindexing... [...] Which I think also raises the question: why are unlogged indexes made persistent by a reindex? That's a bug of HEAD; ~9.4 keeps the index unlogged even after REINDEX INDEX. What happens is that ReindexIndex relies on the relpersistence provided by makeRangeVar at parse time, which is just incorrect, as it uses RELPERSISTENCE_PERMANENT all the time. The attached patch fixes that... What the hell? That's somewhat nasty. Nice that it got caught before 9.5 was released. Did you check whether a similar bug was made in other places of 85b506bb? Could you additionally add a regression test to this end? Seems like something worth testing. Greetings, Andres Freund
Re: [HACKERS] printing table in asciidoc with psql
On Wed, Mar 25, 2015 at 4:59 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Mar 25, 2015 at 02:18:58PM +0900, Michael Paquier wrote: [options="header",cols="l,l",frame="none"] | |5 2.2+^.^ |4 2.2+^.^ |2 2.2+^.^ |3 2.2+^.^ | Hm. This is still incorrect. You should remove options="header" here, or the first tuple is treated as a header in the non-expanded/tuple-only case. Your patch correctly removes the header for the expanded/tuple-only case, though. Regards, OK, fixed. Thanks for the testing. Patch attached. New output: This time things look good from my side. I have played with this patch for some time, testing some crazy scenarios, and I have not found problems. That's cool stuff, thanks! -- Michael
Re: [HACKERS] Parallel Seq Scan
On 25 March 2015 at 11:46, Thom Brown t...@linux.com wrote: Still not sure why 8 workers are needed for each partial scan. I would expect 8 workers to be used for 8 separate scans. Perhaps this is just my misunderstanding of how this feature works. Another issue: SELECT * FROM pgbtab *crash* Logs: 2015-03-25 13:17:49 GMT [22823]: [124-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [125-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [126-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [127-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [128-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [129-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [130-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [131-1] user=,db=,client= LOG: registering background worker parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [132-1] user=,db=,client= LOG: starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [133-1] user=,db=,client= LOG: starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [134-1] user=,db=,client= LOG: starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [135-1] user=,db=,client= LOG: starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [136-1] user=,db=,client= LOG: starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [137-1] user=,db=,client= LOG: 
starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [138-1] user=,db=,client= LOG: starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [139-1] user=,db=,client= LOG: starting background worker process parallel worker for PID 24792 2015-03-25 13:17:49 GMT [22823]: [140-1] user=,db=,client= LOG: worker process: parallel worker for PID 24792 (PID 24804) was terminated by signal 11: Segmentation fault 2015-03-25 13:17:49 GMT [22823]: [141-1] user=,db=,client= LOG: terminating any other active server processes 2015-03-25 13:17:49 GMT [24777]: [2-1] user=,db=,client= WARNING: terminating connection because of crash of another server process 2015-03-25 13:17:49 GMT [24777]: [3-1] user=,db=,client= DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2015-03-25 13:17:49 GMT [24777]: [4-1] user=,db=,client= HINT: In a moment you should be able to reconnect to the database and repeat your command. 
Backtrace: #0 GrantLockLocal (locallock=locallock@entry=0xfbe7f0, owner=owner@entry=0x1046da0) at lock.c:1544 #1 0x0066975c in LockAcquireExtended (locktag=locktag@entry=0x7fffdcb0ea20, lockmode=1, lockmode@entry=error reading variable: Cannot access memory at address 0x7fffdcb0e9f0, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', reportMemoryError=reportMemoryError@entry=1 '\001', ) at lock.c:798 #2 0x0066a1c4 in LockAcquire (locktag=locktag@entry=0x7fffdcb0ea20, lockmode=error reading variable: Cannot access memory at address 0x7fffdcb0e9f0, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:680 #3 0x00667c48 in LockRelationOid (relid=error reading variable: Cannot access memory at address 0x7fffdcb0e9e8, relid@entry=error reading variable: Cannot access memory at address 0x7fffdcb0ea48, lockmode=error reading variable: Cannot access memory at address 0x7fffdcb0e9f0, lockmode@entry=error reading variable: Cannot access memory at address 0x7fffdcb0ea48) at lmgr.c:94 But the issue seems to produce a different backtrace each time... 2nd backtrace: #0 hash_search_with_hash_value (hashp=0x2a2c370, keyPtr=keyPtr@entry=0x75ad2230, hashvalue=hashvalue@entry=2114233864, action=action@entry=HASH_FIND, foundPtr=foundPtr@entry=0x0) at dynahash.c:918 #1 0x00654d1a in BufTableLookup (tagPtr=tagPtr@entry=0x75ad2230, hashcode=hashcode@entry=2114233864) at buf_table.c:96 #2 0x0065746b in BufferAlloc (foundPtr=0x75ad222f Address 0x75ad222f out of bounds, strategy=0x0, blockNum=error reading variable: Cannot access memory at address 0x75ad2204, forkNum=error reading variable: Cannot access memory at address 0x75ad2208, relpersistence=error reading variable:
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015/03/25 19:47, Kouhei Kaigai kai...@ak.jp.nec.com wrote: The reason why the FDW handler was called multiple times in your example is that your modified make_join_rel() does not check whether build_join_rel() actually built a new RelOptInfo or just returned a cache reference, does it? Yep. After that change the call counts look like this: fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid; INFO: postgresGetForeignJoinPaths() 1x2 INFO: postgresGetForeignJoinPaths() 1x4 INFO: postgresGetForeignJoinPaths() 2x4 INFO: standard_join_search() old hook point INFO: standard_join_search() old hook point INFO: standard_join_search() old hook point INFO: postgresGetForeignJoinPaths() 0x4 INFO: standard_join_search() old hook point QUERY PLAN ------------- Foreign Scan (cost=100.00..102.11 rows=211 width=1068) (1 row) fdw=# If so, I'm inclined toward your proposition. A new bool *found argument for build_join_rel() would reduce the number of FDW handler calls, while keeping enough information to build the remote-join query. Another idea is to pass "found" as a parameter to the FDW handler, and let the FDW decide whether to skip. Some FDWs (and some CSPs?) might want to be conscious of the join combination. I think that does not match the concept we stand on. Unlike a CSP, an FDW intends to replace an entire join subtree that is represented by a particular joinrel, regardless of the sequence used to construct the joinrel from multiple baserels. So it is sufficient to call GetForeignJoinPaths() once a joinrel is constructed, isn't it? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Error with index on unlogged table
On 25 March 2015 at 12:22, Amit Langote amitlangot...@gmail.com wrote: On Wednesday, March 25, 2015, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Mar 24, 2015 at 8:46 PM, Thom Brown wrote: The index is unlogged until reindexing... [...] Which I think also raises the question: why are unlogged indexes made persistent by a reindex? That's a bug of HEAD; ~9.4 keeps the index unlogged even after REINDEX INDEX. What happens is that ReindexIndex relies on the relpersistence provided by makeRangeVar at parse time, which is just incorrect, as it uses RELPERSISTENCE_PERMANENT all the time. The attached patch fixes that... How about VACUUM FULL and CLUSTER, as the problem seems to have been reported to be there too? No, those are okay. They actually revert the index back to the same persistence level as the table they're attached to. -- Thom
Re: [HACKERS] Ignoring entries generated by autoconf in code tree
Michael Paquier michael.paqu...@gmail.com writes: When running autoconf from the root tree, autom4te.cache/ is automatically generated. Wouldn't it make sense to add an entry in .gitignore for that? Personally, I don't want such a thing, as then I would tend to forget to remove that cache file. And you do want to remove it. autoconf goes pretty berserk if the cache file hangs around across significant changes to configure.in, such as if you were to switch branches. (Or at least that used to be true --- last time I got burnt by it was quite some time ago, but possibly that's just because I'm careful about removing the cache file.) In short, -1. regards, tom lane
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On 3/25/15 7:46 AM, Sawada Masahiko wrote: On Wed, Mar 25, 2015 at 12:23 PM, David Steele da...@pgmasters.net wrote: On Wed, Mar 25, 2015 at 12:38 AM, David Steele da...@pgmasters.net wrote: 2. OBJECT auditing does not work before adding ACL info to pg_class.relacl. In the following situation, pg_audit cannot write the OBJECT log. $ cat postgresql.conf | grep audit shared_preload_libraries = 'pg_audit' pg_audit.role = 'hoge_user' pg_audit.log = 'read, write' $ psql -d postgres -U hoge_user =# create table hoge(col int); =# select * from hoge; LOG: AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge; The OBJECT audit log is not written here since pg_class.relacl is still empty. (Only the SESSION log is written.) So after creating another unrelated role and granting any privilege to that user, the OBJECT audit log is written successfully. Yes, object auditing does not work until some grants have been made to the audit role. =# create role bar_user; =# grant select on hoge to bar_user; =# select * from hoge; LOG: AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge; LOG: AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge; Both OBJECT and SESSION logs are written. Looks right to me. If you don't want the session logging then disable pg_audit.log. Session and object logging are completely independent from each other: one or the other, or both, or neither can be enabled at any time. It means that the OBJECT log is not written just after creating a table, even if that table is touched by its owner. To write the OBJECT log, we need to grant a privilege to a role at least, right? Exactly. Privileges must be granted to the audit role in order for object auditing to work. The table owner can always touch its table. So does that mean the table owner can read the table's contents while evading OBJECT logging? Yes, the table owner would be able to access the table without object logging if grants on that table were not made to the audit role. 
That would also be true for any other user that had grants on the table. The purpose of object auditing is to allow more fine-grained control and is intended to be used in situations where you only want to audit some things, rather than all things. Logging everything is better done with the session logging. However, object logging does yield more information since it lists every table that was touched by the statement, so there may be cases where you'd like to object-log everything. In that case I'd recommend writing a bit of plpgsql code to create the grants. Also, I looked into the latest patch again. Here are two review comments. 1. typedef struct { int64 statementId; int64 substatementId; Both statementId and substatementId could be negative numbers. I think these should be uint64 instead. True. I did this because printf formatting for uint64 seems to vary across platforms. int64 formatting is more standard and still gives more than enough IDs. I could change it back to uint64 if you have a portable way to modify the sprintf at line 507. 2. I got an ERROR when executing a function that uses a cursor. 1) create an empty table (hoge table) 2) create a test function as follows. create function test() returns int as $$ declare cur1 cursor for select * from hoge; tmp int; begin open cur1; fetch cur1 into tmp; return tmp; end$$ language plpgsql ; 3) execute the test function (got ERROR) =# select test(); LOG: AUDIT: SESSION,6,1,READ,SELECT,,,selecT test(); LOG: AUDIT: SESSION,6,2,FUNCTION,EXECUTE,FUNCTION,public.test,selecT test(); LOG: AUDIT: SESSION,6,3,READ,SELECT,,,select * from hoge CONTEXT: PL/pgSQL function test() line 6 at OPEN ERROR: pg_audit stack is already empty STATEMENT: selecT test(); It seems that the stack item is already freed by the deletion of the pg_audit memory context (in MemoryContextDelete()) before stack_pop is called when the top-level Portal is dropped. Good catch, I'll add this to my test cases and work on a fix. I think I see a good way to approach it. 
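The failure mode reported above — a stack item freed by memory-context teardown before the portal's stack_pop runs — boils down to a pop being attempted on an already-empty stack. A toy illustration of why the guard fires (a Python stand-in, not pg_audit's actual C code):

```python
class AuditStack:
    """Toy model of pg_audit's per-statement stack."""
    def __init__(self):
        self._items = []

    def push(self, statement_id, substatement_id):
        self._items.append((statement_id, substatement_id))

    def pop(self):
        if not self._items:
            # Mirrors the reported guard: ERROR: pg_audit stack is already empty
            raise RuntimeError("pg_audit stack is already empty")
        return self._items.pop()

stack = AuditStack()
stack.push(6, 1)
stack.pop()        # normal unwind pops the item once
try:
    stack.pop()    # a second pop (e.g. at portal drop, after the memory
                   # context holding the item was already deleted) fails
except RuntimeError as e:
    print(e)
```

The fix direction discussed is to make sure each item is popped exactly once, whichever cleanup path (context deletion or portal drop) runs first.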
-- - David Steele da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: [HACKERS] printing table in asciidoc with psql
On Wed, Mar 25, 2015 at 09:37:08PM +0900, Michael Paquier wrote: On Wed, Mar 25, 2015 at 4:59 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Mar 25, 2015 at 02:18:58PM +0900, Michael Paquier wrote: [options="header",cols="l,l",frame="none"] | |5 2.2+^.^ |4 2.2+^.^ |2 2.2+^.^ |3 2.2+^.^ | Hm. This is still incorrect. You should remove options="header" here, or the first tuple is treated as a header in the non-expanded/tuple-only case. Your patch correctly removes the header for the expanded/tuple-only case, though. Regards, OK, fixed. Thanks for the testing. Patch attached. New output: This time things look good from my side. I have played with this patch for some time, testing some crazy scenarios, and I have not found problems. That's cool stuff, thanks! Wow, thanks. I never would have gotten here without your help. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Error with index on unlogged table
On Wednesday, March 25, 2015, Thom Brown t...@linux.com wrote: On 25 March 2015 at 12:22, Amit Langote amitlangot...@gmail.com wrote: On Wednesday, March 25, 2015, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Mar 24, 2015 at 8:46 PM, Thom Brown wrote: The index is unlogged until reindexing... [...] Which I think also raises the question: why are unlogged indexes made persistent by a reindex? That's a bug of HEAD; ~9.4 keeps the index unlogged even after REINDEX INDEX. What happens is that ReindexIndex relies on the relpersistence provided by makeRangeVar at parse time, which is just incorrect, as it uses RELPERSISTENCE_PERMANENT all the time. The attached patch fixes that... How about VACUUM FULL and CLUSTER, as the problem seems to have been reported to be there too? No, those are okay. They actually revert the index back to the same persistence level as the table they're attached to. Ah, I misread then; sorry about the noise. Amit
[HACKERS] varlena.c hash_any() and hash_uint32() calls require DatumGetUInt32()
Attached patch adds DatumGetUInt32() around the hash_any() and hash_uint32() calls within varlena.c. These should have been in the original abbreviated keys commit. Mea culpa. -- Peter Geoghegan diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 3edd283..02e9949 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -2084,8 +2084,8 @@ bttext_abbrev_convert(Datum original, SortSupport ssup) * in order to compensate for cases where differences are past * PG_CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing. */ - hash = hash_any((unsigned char *) authoritative_data, - Min(len, PG_CACHE_LINE_SIZE)); + hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data, + Min(len, PG_CACHE_LINE_SIZE))); if (len > PG_CACHE_LINE_SIZE) hash ^= DatumGetUInt32(hash_uint32((uint32) len)); @@ -2100,10 +2100,10 @@ bttext_abbrev_convert(Datum original, SortSupport ssup) lohalf = (uint32) res; hihalf = (uint32) (res >> 32); - hash = hash_uint32(lohalf ^ hihalf); + hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf)); } #else /* SIZEOF_DATUM != 8 */ - hash = hash_uint32((uint32) res); + hash = DatumGetUInt32(hash_uint32((uint32) res)); #endif addHyperLogLog(&tss->abbr_card, hash);
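The patch above folds a 64-bit abbreviated key down to 32 bits for the HyperLogLog cardinality estimate by XOR-ing its two 32-bit halves before hashing. The folding step itself is easy to sketch (illustration only, with hypothetical sample values):

```python
def fold64(res: int) -> int:
    """XOR the low and high 32-bit halves of a 64-bit value, as
    bttext_abbrev_convert() does on 8-byte-Datum builds
    (lohalf ^ hihalf) before hashing."""
    lohalf = res & 0xFFFFFFFF
    hihalf = (res >> 32) & 0xFFFFFFFF
    return lohalf ^ hihalf

# Both halves contribute, so entropy anywhere in the 64 bits survives.
assert fold64(0x0000000100000001) == 0          # identical halves cancel
assert fold64(0xFFFFFFFF00000000) == 0xFFFFFFFF # high half alone survives
print(hex(fold64(0x123456789ABCDEF0)))          # 0x88888888
```

The folded value is then run through the 32-bit hash before being fed to addHyperLogLog().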
Re: [HACKERS] Error with index on unlogged table
On Wed, Mar 25, 2015 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2015-03-25 11:38:30 +0900, Michael Paquier wrote: On Tue, Mar 24, 2015 at 8:46 PM, Thom Brown wrote: The index is unlogged until reindexing... [...] Which I think also raises the question: why are unlogged indexes made persistent by a reindex? That's a bug of HEAD; ~9.4 keeps the index unlogged even after REINDEX INDEX. What happens is that ReindexIndex relies on the relpersistence provided by makeRangeVar at parse time, which is just incorrect, as it uses RELPERSISTENCE_PERMANENT all the time. The attached patch fixes that... What the hell? That's somewhat nasty. Nice that it got caught before 9.5 was released. Unfortunately this is very nasty. Sorry! Did you check whether a similar bug was made in other places of 85b506bb? Could you additionally add a regression test to this end? Seems like something worth testing. I'm checking it and adding some regression tests. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Parallel Seq Scan
On Wed, Mar 25, 2015 at 5:16 PM, Thom Brown t...@linux.com wrote: On 25 March 2015 at 10:27, Amit Kapila amit.kapil...@gmail.com wrote: Fixed the reported issue on the assess-parallel-safety thread and another bug caught while testing joins, and integrated with the latest version of the parallel-mode patch (parallel-mode-v9 patch). Apart from that, I have moved the initialization of the dsm segment from the InitNode phase to ExecFunnel() (on first execution), per a suggestion from Robert. The main idea is that since it creates a large shared memory segment, the work should be done when it is really required. HEAD Commit-Id: 11226e38 parallel-mode-v9.patch [2] assess-parallel-safety-v4.patch [1] parallel-heap-scan.patch [3] parallel_seqscan_v12.patch (Attached with this mail) [1] - http://www.postgresql.org/message-id/ca+tgmobjsuefipok6+i9werugeab3ggjv7jxlx+r6s5syyd...@mail.gmail.com [2] - http://www.postgresql.org/message-id/ca+tgmozfsxzhs6qy4z0786d7iu_abhbvpqfwlthpsvgiecz...@mail.gmail.com [3] - http://www.postgresql.org/message-id/ca+tgmoyjetgeaxuszrona7bdtwzptqexpjntv1gkcavmgsd...@mail.gmail.com Okay, with my pgbench_accounts partitioned into 300, I ran: SELECT DISTINCT bid FROM pgbench_accounts; The query never returns. You seem to be hitting the issue I have pointed out in the nearby thread [1], which I have also mentioned while replying on the assess-parallel-safety thread. Can you check after applying the patch in mail [1]? I also get this: grep -r 'starting background worker process parallel worker for PID 12165' postgresql-2015-03-25_112522.log | wc -l 2496 2,496 workers? This is with parallel_seqscan_degree set to 8. If I set it to 2, this number goes down to 626, and with 16, goes up to 4320. .. Still not sure why 8 workers are needed for each partial scan. I would expect 8 workers to be used for 8 separate scans. Perhaps this is just my misunderstanding of how this feature works. 
The reason is that for each table scan, it tries to use a number of workers equal to parallel_seqscan_degree if they are available, and in this case, since the scans of the inheritance hierarchy (the tables in the hierarchy) happen one after another, it uses 8 workers for each scan. I think as of now the strategy for deciding the number of workers to use in a scan is kept simple, and in the future we can try to come up with some better mechanism to decide the number of workers. [1] - http://www.postgresql.org/message-id/caa4ek1+nwuj9ik61ygfzbcn85dqunevd38_h1zngcdzrglg...@mail.gmail.com With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Error with index on unlogged table
On Wed, Mar 25, 2015 at 12:46 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Wed, Mar 25, 2015 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: Did you check whether a similar bug was made in other places of 85b506bb? Could you additionally add a regression test to this end? Seems like something worth testing. I'm checking it and adding some regression tests. I didn't find any other similar bug introduced by 85b506bb. Attached is the original patch provided by Michael with some regression tests. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 1c1d0da..1520d32 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -1685,6 +1685,8 @@ ReindexIndex(RangeVar *indexRelation) { Oid indOid; Oid heapOid = InvalidOid; + Relation irel; + char relpersistence; /* lock level used here should match index lock reindex_index() */ indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock, @@ -1692,7 +1694,11 @@ ReindexIndex(RangeVar *indexRelation) RangeVarCallbackForReindexIndex, (void *) &heapOid); - reindex_index(indOid, false, indexRelation->relpersistence); + irel = index_open(indOid, AccessExclusiveLock); + relpersistence = irel->rd_rel->relpersistence; + index_close(irel, NoLock); + + reindex_index(indOid, false, relpersistence); return indOid; } diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out index 34b5fc1..3214c19 100644 --- a/src/test/regress/expected/create_table.out +++ b/src/test/regress/expected/create_table.out @@ -208,6 +208,21 @@ CREATE TABLE IF NOT EXISTS test_tsvector( ); NOTICE: relation test_tsvector already exists, skipping CREATE UNLOGGED 
TABLE unlogged1 (a int primary key); -- OK +SELECT relname, relkind, relpersistence FROM pg_class WHERE relname ~ '^unlogged1' ORDER BY relname; +relname | relkind | relpersistence +----------------+---------+---------------- + unlogged1 | r | u + unlogged1_pkey | i | u +(2 rows) + +REINDEX INDEX unlogged1_pkey; +SELECT relname, relkind, relpersistence FROM pg_class WHERE relname ~ '^unlogged1' ORDER BY relname; +relname | relkind | relpersistence +----------------+---------+---------------- + unlogged1 | r | u + unlogged1_pkey | i | u +(2 rows) + INSERT INTO unlogged1 VALUES (42); CREATE UNLOGGED TABLE public.unlogged2 (a int primary key); -- also OK CREATE UNLOGGED TABLE pg_temp.unlogged3 (a int primary key); -- not OK diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql index 08029a9..acb7eb8 100644 --- a/src/test/regress/sql/create_table.sql +++ b/src/test/regress/sql/create_table.sql @@ -246,6 +246,9 @@ CREATE TABLE IF NOT EXISTS test_tsvector( ); CREATE UNLOGGED TABLE unlogged1 (a int primary key); -- OK +SELECT relname, relkind, relpersistence FROM pg_class WHERE relname ~ '^unlogged1' ORDER BY relname; +REINDEX INDEX unlogged1_pkey; +SELECT relname, relkind, relpersistence FROM pg_class WHERE relname ~ '^unlogged1' ORDER BY relname; INSERT INTO unlogged1 VALUES (42); CREATE UNLOGGED TABLE public.unlogged2 (a int primary key); -- also OK CREATE UNLOGGED TABLE pg_temp.unlogged3 (a int primary key); -- not OK
Re: [HACKERS] What exactly is our CRC algorithm?
On 02/12/2015 09:26 PM, Heikki Linnakangas wrote: On 02/11/2015 04:20 PM, Abhijit Menon-Sen wrote: At 2015-02-11 13:20:29 +0200, hlinnakan...@vmware.com wrote: I don't follow. I didn't change configure at all, compared to your patch. OK, I extrapolated a little too much. Your patch didn't actually include crc_instructions.h; Oh, I'm sorry. Here's the complete patch with crc_instructions.h I was just about to commit the attached, which is the same as the previous patch with just cosmetic comment changes, but then I realized that this probably doesn't compile with Visual Studio 2005 or older. The code does #ifdef _MSC_VER, and then uses the _mm_crc32_u64 intrinsic, but that intrinsic was added in Visual Studio 2008. I think we'll need a version check there. Or better yet, a direct configure test to check if the intrinsic exists - that way we get to also use it on Intel compilers, which I believe also has the same intrinsics. You want to write that or should I? How do you like this latest version of the patch otherwise? You had some criticism earlier, but I had forgotten to include the crc_instructions.h header file in that earlier version. - Heikki From f934cb017ad0270ded73feb4d3279e81a58a4149 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas heikki.linnakangas@iki.fi Date: Wed, 25 Mar 2015 18:44:07 +0200 Subject: [PATCH v2 1/1] Use Intel SSE4.2 CRC instructions where available. On x86, perform a runtime check to see if we're running on a CPU that supports SSE 4.2. If we are, we can use the special crc32b and crc32q instructions for the CRC-32C calculations. That greatly speeds up CRC calculation. Abhijit Menon-Sen, reviewed by Andres Freund and me. 
---
 configure                           |   2 +-
 configure.in                        |   2 +-
 src/common/pg_crc.c                 | 113 +++
 src/include/common/pg_crc.h         |  20 --
 src/include/pg_config.h.in          |   3 +
 src/include/port/crc_instructions.h | 128
 6 files changed, 248 insertions(+), 20 deletions(-)
 create mode 100644 src/include/port/crc_instructions.h

diff --git a/configure b/configure
index 2c9b3a7..87ceb0b 100755
--- a/configure
+++ b/configure
@@ -9204,7 +9204,7 @@ fi
 done

-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h cpuid.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"

diff --git a/configure.in b/configure.in
index b2c1ce7..bf604ea 100644
--- a/configure.in
+++ b/configure.in
@@ -1032,7 +1032,7 @@ AC_SUBST(UUID_LIBS)
 ##
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h cpuid.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])

 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.

diff --git a/src/common/pg_crc.c b/src/common/pg_crc.c
index eba32d3..6675ae7 100644
--- a/src/common/pg_crc.c
+++ b/src/common/pg_crc.c
@@ -21,25 +21,113 @@
 #include "common/pg_crc.h"

-/* Accumulate one input byte */
-#ifdef WORDS_BIGENDIAN
-#define CRC8(x) pg_crc32c_table[0][((crc >> 24) ^ (x)) & 0xFF] ^ (crc << 8)
+#ifdef PG_HAVE_CRC32C_INSTRUCTIONS
+static pg_crc32 pg_comp_crc32c_hw(pg_crc32 crc, const void *data, size_t len);
+#endif
+
+#if !defined(PG_HAVE_CRC32C_INSTRUCTIONS) || defined(PG_CRC32C_INSTRUCTIONS_NEED_RUNTIME_CHECK)
+static pg_crc32 pg_comp_crc32c_sb8(pg_crc32 crc, const void *data, size_t len);
+static const uint32 pg_crc32c_table[8][256];
+#endif
+
+#ifdef PG_CRC32C_INSTRUCTIONS_NEED_RUNTIME_CHECK
+/*
+ * When built with support for CRC instructions, but we need to perform a
+ * run-time check to determine whether we can actually use them,
+ * pg_comp_crc32c is a function pointer. It is initialized to
+ * pg_comp_crc32c_choose, which performs the runtime check, and
Re: [HACKERS] What exactly is our CRC algorithm?
On 03/25/2015 07:20 PM, Andres Freund wrote: On 2015-03-25 19:18:51 +0200, Heikki Linnakangas wrote: Or better yet, a direct configure test to check if the intrinsic exists - that way we get to also use it on Intel compilers, which I believe also have the same intrinsics. Maybe I'm missing something, but configure isn't run for msvc? Good point. On MSVC, we use the pre-built pg_config.h.win32 file instead. There are already a couple of cases like this in it:

/* Define to 1 if you have the `rint' function. */
#if (_MSC_VER >= 1800)
#define HAVE_RINT 1
#endif

I think we should do that for the CRC32 intrinsic too. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
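Extending the `HAVE_RINT` pattern above to the CRC intrinsic might look like the following fragment. This is hypothetical, not what was actually committed: the macro name is invented, and the only firm fact in it is that `_mm_crc32_u64` first shipped with Visual Studio 2008, whose compiler reports `_MSC_VER` 1500.

```c
/* Hypothetical pg_config.h.win32 fragment. _mm_crc32_u64 was added in
 * Visual Studio 2008 (_MSC_VER 1500), so gate the define on that. */
#if defined(_MSC_VER) && (_MSC_VER >= 1500)
#define USE_SSE42_CRC32C_INTRINSICS 1
#endif
```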
Re: [HACKERS] What exactly is our CRC algorithm?
On 2015-03-25 19:18:51 +0200, Heikki Linnakangas wrote: I was just about to commit the attached, which is the same as the previous patch with just cosmetic comment changes, but then I realized that this probably doesn't compile with Visual Studio 2005 or older. The code does #ifdef _MSC_VER, and then uses the _mm_crc32_u64 intrinsic, but that intrinsic was added in Visual Studio 2008. I think we'll need a version check there. Good catch. Or better yet, a direct configure test to check if the intrinsic exists - that way we get to also use it on Intel compilers, which I believe also has the same intrinsics. Maybe I'm missing something, but configure isn't run for msvc? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal: plpgsql - Assert statement
On 3/25/15 1:21 AM, Pavel Stehule wrote: 2015-03-25 0:17 GMT+01:00 Tom Lane t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com writes: updated version with Jim Nasby's doc and rebase against last changes in plpgsql. I started looking at this patch. ISTM there are some pretty questionable design decisions in it: 1. Why create a core GUC to control a behavior that's plpgsql-only? I think it'd make more sense for this to be a plpgsql custom GUC (ie, plpgsql.enable_asserts or some such name). This type of assertion can be implemented in any PL language - so I prefer a global setting. But I don't have a strong opinion in this case - it is a question of granularity, and more than one way is valid. +1 2. I find the use of errdetail to report whether the assertion condition evaluated to FALSE or to NULL to be pretty useless. (BTW, is considering NULL to be a failure the right thing? SQL CHECK conditions consider NULL to be allowed ...) This is a question - I am happy with SQL CHECK for data, but I am not sure the same behavior is safe for a plpgsql (procedural) assert. Stricter behavior is safer - and some bugs in procedures are based on unhandled NULLs in variables. So on this topic I prefer the implemented behavior. It is something like: +1. I think POLA here is that an assert must be true and only true to be valid. If someone was unhappy with that they could always coalesce(..., true). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config
On 3/24/15 6:26 PM, Tom Lane wrote: Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom I concur with Michael that there's value in exposing the version Tom number in the numeric form used by PG_VERSION_NUM. However, I Tom also concur with Andrew that if the use-case for this is Tom Makefiles, pg_config is a pretty poor transmission mechanism. We Tom should instead add PG_VERSION_NUM to the version variables set in Tom Makefile.global. I think there's an argument for both. pg_config already has a VERSION= string in the output, and I think adding a VERSION_NUM= would be good for consistency there. And people definitely do want to do version comparisons in makefiles... Hm. We're all agreed that there's a use case for exposing PG_VERSION_NUM to the makefiles, but I did not hear one for adding it to pg_config; and doing the former takes about two lines whereas adding a pg_config option entails quite a lot of overhead (documentation, translatable help text, yadda yadda). So I'm not in favor of doing the latter without a much more solid case than has been made. Why else would you want the version number other than to do some kind of comparison? I know I've had to play these games in the past (outside of a Makefile), though I don't remember the details right now. I'm sure I'm not alone in that. Michael's original patch seems to hit everything necessary but the translations, and it's only ~15 lines. That doesn't seem very unreasonable to me... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Parallel Seq Scan
On 25 March 2015 at 15:49, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Mar 25, 2015 at 5:16 PM, Thom Brown t...@linux.com wrote: On 25 March 2015 at 10:27, Amit Kapila amit.kapil...@gmail.com wrote: Fixed the reported issue on assess-parallel-safety thread and another bug caught while testing joins and integrated with latest version of parallel-mode patch (parallel-mode-v9 patch). Apart from that I have moved the Initialization of dsm segement from InitNode phase to ExecFunnel() (on first execution) as per suggestion from Robert. The main idea is that as it creates large shared memory segment, so do the work when it is really required. HEAD Commit-Id: 11226e38 parallel-mode-v9.patch [2] assess-parallel-safety-v4.patch [1] parallel-heap-scan.patch [3] parallel_seqscan_v12.patch (Attached with this mail) [1] - http://www.postgresql.org/message-id/ca+tgmobjsuefipok6+i9werugeab3ggjv7jxlx+r6s5syyd...@mail.gmail.com [2] - http://www.postgresql.org/message-id/ca+tgmozfsxzhs6qy4z0786d7iu_abhbvpqfwlthpsvgiecz...@mail.gmail.com [3] - http://www.postgresql.org/message-id/ca+tgmoyjetgeaxuszrona7bdtwzptqexpjntv1gkcavmgsd...@mail.gmail.com Okay, with my pgbench_accounts partitioned into 300, I ran: SELECT DISTINCT bid FROM pgbench_accounts; The query never returns, You seem to be hitting the issue I have pointed in near-by thread [1] and I have mentioned the same while replying on assess-parallel-safety thread. Can you check after applying the patch in mail [1] Ah, okay, here's the patches I've now applied: parallel-mode-v9.patch assess-parallel-safety-v4.patch parallel-heap-scan.patch parallel_seqscan_v12.patch release_lock_dsm_v1.patch (with perl patch for pg_proc.h) The query now returns successfully. and I also get this: grep -r 'starting background worker process parallel worker for PID 12165' postgresql-2015-03-25_112522.log | wc -l 2496 2,496 workers? This is with parallel_seqscan_degree set to 8. 
If I set it to 2, this number goes down to 626, and with 16, goes up to 4320. .. Still not sure why 8 workers are needed for each partial scan. I would expect 8 workers to be used for 8 separate scans. Perhaps this is just my misunderstanding of how this feature works. The reason is that for each table scan, it tries to use workers equal to parallel_seqscan_degree if they are available and in this case as the scan for inheritance hierarchy (tables in hierarchy) happens one after another, it uses 8 workers for each scan. I think as of now the strategy to decide number of workers to be used in scan is kept simple and in future we can try to come with some better mechanism to decide number of workers. Yes, I was expecting the parallel aspect to apply across partitions (a worker per partition up to parallel_seqscan_degree and reallocate to another scan once finished with current job), not individual ones, so for the workers to be above the funnel, not below it. So this is parallelising, just not in a way that will be a win in this case. :( For the query I posted (SELECT DISTINCT bid FROM pgbench_partitions), the parallelised version takes 8 times longer to complete. However, I'm perhaps premature in what I expect from the feature at this stage. -- Thom
Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
On Wed, Mar 18, 2015 at 2:41 PM, Heikki Linnakangas hlinn...@iki.fi wrote: Here's what I had in mind: the inserter tags the tuple with the speculative insertion token, by storing the token in the t_ctid field. If the inserter needs to super-delete the tuple, it sets xmax like in a regular deletion, but also sets another flag to indicate that it was a super-deletion. I was able to quickly hack up a prototype of this in my hotel room at pgConf.US. It works fine at first blush, passing the jjanes_upsert stress tests and my own regression tests without a problem. Obviously it needs more testing and clean-up before posting, but I was pleased with how easy this was. When another backend inserts, and notices that it has a potential conflict with the first tuple, it tries to acquire a hw-lock on the token. In most cases, the inserter has long since completed the insertion, and the acquisition succeeds immediately but you have to check because the token is not cleared on a completed insertion. You don't even have to check/take a ShareLock on the token when the other xact committed/aborted, because you know that if it is there, then based on that (and based on the fact that it wasn't super deleted) the tuple is visible/committed, or (in the event of other-xact-abort) not visible/aborted. In other words, we continue to only check for a speculative token when the inserting xact is in flight - we just take the token from the heap now instead. Not much needs to change, AFAICT. Regarding the physical layout: We can use a magic OffsetNumber value above MaxOffsetNumber to indicate that the t_ctid field stores a token rather than a regular ctid value. And another magic t_ctid value to indicate that a tuple has been super-deleted. The token and the super-deletion flag are quite ephemeral, they are not needed after the inserting transaction has completed, so it's nice to not consume the valuable infomask bits for these things. 
Those states are conveniently not possible on an updated tuple, when we would need the t_ctid field for its current purpose. Haven't done anything about this yet. I'm just using an infomask2 bit for now. Although that was only because I forgot that you suggested this before having a go at implementing this new t_ctid scheme! My next revision will have a more polished version of this scheme. I'm not going to immediately act on Robert's feedback elsewhere (although I'd like to), owing to time constraints - no reason to deny you the opportunity to review the entirely unrelated low-level speculative locking mechanism due to that. -- Peter Geoghegan
Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config
On 2015-03-25 14:50:44 -0400, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: On 3/24/15 6:26 PM, Tom Lane wrote: Hm. We're all agreed that there's a use case for exposing PG_VERSION_NUM to the makefiles, but I did not hear one for adding it to pg_config; and doing the former takes about two lines whereas adding a pg_config option entails quite a lot of overhead (documentation, translatable help text, yadda yadda). So I'm not in favor of doing the latter without a much more solid case than has been made. Why else would you want the version number other than to do some kind of comparison? The question is why, if we supply the version number in a make variable, you would not just use that variable instead of having to do $(shell $(PG_CONFIG) --something). The shell version adds new failure modes, removes none, and has no redeeming social value that I can see. I think using the makefile is preferrable if you have the version dependency in the makefile. But if you don't actually use make (e.g. stuff not written in C) or you need the detection in configure or something, it's different. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] deparsing utility commands
On Wed, Mar 25, 2015 at 11:59 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: * Should we prohibit DDL from within event triggers? Please don't prohibit DDL unless there is a really, really good reason to do so. I have several use cases in mind for event triggers, but they are only useful if I can perform DDL. For example, I want to create an Elasticsearch FDW and use that to index and search Postgres tables. When a table is created, I am planning to use an event trigger to capture the CREATE event and automatically create a foreign table via the Elasticsearch FDW. In addition, I would add normal triggers to the new table which capture DML and update the foreign table accordingly. In other words, I want to use FDWs and event triggers to automatically sync table DDL and DML to Elasticsearch.
Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config
Jim Nasby jim.na...@bluetreble.com writes: On 3/24/15 6:26 PM, Tom Lane wrote: Hm. We're all agreed that there's a use case for exposing PG_VERSION_NUM to the makefiles, but I did not hear one for adding it to pg_config; and doing the former takes about two lines whereas adding a pg_config option entails quite a lot of overhead (documentation, translatable help text, yadda yadda). So I'm not in favor of doing the latter without a much more solid case than has been made. Why else would you want the version number other than to do some kind of comparison? The question is why, if we supply the version number in a make variable, you would not just use that variable instead of having to do $(shell $(PG_CONFIG) --something). The shell version adds new failure modes, removes none, and has no redeeming social value that I can see. regards, tom lane
Re: [HACKERS] Remove fsync ON/OFF as a visible option?
On 3/21/15 12:25 PM, Jeff Janes wrote: On Sat, Mar 21, 2015 at 8:54 AM, Tom Lane t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net mailto:sfr...@snowman.net writes: At the moment, one could look at our default postgresql.conf and the "turns forced synchronization on or off" and think it's something akin or somehow related to synchronous_commit (which is completely different, but the options are right next to each other..). How about a big warning around fsync and make it more independent from the options around it? Yeah, the main SGML docs are reasonably clear about the risks of fsync, but postgresql.conf doesn't give you any hint that it's dangerous. Now I'm not entirely sure that people who frob postgresql.conf without having read the docs can be saved from themselves, but we could do something like this:

# - Settings -

#wal_level = minimal            # minimal, archive, hot_standby, or logical
                                # (change requires restart)
#fsync = on                     # turns forced synchronization on or off
+                               # (fsync=off is dangerous, read the
+                               # manual before using it)
#synchronous_commit = on        # synchronization level;
                                # off, local, remote_write, or on
#wal_sync_method = fsync        # the default is the first option
                                # supported by the operating system:

Also, I think the short description "turns forced synchronization on or off" could stand improvement; it really conveys zero information. Maybe something like "force data to disk when committing"? I agree the current description is lacking, but that proposed wording would be a better description of synchronous_commit. It is checkpointing and flush-WAL-before-data where fsync=off does its damage. "Force data to disk when needed for integrity"? Or just don't describe what it is at all, and refer to the documentation only. I see 3 settings that allow people to accidentally shoot themselves in the foot; fsync, wal_sync_method and full_page_writes.
How about just grouping those 3 together with a bulk disclaimer along the lines of "The following 3 settings are dangerous. Use at your own risk, and read the docs first."? That would also allow us to just remove the comments about what the settings do; if you don't already know you certainly shouldn't be touching them! :) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] GSoC 2015 proposal. Bitmap Index-only Count
2015-03-24 18:01 GMT+04:00 Tom Lane t...@sss.pgh.pa.us: Anastasia Lubennikova lubennikov...@gmail.com writes: There is a problem of slow counting in PostgreSQL [1]. The reason why this is slow is related to the *MVCC* implementation in PostgreSQL. Index-only scans (implemented since PostgreSQL-9.2) provide some performance improvements where the *visibility map* of the table allows it. That’s good. But it works only for access methods which provide an amgettuple method. Unfortunately GIN supports only BitmapIndexScan and has no implementation of the index_getnext() interface [2]. Right ... As a GSoC student I will create a new node, “Bitmap Index-Only Scan”, which would catch tuples from the Bitmap Index Scan node and pass them to the Aggregate node. Thus, the new query plan will be as follows: I'm pretty hesitant about adding a whole new plan node type (which will require quite a lot of infrastructure) for such a narrow use-case. I think the odds are good that if you proceed down this path, you will end up with something that never gets committed to Postgres. Thanks a lot for the reply. It was just an approximate idea. I thought it wasn't very good. I wonder whether it'd be possible to teach GIN to support index_getnext instead. Initially it would probably work only for cases where the index didn't have to return any columns ... but if we did it, maybe the door would be open to cases where GIN could reconstruct actual values. Another idea is to write index_getnext() for GIN which would return some fake tuple, since there is no difference for the COUNT aggregate what the tuple contains. COUNT just wants to know whether we have a tuple that satisfies the qual. Is this idea better? Is it possible for the planner to use index_getnext() for GIN only with the COUNT aggregate? -- Best regards, Lubennikova Anastasia
Re: [HACKERS] deparsing utility commands
Alvaro Herrera wrote: Here's an updated version of this series. I just pushed patches 0001 and 0002, with very small tweaks; those had already been reviewed and it didn't seem like there was much controversy. To test the posted series it's probably easiest to git checkout b3196e65f5bfc997ec7fa3f91645a09289c10dee and apply it all on top of that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Reduce pinning in btree indexes
Kevin Grittner kgri...@ymail.com wrote: Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I attached your latest patch to this mail as bt-nopin-v4.patch for now. Please check that there's no problem in it. I checked out master, applied the patch and checked it against my latest code (merged with master), and it matched. So, looks good. Pushed with the addition of one more paragraph of comments, regarding something to watch out for in any follow-on patch to eliminate the pin for a scan using a non-MVCC snapshot. Thanks again to Kyotaro-san and Heikki for the reviews! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC 2015 proposal. Bitmap Index-only Count
Anastasia Lubennikova lubennikov...@gmail.com writes: 2015-03-24 18:01 GMT+04:00 Tom Lane t...@sss.pgh.pa.us: I wonder whether it'd be possible to teach GIN to support index_getnext instead. Initially it would probably work only for cases where the index didn't have to return any columns ... but if we did it, maybe the door would be open to cases where GIN could reconstruct actual values. Another idea is to write index_getnext() for GIN which would return some fake tuple. Since there is no difference for COUNT aggregate what the tuple contains. COUNT just wants to know whether we have tuple that satisfy the qual. Well, yeah, that would be the idea (at least initially). You don't have to return any real data unless you claim you can do so via amcanreturn. The planner is still capable of selecting an index-only scan as long as the query retrieves no columns. The trick would be to not return the same heap TID more than once per scan. A zero-order implementation would be to construct the same bitmap we do now and then just provide a gingetnext function that scans through that. That would be pretty awful in terms of scan startup time, so doing better would be nice; but perhaps it would be useful even in that form. regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Add macros wrapping all usage of gcc's __attribute__.
Andres Freund and...@anarazel.de writes: Add macros wrapping all usage of gcc's __attribute__. I noticed that this commit attached pg_attribute_noreturn not only to the extern declarations, but to some actual function definitions. I think this is a bad idea, because it's going to look like heck after pgindent gets through with it. Do we actually need decoration on the function definitions? regards, tom lane
Re: [HACKERS] proposal: plpgsql - Assert statement
Jim Nasby jim.na...@bluetreble.com writes: On 3/25/15 1:21 AM, Pavel Stehule wrote: 2015-03-25 0:17 GMT+01:00 Tom Lane t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us: (BTW, is considering NULL to be a failure the right thing? SQL CHECK conditions consider NULL to be allowed ...) This is a question - I am happy with SQL CHECK for data, but I am not sure the same behavior is safe for a plpgsql (procedural) assert. Stricter behavior is safer - and some bugs in procedures are based on unhandled NULLs in variables. So on this topic I prefer the implemented behavior. It is something like: +1. I think POLA here is that an assert must be true and only true to be valid. If someone was unhappy with that they could always coalesce(..., true). Fair enough. Committed with the other changes. regards, tom lane
Re: [HACKERS] Abbreviated keys for Numeric
On Wed, Mar 25, 2015 at 6:26 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Peter == Peter Geoghegan p...@heroku.com writes: Peter You still pointlessly check memtupcount here: Peter + if (memtupcount > 1 || nss->input_count > 1 || !nss->estimating) Peter + return false; It's in a register; the test is free. Unbelievable! The test is free, and you wrote it that way in the first place, so we should just leave it that way rather than changing it to suit you? As things stand, the test is totally misleading. In fact, I think it was the source of my initial confusion about the way you track non-null tuple count separately. I certainly didn't have any clue from comments. Apart from fixing this, I added some comments explaining that in my version, but you didn't make any effort to follow that either. Peter This cast to void is unnecessary: Peter + (void) ssup; It's an explicit statement that the parameter is otherwise unused. Maybe that compiler warning isn't usually on by default, but I personally regard it as good style to be explicit about it. Your personal preferences don't enter into it, since every sort support routine since 2011 has followed that style (most still don't touch the sortsupport object). That's the way things work around here - consistency matters. If the existing convention is wrong, we fix the existing convention. Why do you get to ignore well established practice like this? Are you special? Peter Please try and at least consider my feedback. I don't expect you Peter to do exactly what I ask, but I also don't expect you to Peter blithely ignore it. You should really stop digging this hole deeper. You have that backwards. The INT64_MIN/MAX changes should be committed fairly soon. (I haven't posted a patch for TRACE_SORT) Peter I wouldn't assume that. Oh ye of little faith. I would not have said that had I not already been informed of it by a committer, and indeed it is now committed. You could have said so.
-- Peter Geoghegan
Re: [HACKERS] Error with index on unlogged table
On Thu, Mar 26, 2015 at 1:02 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Wed, Mar 25, 2015 at 12:46 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Wed, Mar 25, 2015 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: Did you check whether a similar bug was made in other places of 85b506bb? Could you additionally add a regression test to this end? Seems like something worth testing. I'm checking it and adding some regression tests. I didn't find any other similar bug introduced by 85b506bb. Attached the original patch provided by Michael with some regression tests. Thanks for adding a test, this looks fine to me (did some sanity checks and tutti-quanti for people wondering). On temporary tables this was failing with an error in md.c... -- Michael
Re: [HACKERS] Remove fsync ON/OFF as a visible option?
On Wed, Mar 25, 2015 at 12:45 PM, Jim Nasby jim.na...@bluetreble.com wrote: I see 3 settings that allow people to accidentally shoot themselves in the foot; fsync, wal_sync_method and full_page_writes. How about just grouping those 3 together with a bulk disclaimer along the lines of "The following 3 settings are dangerous. Use at your own risk, and read the docs first."? That would also allow us to just remove the comments about what the settings do; if you don't already know you certainly shouldn't be touching them! :) But one of these things is not like the other. Any supported (i.e. non fatal erroring) setting of wal_sync_method *should* always be safe (although may be inefficient) if the underlying kernel, RAID controller, hard drives, and fs fulfill their pledges. It is hard to document every known liar in this regard. About the best you can do, short of pull-the-plug tests on a massive scale, is to run pg_test_fsync and assume that any result inconsistent with the RPM of the spinning rust is obviously unsafe. Unfortunately that doesn't rule out the possibility that something is both unsafe and gratuitously slow. Cheers, Jeff
[HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
Hi all, Below I've written my proposal idea for this GSoC. *** Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement *** Last year during GSoC2014 I implemented the feature to allow an unlogged table to be changed to logged [1], but the design chosen was to rewrite the entire heap into a new relfilenode with a new relpersistence because of some troubles pointed out here [2]. The project was successfully finished and got committed [3] into PostgreSQL, to be released this year in the 9.5 version. However this design leads to performance problems with large relations because we need to write the entire content of the relation twice, once into a new heap and again into the WAL. So this project will change the current design of the mechanism that changes an unlogged table to logged so that it no longer rewrites the entire heap, but just removes the init forks, and if wal_level != minimal we’ll write the contents to the WAL too. ** Benefits to the PostgreSQL Community ** The “unlogged” tables feature was introduced in the 9.1 version, and provides better write performance than regular (logged) tables, but unlogged tables are not crash-safe. Their contents are automatically discarded (cleared) if the server crashes. Also, their contents do not propagate to standby servers. We already have a way to change an unlogged table to logged using “ALTER TABLE name SET LOGGED”, developed last year during GSoC2014; now during GSoC2015 we’ll redesign the internals to improve the I/O performance of this feature, removing the need to rewrite the entire heap into a new relfilenode. There is a good idea about the design here [4], but I’ll discuss the design with my mentor before implementing this improvement. ** Additional Goals ** The main goal of this project is to improve the performance of ALTER TABLE name SET {LOGGED|UNLOGGED}, but we can expand this proposal to more related goals. * Allow unlogged materialized views .
ALTER MATERIALIZED VIEW name SET { UNLOGGED | LOGGED } * Allow unlogged indexes on logged tables . ALTER INDEX name SET { UNLOGGED | LOGGED } ** Deliverables ** This project has just one deliverable at the end. The deliverable will be the improvement of the routines that transform an “unlogged” table to “logged” and “logged” to “unlogged”, without the need to create a new “relfilenode” with a different “relpersistence”. ** Project Schedule ** until May 25: * create a website to the project (wiki.postgresql.org) * create a public repository to the project (github.com/fabriziomello) * read what has already been discussed by the community about the project [4] * learn about some PostgreSQL internals: . control data (src/include/catalog/pg_control.h) . storage (src/backend/storage/*) * discuss the additional goals with community May 26 - June 21 * implementation of the first prototype: . implement the change of unlogged table to logged without rewrite the entire heap when “wal_level = minimal” . at this point when “wal_level != minimal” we use the current implementation * write documentation and the test cases * submit this first prototype to the commitfest 2015/06 ( https://commitfest.postgresql.org/5/) June 22 - June 26 * mentor review the work in progress June 27 - August 17 * do the adjustments based on the community feedback during the commitfest 2015/06 * implementation of the second prototype: . when “wal_level != minimal” we’ll remove the init fork (first prototype) and write relation pages to the WAL. . implement “ALTER MATERIALIZED VIEW .. 
SET LOGGED / UNLOGGED” * submit to the commitfest 2015/09 for final evaluation, and maybe it will be committed for the 9.6 version (webpage not created yet) August 18 - August 21 * make adjustments based on the community feedback during the commitfest 2015/09 * final mentor review ** About the proponent ** Fabrízio de Royes Mello e-mail: fabriziome...@gmail.com twitter: @fabriziomello github: http://github.com/fabriziomello linkedin: http://linkedin.com/in/fabriziomello Currently I help people and teams realize the full potential of relational databases, especially PostgreSQL: helping teams design the structure of the database (modeling), build the physical architecture (database schema), and with programming (procedural languages), SQL (usage, tuning, best practices), optimization, and orchestration of instances in production. I do volunteer work for the Brazilian Community of PostgreSQL (www.postgresql.org.br), supporting mailing lists, organizing events (pgbr.postgresql.org.br), and handling some admin tasks. I also help the PostgreSQL Global Development Group (PGDG) a little with the implementation of some features and review of patches (git.postgresql.org). Links [1] https://wiki.postgresql.org/wiki/Allow_an_unlogged_table_to_be_changed_to_logged_GSoC_2014 [2] http://www.postgresql.org/message-id/CA+Tgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0+aLg=x...@mail.gmail.com [3]
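The statements this proposal targets can be exercised as below (a minimal sketch; table name and row count are illustrative — on 9.5, SET LOGGED still rewrites the heap, and avoiding that rewrite is exactly what the project proposes):

```sql
-- Bulk-load into an unlogged table (writes skip WAL, so loading is faster),
-- then flip it to logged once the data is in place.
CREATE UNLOGGED TABLE fast_load (id int, payload text);
INSERT INTO fast_load SELECT g, md5(g::text) FROM generate_series(1, 100000) g;

-- In 9.5 this rewrites the whole heap into a new relfilenode;
-- the proposal is to just drop the init fork and WAL-log the pages instead.
ALTER TABLE fast_load SET LOGGED;

-- The reverse direction also exists:
ALTER TABLE fast_load SET UNLOGGED;
```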
Re: [HACKERS] Why SyncRepWakeQueue is not static?
SyncRepWakeQueue (src/backend/replication/syncrep.c) is not used anywhere except in that file. If there's no good reason for it, I think it should be declared as a static function. The included patch does so. Fix committed/pushed from master to 9.2. 9.1 already declares it as a static function. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why SyncRepWakeQueue is not static?
On Thu, Mar 26, 2015 at 10:46 AM, Tatsuo Ishii is...@postgresql.org wrote: SyncRepWakeQueue (src/backend/replication/syncrep.c) is not used anywhere except in that file. If there's no good reason for it, I think it should be declared as a static function. The included patch does so. Fix committed/pushed from master to 9.2. 9.1 already declares it as a static function. Er, is it a good idea to back-patch that? Normally routine specs are kept stable on back-branches, and this is just a cosmetic change. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with index on unlogged table
On Wed, Mar 25, 2015 at 10:53 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2015-03-25 11:38:30 +0900, Michael Paquier wrote: On Tue, Mar 24, 2015 at 8:46 PM, Thom Brown wrote: The index is unlogged until reindexing... [...] Which I think also raises the question: why are unlogged indexes made persistent by a reindex? That's a bug in HEAD; ~9.4 keeps the index unlogged even after REINDEX INDEX. What happens is that ReindexIndex relies on the relpersistence provided by makeRangeVar at parse time, which is just incorrect as it uses RELPERSISTENCE_PERMANENT all the time. The patch attached fixes that... What the hell? That's somewhat nasty. Nice that it got caught before 9.5 was released. Did you check whether a similar bug was made in other places of 85b506bb? Yeah, I took a look at the other code paths, particularly cluster and matviews, and the relpersistence used is taken directly from a Relation. Could you additionally add a regression test to this end? Seems like something worth testing. Definitely. And I guess that Fabrizio already did that... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
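The bug under discussion can be reproduced with a session like this (a sketch; the table and index names are illustrative, and the buggy behavior applies only to the HEAD build of that time):

```sql
CREATE UNLOGGED TABLE t (a int);
CREATE INDEX t_idx ON t (a);

-- Expected result: 'u' (unlogged). The reported bug made REINDEX flip
-- it to 'p' (permanent), because ReindexIndex used the parse-time
-- RELPERSISTENCE_PERMANENT instead of the relation's own setting.
REINDEX INDEX t_idx;
SELECT relpersistence FROM pg_class WHERE relname = 't_idx';
```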
[HACKERS] Moving on to close the current CF 2015-02
Hi all, Visibly there is no commit fest manager this time (was I?), and people may think that I still am the CFM for 2015-02, continuing on from 2014-12, and that I am severely slacking on my duties. Honestly I thought that I was not, and that it was clear enough... Still, biting the bullet to make things move on: should I launch a VACUUM FULL on the current entries of the CF to brush up things that could get into 9.5? The current CF officially finished two weeks ago. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving on to close the current CF 2015-02
On Thu, Mar 26, 2015 at 10:41 AM, Bruce Momjian br...@momjian.us wrote: On Thu, Mar 26, 2015 at 10:22:11AM +0900, Michael Paquier wrote: Hi all, Visibly there is no commit fest manager this time (was I?), and people may think that I still am the CFM for 2015-02, continuing on from 2014-12, and that I am severely slacking on my duties. Honestly I thought that I was not, and that it was clear enough... Still, biting the bullet to make things move on: should I launch a VACUUM FULL on the current entries of the CF to brush up things that could get into 9.5? The current CF officially finished two weeks ago. Usually the last commitfest is double the typical length. Oh, OK. So this leaves 3 more weeks... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why SyncRepWakeQueue is not static?
Fix committed/pushed from master to 9.2. 9.1 already declares it as a static function. Er, is it a good idea to back-patch that? Normally routine specs are kept stable on back-branches, and this is just a cosmetic change. I'm not sure whether it's a cosmetic change or not. I thought declaring a to-be-static function as extern was against our coding standard. Moreover, if someone wants to change code near that place in the future, changes made to head may not be easily back-patched or cherry-picked to older branches if I do not back-patch it. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
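The point here is plain C linkage: a function declared `static` has internal linkage, so nothing outside its translation unit can call it, and the toolchain can enforce and warn about that. A self-contained sketch of the idiom (the names are illustrative, not actual PostgreSQL code):

```c
/* File-local helper: 'static' gives it internal linkage, so other
 * translation units cannot reference it, and the compiler can warn
 * if it ever becomes unused. */
static int wake_queue(int nwaiters)
{
    int woken = 0;
    for (int i = 0; i < nwaiters; i++)
        woken++;                /* pretend to wake each waiter */
    return woken;
}

/* The only externally visible entry point of this unit. */
int sync_rep_wake_all(int nwaiters)
{
    return wake_queue(nwaiters);
}
```

Keeping helpers static is also what makes future refactoring near them back-patchable, which is Tatsuo's argument for applying the change to the older branches too.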
Re: [HACKERS] Re: [COMMITTERS] pgsql: btree_gin: properly call DirectFunctionCall1()
Bruce Momjian br...@momjian.us writes: OK, I figured out that I was only supposed to change inet_in, not the other calls to DirectFunctionCall3 (varbit_in and bit_in). Patch attached. That looks better ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: btree_gin: properly call DirectFunctionCall1()
On Tue, Mar 24, 2015 at 10:35:10PM -0400, Bruce Momjian wrote: On Tue, Mar 24, 2015 at 10:02:03PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: btree_gin: properly call DirectFunctionCall1() Previously we called DirectFunctionCall3() with dummy arguments. This patch is entirely wrong and has broken the buildfarm. Please revert. OK, done. OK, I figured out that I was only supposed to change inet_in, not the other calls to DirectFunctionCall3 (varbit_in and bit_in). Patch attached. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

diff --git a/contrib/btree_gin/btree_gin.c b/contrib/btree_gin/btree_gin.c
new file mode 100644
index 80521fb..1a5bb3c
*** a/contrib/btree_gin/btree_gin.c
--- b/contrib/btree_gin/btree_gin.c
*** GIN_SUPPORT(macaddr, false, leftmostvalu
*** 318,327 ****
  static Datum
  leftmostvalue_inet(void)
  {
! 	return DirectFunctionCall3(inet_in,
! 							   CStringGetDatum("0.0.0.0/0"),
! 							   ObjectIdGetDatum(0),
! 							   Int32GetDatum(-1));
  }

  GIN_SUPPORT(inet, true, leftmostvalue_inet, network_cmp)
--- 318,324 ----
  static Datum
  leftmostvalue_inet(void)
  {
! 	return DirectFunctionCall1(inet_in, CStringGetDatum("0.0.0.0/0"));
  }

  GIN_SUPPORT(inet, true, leftmostvalue_inet, network_cmp)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving on to close the current CF 2015-02
On Thu, Mar 26, 2015 at 10:22:11AM +0900, Michael Paquier wrote: Hi all, Visibly there is no commit fest manager this time (was I?), and people may think that I still am the CFM for 2015-02, continuously after 2014-12 and that I am severely slacking on my duties. Honestly I thought that I was not and that it was clear enoug... Still, biting the bullet to make things move on, should I launch a VACUUM FULL on the current entries of the CF to brush up things that could get in 9.5? The current CF officially finished two weeks ago. Usually the last commitfest is double the typical length. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Or the bottom of make_join_rel(). IMO build_join_rel() is responsible for just building (or searching from a list) a RelOptInfo for the given relids. After that, make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate the actual Paths that implement the join. make_join_rel() is called only once for a particular relid combination, and there the SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and WHERE) are available, so it seems promising for FDW cases. I like that idea, but I think we will have a complex hook signature; it won't remain as simple as hook(root, joinrel). The signature of the hook (or the FDW API handler) would be like this: typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, JoinType jointype, SpecialJoinInfo *sjinfo, List *restrictlist); This is very similar to add_paths_to_joinrel(), but lacks semifactors and extra_lateral_rels. semifactors can be obtained with compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed from root->placeholder_list as add_paths_to_joinrel() does. From the viewpoint of postgres_fdw, jointype and restrictlist are necessary to generate the SELECT statement, so it would require redoing most of the work done in make_join_rel if the signature were hook(root, joinrel). sjinfo will be necessary for supporting SEMI/ANTI joins, but currently that is not in the scope of postgres_fdw. I guess that other FDWs require at least jointype and restrictlist. The attached patch adds a GetForeignJoinPaths call in make_join_rel() only when 'joinrel' is actually built and both of the child relations are managed by the same FDW driver, prior to any other built-in join paths. I adjusted the hook definition a little bit, because jointype can be reproduced using SpecialJoinInfo. Right?
Probably, it will solve the original concern about multiple calls of the FDW handler in the case where it tries to replace an entire join subtree with a foreign scan on the result of a remote join query. What is your opinion? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com pgsql-v9.5-custom-join.v11.patch Description: pgsql-v9.5-custom-join.v11.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015/03/25 18:53, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA shigeru.han...@gmail.com wrote: Or the bottom of make_join_rel(). IMO build_join_rel() is responsible for just building (or searching from a list) a RelOptInfo for the given relids. After that, make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate the actual Paths that implement the join. make_join_rel() is called only once for a particular relid combination, and there the SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and WHERE) are available, so it seems promising for FDW cases. I like that idea, but I think we will have a complex hook signature; it won't remain as simple as hook(root, joinrel). The signature of the hook (or the FDW API handler) would be like this: typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, JoinType jointype, SpecialJoinInfo *sjinfo, List *restrictlist); This is very similar to add_paths_to_joinrel(), but lacks semifactors and extra_lateral_rels. semifactors can be obtained with compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed from root->placeholder_list as add_paths_to_joinrel() does. From the viewpoint of postgres_fdw, jointype and restrictlist are necessary to generate the SELECT statement, so it would require redoing most of the work done in make_join_rel if the signature were hook(root, joinrel). sjinfo will be necessary for supporting SEMI/ANTI joins, but currently that is not in the scope of postgres_fdw. I guess that other FDWs require at least jointype and restrictlist. — Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Ignoring entries generated by autoconf in code tree
Hi all, When running autoconf from the root tree, autom4te.cache/ is automatically generated. Wouldn't it make sense to add an entry in .gitignore for that? Regards, -- Michael diff --git a/.gitignore b/.gitignore index 8d3af50..b1f04bb 100644 --- a/.gitignore +++ b/.gitignore @@ -28,6 +28,7 @@ lib*dll.def lib*.pc # Local excludes in root directory +/autom4te.cache/ /GNUmakefile /config.cache /config.log -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015/03/25 19:47, Kouhei Kaigai kai...@ak.jp.nec.com wrote: The reason why the FDW handler was called multiple times in your example is that your modified make_join_rel() does not check whether build_join_rel() actually built a new RelOptInfo or just returned a cache reference, doesn't it? Yep. After that change the calling count looks like this: fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid; INFO: postgresGetForeignJoinPaths() 1x2 INFO: postgresGetForeignJoinPaths() 1x4 INFO: postgresGetForeignJoinPaths() 2x4 INFO: standard_join_search() old hook point INFO: standard_join_search() old hook point INFO: standard_join_search() old hook point INFO: postgresGetForeignJoinPaths() 0x4 INFO: standard_join_search() old hook point QUERY PLAN - Foreign Scan (cost=100.00..102.11 rows=211 width=1068) (1 row) fdw=# If so, I'm inclined toward your proposition. A new bool *found argument of build_join_rel() would reduce the number of FDW handler calls, while keeping reasonable information to build the remote-join query. Another idea is to pass "found" as a parameter to the FDW handler, and let the FDW decide whether to skip. Some FDWs (and some CSPs?) might want to be conscious of the join combination. — Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
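The "found" idea being discussed can be sketched in C-like pseudocode (this is not the actual patch, just the control flow under discussion; the field and function names are assumptions based on the thread):

```c
/* Pseudocode sketch inside make_join_rel(): call the FDW handler only
 * the first time a given relid combination is built, not when
 * build_join_rel() merely returns a cached RelOptInfo. */
joinrel = build_join_rel(root, joinrelids, rel1, rel2,
                         sjinfo, &restrictlist, &found);
if (!found &&
    joinrel->fdwroutine != NULL &&
    joinrel->fdwroutine->GetForeignJoinPaths != NULL)
    joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
                                             rel1, rel2,
                                             sjinfo, restrictlist);
```

This suppresses the redundant handler invocations for different join orders of the same relid set, which is the behavior Hanada-san's INFO-probe output shows.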
Re: [HACKERS] Parallel Seq Scan
On 25 March 2015 at 10:27, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Mar 20, 2015 at 5:36 PM, Amit Kapila amit.kapil...@gmail.com wrote: So the patches have to be applied in below sequence: HEAD Commit-id : 8d1f2390 parallel-mode-v8.1.patch [2] assess-parallel-safety-v4.patch [1] parallel-heap-scan.patch [3] parallel_seqscan_v11.patch (Attached with this mail) The reason for not using the latest commit in HEAD is that the latest version of the assess-parallel-safety patch was not applying, so I generated the patch at a commit-id where I could apply that patch successfully. [1] - http://www.postgresql.org/message-id/ca+tgmobjsuefipok6+i9werugeab3ggjv7jxlx+r6s5syyd...@mail.gmail.com [2] - http://www.postgresql.org/message-id/ca+tgmozjjzynpxchl3gr7nwruzkazpmpvkatdt5shvc5cd7...@mail.gmail.com [3] - http://www.postgresql.org/message-id/ca+tgmoyjetgeaxuszrona7bdtwzptqexpjntv1gkcavmgsd...@mail.gmail.com Fixed the reported issue on the assess-parallel-safety thread and another bug caught while testing joins, and integrated with the latest version of the parallel-mode patch (parallel-mode-v9 patch). Apart from that, I have moved the initialization of the dsm segment from the InitNode phase to ExecFunnel() (on first execution), as per a suggestion from Robert. The main idea is that since it creates a large shared memory segment, we should do that work only when it is really required.
HEAD Commit-Id: 11226e38 parallel-mode-v9.patch [2] assess-parallel-safety-v4.patch [1] parallel-heap-scan.patch [3] parallel_seqscan_v12.patch (Attached with this mail) [1] - http://www.postgresql.org/message-id/ca+tgmobjsuefipok6+i9werugeab3ggjv7jxlx+r6s5syyd...@mail.gmail.com [2] - http://www.postgresql.org/message-id/ca+tgmozfsxzhs6qy4z0786d7iu_abhbvpqfwlthpsvgiecz...@mail.gmail.com [3] - http://www.postgresql.org/message-id/ca+tgmoyjetgeaxuszrona7bdtwzptqexpjntv1gkcavmgsd...@mail.gmail.com Okay, with my pgbench_accounts partitioned into 300, I ran: SELECT DISTINCT bid FROM pgbench_accounts; The query never returns, and I also get this: grep -r 'starting background worker process parallel worker for PID 12165' postgresql-2015-03-25_112522.log | wc -l 2496 2,496 workers? This is with parallel_seqscan_degree set to 8. If I set it to 2, this number goes down to 626, and with 16, goes up to 4320. Here's the query plan:

                                   QUERY PLAN
---------------------------------------------------------------------------------
 HashAggregate  (cost=38856527.50..38856529.50 rows=200 width=4)
   Group Key: pgbench_accounts.bid
   ->  Append  (cost=0.00..38806370.00 rows=20063001 width=4)
         ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
         ->  Funnel on pgbench_accounts_1  (cost=0.00..192333.33 rows=10 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_1  (cost=0.00..1641000.00 rows=10 width=4)
         ->  Funnel on pgbench_accounts_2  (cost=0.00..192333.33 rows=10 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_2  (cost=0.00..1641000.00 rows=10 width=4)
         ->  Funnel on pgbench_accounts_3  (cost=0.00..192333.33 rows=10 width=4)
               Number of Workers: 8
...
         ->  Partial Seq Scan on pgbench_accounts_498  (cost=0.00..10002.10 rows=210 width=4)
         ->  Funnel on pgbench_accounts_499  (cost=0.00..1132.34 rows=210 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_499  (cost=0.00..10002.10 rows=210 width=4)
         ->  Funnel on pgbench_accounts_500  (cost=0.00..1132.34 rows=210 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_500  (cost=0.00..10002.10 rows=210 width=4)

Still not sure why 8 workers are needed for each partial scan. I would expect 8 workers to be used for 8 separate scans. Perhaps this is just my misunderstanding of how this feature works. -- Thom
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On Wed, Mar 25, 2015 at 12:23 PM, David Steele da...@pgmasters.net wrote: On Wed, Mar 25, 2015 at 12:38 AM, David Steele da...@pgmasters.net wrote: 2. OBJECT auditing does not work before acl info is added to pg_class.relacl. In the following situation, pg_audit cannot write an OBJECT audit log. $ cat postgresql.conf | grep audit shared_preload_libraries = 'pg_audit' pg_audit.role = 'hoge_user' pg_audit.log = 'read, write' $ psql -d postgres -U hoge_user =# create table hoge(col int); =# select * from hoge; LOG: AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge; The OBJECT audit log is not written here since pg_class.relacl is still empty. (Only the SESSION log is written.) So after creating another unconcerned role and granting some privilege to that user, the OBJECT audit log is written successfully. Yes, object auditing does not work until some grants have been made to the audit role. =# create role bar_user; =# grant select on hoge to bar_user; =# select * from hoge; LOG: AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge; LOG: AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge; Both the OBJECT and SESSION logs are written. Looks right to me. If you don't want the session logging then disable pg_audit.log. Session and object logging are completely independent from each other: one or the other, or both, or neither can be enabled at any time. It means that the OBJECT log is not written just after creating a table, even if that table is touched by its owner. To write the OBJECT log, we need to grant a privilege to a role at least, right? Exactly. Privileges must be granted to the audit role in order for object auditing to work. The table owner can always touch its own table. So does that mean the table owner can read the table's contents while evading OBJECT logging? Also I looked into the latest patch again. Here are two review comments. 1. typedef struct { int64 statementId; int64 substatementId; Both statementId and substatementId could be negative numbers. I think these should be uint64 instead. 2.
I got an ERROR when executing a function that uses a cursor. 1) create an empty table (hoge) 2) create a test function as follows. create function test() returns int as $$ declare cur1 cursor for select * from hoge; tmp int; begin open cur1; fetch cur1 into tmp; return tmp; end$$ language plpgsql ; 3) execute the test function (got ERROR) =# select test(); LOG: AUDIT: SESSION,6,1,READ,SELECT,,,selecT test(); LOG: AUDIT: SESSION,6,2,FUNCTION,EXECUTE,FUNCTION,public.test,selecT test(); LOG: AUDIT: SESSION,6,3,READ,SELECT,,,select * from hoge CONTEXT: PL/pgSQL function test() line 6 at OPEN ERROR: pg_audit stack is already empty STATEMENT: selecT test(); It seems that the item on the stack has already been freed by the deletion of the pg_audit memory context (in MemoryContextDelete()), before stack_pop is called when the top-level Portal is dropped. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: plpgsql - Assert statement
2015-03-26 0:08 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: Jim Nasby jim.na...@bluetreble.com writes: On 3/25/15 1:21 AM, Pavel Stehule wrote: 2015-03-25 0:17 GMT+01:00 Tom Lane t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us: (BTW, is considering NULL to be a failure the right thing? SQL CHECK conditions consider NULL to be allowed ...) This is a question - I am happy with SQL CHECK for data, but I am not sure if same behave is safe for plpgsql (procedural) assert. More stricter behave is safer - and some bugs in procedures are based on unhandled NULLs in variables. So in this topic I prefer implemented behave. It is some like: +1. I think POLA here is that an assert must be true and only true to be valid. If someone was unhappy with that they could always coalesce(..., true). Fair enough. Committed with the other changes. Thank you very much regards Pavel regards, tom lane
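The behavior settled on above — a NULL condition fails the assertion, unlike an SQL CHECK constraint — looks like this in practice (a sketch using the plpgsql ASSERT syntax committed for 9.5; the variable and message are illustrative):

```sql
DO $$
DECLARE
  x int;   -- never assigned, so NULL
BEGIN
  -- This would fail: NULL is treated as an assertion failure,
  -- which catches unhandled NULLs in variables as Pavel argues.
  --   ASSERT x > 0, 'x must be positive';

  -- CHECK-like "NULL is allowed" semantics can be recovered
  -- explicitly, per Jim Nasby's coalesce suggestion:
  ASSERT coalesce(x > 0, true), 'x must be positive when set';
END $$;
```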
[HACKERS] compiler warnings in lwlock
When building with LOCK_DEBUG but without casserts, I was getting unused variable warnings. I believe this is the correct way to silence them. Cheers, Jeff silence_lwlock_lock_debug.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Abbreviated keys for Numeric
Peter == Peter Geoghegan p...@heroku.com writes: Peter You still pointlessly check memtupcount here: Peter + if (memtupcount > 1 || nss->input_count > 1 || !nss->estimating) Peter + return false; It's in a register; the test is free. Peter This cast to void is unnecessary: Peter + (void) ssup; It's an explicit statement that the parameter is otherwise unused. Maybe that compiler warning isn't usually on by default, but I personally regard it as good style to be explicit about it. Peter Please try and at least consider my feedback. I don't expect you Peter to do exactly what I ask, but I also don't expect you to Peter blithely ignore it. You should really stop digging this hole deeper. The INT64_MIN/MAX changes should be committed fairly soon. (I haven't posted a patch for TRACE_SORT) Peter I wouldn't assume that. Oh ye of little faith. I would not have said that had I not already been informed of it by a committer, and indeed it is now committed. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
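The `(void) ssup;` idiom Andrew defends is a conventional way to mark a parameter as intentionally unused and silence `-Wunused-parameter` where that warning is enabled. A self-contained sketch (the function and parameter names are illustrative, not taken from the numeric patch):

```c
/* Comparator that ignores its 'context' argument; the explicit
 * (void) cast documents the intent and suppresses unused-parameter
 * warnings without changing behavior. */
static int cmp_int(int a, int b, void *context)
{
    (void) context;             /* intentionally unused */
    return (a > b) - (a < b);   /* -1, 0, or 1 */
}
```

The cast compiles to nothing; it only tells both the reader and the compiler that the omission is deliberate rather than an oversight.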
Re: [HACKERS] Remove fsync ON/OFF as a visible option?
On 3/22/15 4:50 PM, Greg Stark wrote: On Sun, Mar 22, 2015 at 3:34 PM, Euler Taveira eu...@timbira.com.br wrote: On 21-03-2015 17:53, Josh Berkus wrote: Now, I have *long* been an advocate that we should ship a stripped PostgreSQL.conf which has only the most commonly used settings, and leave the rest of the settings in the docs and share/postgresql/postgresql.conf.advanced. Here's my example of such a file, tailored to PostgreSQL 9.3: +1. I agree that common used settings in a postgresql.conf file is useful for newbies. How do we ship it? Fwiw I disagree. I'm a fan of the idea that the default should be an empty config file. You should only have to put things in postgresql.conf if you want something unusual or specific. We're a long way from there but I would rather move towards there than keep operating under the assumption that nobody will run Postgres without first completing the rite of passage of reviewing every option in postgresql.conf to see if it's relevant to their setup. Apache used to ship with a config full of commented out options and going through and figuring out which options needed to be enabled or changed was the first step in installing Apache. It was awful. When they adopted a strict policy that the default config was empty so the only things you need in your config are options to specify what you want to serve it became so much easier. I would argue it also represents a more professional attitude where the job of the admin is to declare only what he wants to happen and how it should differ from the norm and the job of the software is to go about its business without needing to be set up for normal uses. +1. Going the route of big default config files just sucks. We should either just have an empty .conf, or only write stuff that initdb has tweaked in it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! 
http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 25 March 2015 16:00, Amit Kapila Wrote: Which version of patch you are looking at? I am seeing below code in ExecInitFunnel() in Version-11 to which you have replied. + /* Funnel node doesn't have innerPlan node. */ + Assert(innerPlan(node) == NULL I was looking at version-10. I just checked version-11 and version-12 and found it to be already fixed. I should have checked the latest version before sending the report…☺ Thanks and Regards, Kumar Rajeev Rastogi From: Amit Kapila [mailto:amit.kapil...@gmail.com] Sent: 25 March 2015 16:00 To: Rajeev rastogi Cc: Amit Langote; Robert Haas; Andres Freund; Kouhei Kaigai; Amit Langote; Fabrízio Mello; Thom Brown; Stephen Frost; pgsql-hackers Subject: Re: [HACKERS] Parallel Seq Scan On Wed, Mar 25, 2015 at 3:47 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On 20 March 2015 17:37, Amit Kapila Wrote: So the patches have to be applied in below sequence: HEAD Commit-id : 8d1f2390 parallel-mode-v8.1.patch [2] assess-parallel-safety-v4.patch [1] parallel-heap-scan.patch [3] parallel_seqscan_v11.patch (Attached with this mail) While I was going through this patch, I observed one invalid ASSERT in the function “ExecInitFunnel”, i.e. Assert(outerPlan(node) == NULL); The outer node of a Funnel node is always non-NULL, and currently it will be a PartialSeqScan node. Which version of patch you are looking at? I am seeing below code in ExecInitFunnel() in Version-11 to which you have replied. + /* Funnel node doesn't have innerPlan node. */ + Assert(innerPlan(node) == NULL); With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015/03/25 19:09, Kouhei Kaigai kai...@ak.jp.nec.com wrote: On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA shigeru.han...@gmail.com wrote: Or the bottom of make_join_rel(). IMO build_join_rel() is responsible for just building (or searching from a list) a RelOptInfo for the given relids. After that, make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate the actual Paths that implement the join. make_join_rel() is called only once for a particular relid combination, and there the SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and WHERE) are available, so it seems promising for FDW cases. I like that idea, but I think we will have a complex hook signature; it won't remain as simple as hook(root, joinrel). In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2, sjinfo and restrictlist. It is not the simplest possible signature, but not a complicated one. Even if we reconstruct rel1 and rel2 using sjinfo, we also need to compute restrictlist using build_joinrel_restrictlist() again. It is a static function in relnode.c. So, I don't think either of them has a definitive advantage from the standpoint of simplicity.
The bottom of make_join_rel() seems good from the viewpoint of available information, but it is called multiple times for join combinations which are essentially identical, for an INNER JOIN case like this: fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid; INFO: postgresGetForeignJoinPaths() 1x2 INFO: postgresGetForeignJoinPaths() 1x4 INFO: postgresGetForeignJoinPaths() 2x4 INFO: standard_join_search() old hook point INFO: standard_join_search() old hook point INFO: standard_join_search() old hook point INFO: postgresGetForeignJoinPaths() 0x4 INFO: postgresGetForeignJoinPaths() 0x2 INFO: postgresGetForeignJoinPaths() 0x1 INFO: standard_join_search() old hook point QUERY PLAN - Foreign Scan (cost=100.00..102.11 rows=211 width=1068) (1 row) Here I’ve put a probe point at the beginning of the GetForeignJoinPaths handler, and just before the set_cheapest() call in standard_join_search() as the “old hook point”. In this example 1, 2, and 4 are base relations, and at join level 3 the planner calls GetForeignJoinPaths() three times for the combinations: 1) (1x2)x4 2) (1x4)x2 3) (2x4)x1 Tom’s suggestion is aiming at providing a chance to consider join push-down at a more abstract level, IIUC. So it would be good to call the handler only once for that case, for the flattened combination (1x2x4). Hmm, how about skipping the handler (or hook) call if the joinrel was found by find_join_rel()? At least it suppresses redundant calls for different join orders, and the handler can determine whether the combination can be flattened by checking that every RelOptInfo with RELOPT_JOINREL under the joinrel has JOIN_INNER as its jointype. The reason why the FDW handler was called multiple times in your example is that your modified make_join_rel() does not check whether build_join_rel() actually built a new RelOptInfo or just returned a cache reference, doesn't it? If so, I'm inclined toward your proposition.
A new bool *found argument for build_join_rel() would reduce the number of FDW handler calls, while keeping enough information to build the remote join query. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Wed, Mar 25, 2015 at 4:08 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On 25 March 2015 16:00, Amit Kapila Wrote: Which version of the patch are you looking at? I am seeing the below code in ExecInitFunnel() in Version-11, to which you have replied. + /* Funnel node doesn't have innerPlan node. */ + Assert(innerPlan(node) == NULL I was seeing version-10. I just checked version-11 and version-12 and found it already fixed. I should have checked the latest version before sending the report :-) No problem, thanks for looking into the patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_rewind in contrib
On Thu, Mar 26, 2015 at 12:23 PM, Venkata Balaji N nag1...@gmail.com wrote: Test 1 : [...] If the master is crashed or killed abruptly, it may not be possible to do a rewind. Is my understanding correct ? Yep. This is mentioned in the documentation: http://www.postgresql.org/docs/devel/static/app-pgrewind.html The target server must shut down cleanly before running pg_rewind. Test 2 : - On a successfully running streaming replication setup with one master and one slave, I did a clean shutdown of the master - promoted the slave - performed some operations (data changes) on the newly promoted slave and did a clean shutdown - Executed pg_rewind on the old master to sync with the latest changes on the new master. I got the below message The servers diverged at WAL position 0/A298 on timeline 1. No rewind required. I am not getting this either. In this case the master's WAL visibly did not diverge from the slave's WAL line. A rewind is done only if the master touches new relation pages after the standby has been promoted and before the master is shut down. -- Michael
Re: [HACKERS] Parallel Seq Scan
On Wed, Mar 25, 2015 at 9:53 PM, Thom Brown t...@linux.com wrote: On 25 March 2015 at 15:49, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Mar 25, 2015 at 5:16 PM, Thom Brown t...@linux.com wrote: Okay, with my pgbench_accounts partitioned into 300, I ran: SELECT DISTINCT bid FROM pgbench_accounts; The query never returns. You seem to be hitting the issue I have pointed out in a nearby thread [1], and I have mentioned the same while replying on the assess-parallel-safety thread. Can you check after applying the patch in mail [1]? Ah, okay, here are the patches I've now applied: parallel-mode-v9.patch assess-parallel-safety-v4.patch parallel-heap-scan.patch parallel_seqscan_v12.patch release_lock_dsm_v1.patch (with perl patch for pg_proc.h) The query now returns successfully. Thanks for the verification. .. Still not sure why 8 workers are needed for each partial scan. I would expect 8 workers to be used for 8 separate scans. Perhaps this is just my misunderstanding of how this feature works. The reason is that for each table scan, it tries to use workers equal to parallel_seqscan_degree if they are available, and in this case, as the scans for the inheritance hierarchy (tables in the hierarchy) happen one after another, it uses 8 workers for each scan. I think as of now the strategy to decide the number of workers to be used in a scan is kept simple, and in future we can try to come up with some better mechanism to decide the number of workers. Yes, I was expecting the parallel aspect to apply across partitions (a worker per partition up to parallel_seqscan_degree, reallocated to another scan once finished with the current job), not to individual ones. What you are describing here is something like a parallel partition scan, which is a related but different feature. This feature parallelizes the scan of an individual table. So for the workers to be above the funnel, not below it. So this is parallelising, just not in a way that will be a win in this case.
:( For the query I posted (SELECT DISTINCT bid FROM pgbench_partitions), the parallelised version takes 8 times longer to complete. I think the primary reason for it not performing as expected is that we have either not set the right values for the cost parameters, or changed an existing cost parameter (seq_page_cost), which makes the planner select a parallel plan even though it is costlier. This is similar to the behaviour when a user has intentionally disabled index scans to test sequential scans and then complains that they perform slower. I think if you want to help in this direction, then what will be more useful is to see what the appropriate values of the cost parameters for parallel scans could be. We have introduced 3 parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for the costing of parallel plans, so with your tests, if we can decide on appropriate values for each of these parameters such that the planner chooses a parallel plan only when it is better than the non-parallel plan, that will be really valuable input. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015/03/26 10:51, Kouhei Kaigai kai...@ak.jp.nec.com wrote: The attached patch adds the GetForeignJoinPaths call in make_join_rel() only when 'joinrel' is actually built and both of the child relations are managed by the same FDW driver, prior to any other built-in join paths. I adjusted the hook definition a little bit, because jointype can be reproduced using SpecialJoinInfo. Right? OK. Probably, it will solve the original concern about multiple calls of the FDW handler in the case when it tries to replace an entire join subtree with a foreign scan on the result of a remote join query. How about your opinion? Seems fine. I've fixed my postgres_fdw code to fit the new version, and am working on handling a whole join tree. It would be difficult in the 9.5 cycle, but a hook point where we can handle the whole joinrel might allow us to optimize a query which accesses multiple parent tables, each of which is inherited by foreign tables partitioned on an identical join key, by building a path tree which joins the sharded tables first and then unions those results. -- Shigeru HANADA shigeru.han...@gmail.com
Re: [HACKERS] pg_rewind in contrib
I have committed this, with some more kibitzing. Hope I have not missed any comments given so far. Many thanks for the review, and please continue reviewing and testing it :-). I have been testing pg_rewind and have an analysis to share along with a few questions - I had a streaming replication setup with one master and one slave running successfully. Test 1 : - Killed the postgres process on the master and promoted the slave. Both were in sync earlier. - performed some operations (data changes) on the newly promoted slave node and shut it down - Executed pg_rewind on the old master and got the below message *target server must be shut down cleanly* *Failure, exiting* If the master is crashed or killed abruptly, it may not be possible to do a rewind. Is my understanding correct ? Test 2 : - On a successfully running streaming replication setup with one master and one slave, I did a clean shutdown of the master - promoted the slave - performed some operations (data changes) on the newly promoted slave and did a clean shutdown - Executed pg_rewind on the old master to sync with the latest changes on the new master. I got the below message *The servers diverged at WAL position 0/A298 on timeline 1.* *No rewind required.* I am not getting this either. Regards, Venkata Balaji N