Re: [HACKERS] Trigger concurrent execution
On 05/16/2014 08:06 AM, Blagoj Petrushev wrote:
> Hi,
>
> I'm thinking of an extension to trigger functionality like this:
>
> CREATE TRIGGER trigger_name
> AFTER event
> ON table
> CONCURRENTLY EXECUTE PROCEDURE trigger_fc
>
> This would call the trigger after the end of the transaction.

If "after the end of the transaction" is what you mean by "concurrently",
then that's the wrong word to choose. "AFTER COMMIT"?

The concept of running a trigger "concurrently" just doesn't make sense in
PostgreSQL, because the backend is single threaded. You wouldn't be able to
run any SQL commands until the trigger finished.

It isn't possible to do anything useful without a transaction, so PostgreSQL
would need to start a transaction for the trigger and commit the transaction
at the end, as if you'd run SELECT my_procedure();.

Because it's outside the scope of the transaction it probably wouldn't be
possible to do FOR EACH ROW with a NEW and OLD var, unless you stashed them
as materialized rows in the queue of pending "AFTER COMMIT" triggers.

Finally, because it's after transaction commit, you couldn't easily guarantee
that the trigger would really run. If the backend crashed / the server was
shut down / etc after the commit but before your trigger finished, you'd have
a committed transaction but the trigger would not run. To fix that you'd need
to somehow make the trigger queue WAL-logged and run it during replay, which
from my rather limited understanding of this area would be ... "interesting"
to do. It'd also mean the trigger couldn't have any session context. This
isn't easy, if it's practical at all.

> I have a big table with big text column article and a nullable
> tsvector column fts_article. On each insert or update that changes the
> article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
> daemon listener, I catch the notification and update fts_article
> accordingly with my_fts_fc(article). The reason I don't do this
> directly in my trigger is because my_fts_fc is slow for big articles,
> fts_article has a gin index, and also, on heavy load, my listener can
> do these updates concurrently. Now, with a concurrent execution of
> triggers, I can just call my_fts_fc inside the trigger instead of the
> notify roundtrip.

I don't think that really fits. It seems like you want to run the trigger
procedure in the background on another back-end. That'd be quite cool, but
also not trivial to do, especially if you wanted to guarantee that it
happened reliably and in a crash-safe manner.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] BUGFIX: Dynamic bgworkers with BGW_NEVER_RESTART worker restarted on FatalError
Hi all

There's a bug in the dynamic bgworkers code that I think needs fixing before
release. TL;DR: BGW_NO_RESTART workers are restarted after a postmaster
crash; the attached patch changes that.

The case that's triggering the issue is where a static bgworker registers a
new dynamic bgworker to do some setup work each time it starts. The static
worker is relaunched on postmaster restart and registers a new BGW_NO_RESTART
dynamic bgworker. This dynamic bgworker immediately hits an Assert and dies.

This *should* cause a postmaster restart loop; that's expected. What it
shouldn't do, but is doing, is restart multiple copies of the bgworker -
fail to purge the old BGW_NO_RESTART one and launch it as if it'd never
crashed. The attached patch fixes the problem.

Detail:

If you set a worker as BGW_NO_RESTART it isn't restarted if it ERRORs out.
That's fine. With Petr's applied patch it no longer restarts on exit 0
(normal exit) either.

There's a third case, though: a bgworker crash causing postmaster restart. In
this case the bgworker is still restarted, which makes no sense at all if it
isn't restarted in the other two cases.

The existing code looks like it tries to protect against this - in
maybe_start_bgworker, the invocation of do_start_bgworker is protected by a
prior test for rw->rw_crashed_at that, if rw->rw_worker.bgw_restart_time ==
BGW_NEVER_RESTART, unregisters the worker and skips to the next one.

However, in my testing a breakpoint inside the if (rw->rw_crashed_at != 0)
test in maybe_start_bgworker is never hit, even for a bgworker that is known
to have crashed. rw->rw_crashed_at is always zero.

The culprit is ResetBackgroundWorkerCrashTimes, which unconditionally resets
the crash time without considering that the worker might be BGW_NO_RESTART.

The attached patch makes ResetBackgroundWorkerCrashTimes only reset the
crashed time for workers with a restart time set.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

>From a48de97f593b17160487e67f953c8d2b25f9e98a Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Fri, 16 May 2014 13:29:16 +0800
Subject: [PATCH] Don't restart BGW_NO_RESTART workers after postmaster crash

ResetBackgroundWorkerCrashTimes was ignoring BGW_NO_RESTART and clearing
the crash time of all bgworkers on postmaster restart so they'd be
restarted as soon as the postmaster was up and ready. It should only do
this for workers that are supposed to be restarted, as a BGW_NO_RESTART
worker with no rw_crashed_at set is assumed to be newly registered.

As a result, if a bgworker registered another BGW_NO_RESTART bgworker
during postmaster restart we'd get an increasing number of duplicates of
the BGW_NO_RESTART worker every time the postmaster restarted.
---
 src/backend/postmaster/bgworker.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 85a3b3a..4bbebb6 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -411,7 +411,8 @@ ResetBackgroundWorkerCrashTimes(void)
 		RegisteredBgWorker *rw;
 
 		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-		rw->rw_crashed_at = 0;
+		if (rw->rw_worker.bgw_restart_time != BGW_NEVER_RESTART)
+			rw->rw_crashed_at = 0;
 	}
 }
-- 
1.9.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql \db+ lack of size column
Hi all,

Is there some reason not to show the tablespace size in the \db+ psql command?

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
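A minimal sketch of the kind of query such a size column could be based on,
using the existing pg_tablespace_size() function; the column aliases here are
illustrative, not the actual \db+ implementation:

SELECT spcname AS "Name",
       pg_catalog.pg_get_userbyid(spcowner) AS "Owner",
       pg_catalog.pg_size_pretty(pg_catalog.pg_tablespace_size(oid)) AS "Size"
FROM pg_catalog.pg_tablespace
ORDER BY 1;

One consideration for making this the default is that pg_tablespace_size()
has to walk the tablespace directory on disk, so it can be slow on large
installations.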
Re: [HACKERS] 9.0 PDF build broken
I wrote:
> Yeah. This is caused by a hyperlink whose displayed text crosses a page
> boundary. The only known fix is to change the text enough so the link
> no longer runs across a page boundary. Unfortunately, pdfTeX is pretty
> unhelpful about identifying exactly where the problem is. I seem to
> recall having posted a recipe about finding such problems.

Ah, found it:
http://www.postgresql.org/message-id/9473.1296172...@sss.pgh.pa.us

			regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.0 PDF build broken
Peter Eisentraut writes:
> Mysteriously, commit 6b2a1445ec8a631060c4cbff3f172bf31d3379b9 has broken
> the PDF build (openjade + pdfjadetex) in the 9.0 branch only. The error
> is
> [256.0.28
> ! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than
> \pd
> fstartlink.

Oh, not that again.

> I have reproduced this on two different platforms, and it affects only
> this branch. I guess this change might have caused the page boundaries
> to shift in an unfortunate way. I seem to recall we have had similar
> problems before. Does anyone remember?

Yeah. This is caused by a hyperlink whose displayed text crosses a page
boundary. The only known fix is to change the text enough so the link
no longer runs across a page boundary. Unfortunately, pdfTeX is pretty
unhelpful about identifying exactly where the problem is. I seem to
recall having posted a recipe about finding such problems.

			regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.0 PDF build broken
Mysteriously, commit 6b2a1445ec8a631060c4cbff3f172bf31d3379b9 has broken
the PDF build (openjade + pdfjadetex) in the 9.0 branch only. The error
is

[256.0.28
! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than \pd
fstartlink.
\AtBegShi@Output ...ipout \box \AtBeginShipoutBox \fi \fi
l.241723 ...char95{}stat\char95{}file('filename');
! ==> Fatal error occurred, no output PDF file produced!
Transcript written on postgres-A4.log.
make: *** [postgres-A4.pdf] Error 1

I have reproduced this on two different platforms, and it affects only
this branch. I guess this change might have caused the page boundaries
to shift in an unfortunate way. I seem to recall we have had similar
problems before. Does anyone remember?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trigger concurrent execution
Blagoj Petrushev wrote
> Hi,
>
> I'm thinking of an extension to trigger functionality like this:
>
> CREATE TRIGGER trigger_name
> AFTER event
> ON table
> CONCURRENTLY EXECUTE PROCEDURE trigger_fc
>
> This would call the trigger after the end of the transaction.
>
> The following is a use-case, please tell me if I'm doing it wrong.
>
> I have a big table with big text column article and a nullable
> tsvector column fts_article. On each insert or update that changes the
> article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
> daemon listener, I catch the notification and update fts_article
> accordingly with my_fts_fc(article). The reason I don't do this
> directly in my trigger is because my_fts_fc is slow for big articles,
> fts_article has a gin index, and also, on heavy load, my listener can
> do these updates concurrently. Now, with a concurrent execution of
> triggers, I can just call my_fts_fc inside the trigger instead of the
> notify roundtrip.

Conceptually, trigger actions run in-transaction and can cause it to
ROLLBACK; so how would "after the end of the transaction" work?

Since the easy way is to have COMMIT; block until all the AFTER event
concurrent triggers fire, I presume you would want something more like a
task queue for background workers where, at commit, the function call is
placed in a FIFO queue and the calling session is allowed to move on to
other activity.

It is not clear what you mean by "my listener can do these updates
concurrently". Concurrently with each other, or concurrently with other DML
action on the table? I assume you have multiple listeners, since the
potential rate of insert of the documents is likely much greater than the
rate of update/indexing.

Also, it would seem you'd typically want the GIN index to be updated once the
corresponding transaction commits and makes the rest of the data available.
Or does your use case allow for some delay between the article being in the
database physically and it being available in the index?

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Trigger-concurrent-execution-tp5804158p5804164.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
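A rough sketch of the queue-table variant mentioned above, assuming
PostgreSQL 9.1 or later for the writable CTE; the table, column, and function
names (articles, fts_queue, my_fts_fc) are hypothetical, the enqueueing
trigger is assumed to simply INSERT the changed row's id into fts_queue, and
coordination between multiple concurrent listeners is omitted:

CREATE TABLE fts_queue (
    article_id integer NOT NULL,
    queued_at  timestamptz NOT NULL DEFAULT now()
);

-- One listener iteration, run in its own short transaction: pop the oldest
-- queued id and rebuild that article's tsvector.
WITH next AS (
    DELETE FROM fts_queue
    WHERE article_id = (SELECT article_id
                        FROM fts_queue
                        ORDER BY queued_at
                        LIMIT 1)
    RETURNING article_id
)
UPDATE articles a
SET fts_article = my_fts_fc(a.article)
FROM next
WHERE a.id = next.article_id;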
Re: [HACKERS] 9.4 release notes
Andres Freund-3 wrote
> On 2014-05-04 08:46:07 -0400, Bruce Momjian wrote:
>> I have completed the initial version of the 9.4 release notes. You can
>> view them here:
>>
>> http://www.postgresql.org/docs/devel/static/release-9-4.html
>>
>> I will be adding additional markup in the next few days.
>>
>> Feedback expected and welcomed. I expect to be modifying this until we
>> release 9.4 final. I have marked items where I need help with question
>> marks.
>
> This time I started reading from the end. I think I've fixed most of the
> questionable things (i.e. ? or FIXMEs) left.
>
> I am not really sure how to rewrite the notes for the logical decoding
> stuff into a more appropriate format for the release notes. Currently it
> seems to describe too many details and not enough overview. It's also
> probably too long.
>
> How about letting it keep it's  but remove the  and
> put a short explanation about the individual parts into a following
> or two? That'd require a name after a  which normally
> isn't done...
>
> Greetings,
>
> Andres Freund

Some errors and suggestions - my apologies for the format as I do not have a
proper patching routine set up.

Patch Review - Top to Bottom (mostly, I think...)

s/constraints's/constraint/ - possessive not needed here, it is a property of
the constraint, not owned by it.

s/plannet/planner

DISCARD ALL now also discards the states of sequences.
change to
DISCARD ALL now also discards sequence state.

IIUC: Logical decoding allows for streaming of statement-scoped database
changes.

Add function pg_filenode_relation() to more efficiently lookup relations via
their filenode.

This allows one to declare array_agg()-like aggregates using SQL.

IIUC: Remove line length restrictions from pgbench.

These are not in the patch but from my quick scan of the online release notes
- top to bottom:

then conditionally additional adjacent whitespace if not in FX mode
-> then, conditionally, remove additional adjacent whitespace if not in FX
mode. (I presume those conditions are noted in the documentation somewhere)

For example, previously format string space-space-space would consume only
the first space in ' 12', while it will not consume all three characters.
-> For example, the format string space-space-space previously consumed only
the first space in 'space-space-12' whereas now it will consume all three
characters.

style: add comma -> Previously[,] empty arrays were returned (occurs
frequently but is minor)

style: NULL VARIADIC function arguments are now disallowed -> Disallow NULL
VARIADIC function arguments (most of the notes are verb-leading in structure)

During immediate shutdown, send uncatchable termination <- kill the comma

In contrast to local_preload_libraries, this parameter can load any shared
library <- shoot the comma

The linking on this one is odd: Have Windows ASCII-encoded databases and
server process (e.g. postmaster) emit messages in the LC_CTYPE-defined
language (Alexander Law, Noah Misch)

Add ROWS FROM() syntax to allow horizontal concatenation of set-returning
functions in the FROM-clause (Andrew Gierth) -> Maybe a note about using this
to avoid least-common-multiple expansion?

Add WITH ORDINALITY syntax which numbers rows returned from FROM-clause
functions -> Add WITH ORDINALITY syntax to number the rows returned by
FROM-clause functions. ???

DISCARD ALL will now also discard such information. -> DISCARD ALL now also
invokes this command.

be converted to NULL in in CSV mode <- strike one of the "in"s

[I got no clue on this pair... but recommend someone rewrite it]
Improve the internal definition of system relations (Andres Freund, Robert
Haas)
Previously, relations once moved into the system catalog schema could no
longer be modified or dropped.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/9-4-release-notes-tp5802343p5804163.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error in running DBT2
Andrew Dunstan wrote
> On 05/15/2014 06:37 PM, Rohit Goyal wrote:
>> Hi All,
>>
>> I am using centOS6 and after all configuration, I run the below command
>>
>> *dbt2-run-workload -a pgsql -d 120 -w 1 -o /tmp/result -c 10
>> *
>> *Error:*
>> Stage 3. Processing of results...
>> Killing client...
>> waiting for server to shut down done
>> server stopped
>> Traceback (most recent call last):
>> File "/tmp/dbt2/bin/dbt2-post-process", line 14, in <module>
>> import rpy2.robjects as robjects
>> ImportError: No module named rpy2.robjects
>> Test completed.
>> Results are in: /tmp/result
>>
>> Please guide me !! :)
>
> Do these questions about running dbt2 even belong on pgsql-hackers? They
> seem to me to be usage questions that belong on pgsql-general.
>
> They are also woefully inadequate in the detail they provide. I don't
> see how anyone could take the above report and give any sort of opinion,
> other than that it is a python error and not apparently a postgres error
> at all.

Actually, osdldbt-gene...@lists.sourceforge.net would be the proper location,
and I do see that the OP has found said mailing list and even got a response
a few days ago. I guess it makes some sense to shout for help from the
rooftops, but anyone willing and able to provide personal training to a DBT2
newbie is likely going to be monitoring the aforementioned list.

One other suggestion is, after seeing such an error as above, actually
telling people that you have made some rudimentary attempt to confirm that,
indeed, you have:

1) Python installed
2) R installed
3) The python module "rpy2" installed

I am thinking centOS6 doesn't have a packaging system that ensures that all
dependencies are present, and so it seems that the user needs to do so and
affirm that such is the case when asking for help.

Another helpful outcome of trying to understand what people need in order to
help you is that you would realize that the message "Results are in:
/tmp/result" indicates that there is an additional file that someone
debugging (you or someone on these mailing lists) would likely find helpful
to review. Attach that file - either in full or in part - or at least
indicate you looked at it and didn't see anything unusual (unlikely that,
though).

Sorry I cannot help you directly, but maybe I can help you be more successful
in asking for help in the future :)

Good Luck!

David J.

IOW: Guiding someone through debugging on a mailing list is very tedious and
difficult for the people doing the guiding - do as much exploring as possible
on your own and tell people where you have already been and what you have
seen.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Error-in-running-DBT2-tp5804147p5804161.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On 2014-05-04 08:46:07 -0400, Bruce Momjian wrote: > I have completed the initial version of the 9.4 release notes. You can > view them here: > > http://www.postgresql.org/docs/devel/static/release-9-4.html > > I will be adding additional markup in the next few days. > > Feedback expected and welcomed. I expect to be modifying this until we > release 9.4 final. I have marked items where I need help with question > marks. This time I started reading from the end. I think I've fixed most of the questionable things (i.e. ? or FIXMEs) left. I am not really sure how to rewrite the notes for the logical decoding stuff into a more appropriate format for the release notes. Currently it seems to describe too many details and not enough overview. It's also probably too long. How about letting it keep it's but remove the and put a short explanation about the individual parts into a following or two? That'd require a name after a which normally isn't done... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 18511f2e4cd7d72b8d943efd9e6501d3903a64c4 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 16 May 2014 01:37:07 +0200 Subject: [PATCH] Further 9.4 release notes improvements. --- doc/src/sgml/release-9.4.sgml | 52 +++ 1 file changed, 33 insertions(+), 19 deletions(-) diff --git a/doc/src/sgml/release-9.4.sgml b/doc/src/sgml/release-9.4.sgml index 3070d0b..e143178 100644 --- a/doc/src/sgml/release-9.4.sgml +++ b/doc/src/sgml/release-9.4.sgml @@ -30,7 +30,7 @@ Logical decoding allows database - changes to be streamed out in customizable format + changes to be streamed out in a customizable format @@ -298,6 +298,12 @@ + + + DISCARD ALL now also discards the states of sequences. + + + @@ -1005,7 +1011,6 @@ - This was added so views that select from a table with zero columns can be dumped correctly. @@ -1028,7 +1033,6 @@ - DISCARD ALL will now also discard such information. @@ -1199,6 +1203,11 @@ AGGREGATE to supply the size of the aggregate's transition state data (Hadi Moshayedi) + + +This allows the plannet to better estimate how much memory will be +used when aggregating. + @@ -1218,7 +1227,7 @@ -Allow the changing of foreign key constraint via ALTER TABLE ... ALTER CONSTRAINT (Simon Riggs) @@ -1254,7 +1263,7 @@ -Fully-implement the line data type (Peter Eisentraut) @@ -1472,7 +1481,7 @@ Add function pg_filenode_relation() -to allow for more efficient filenode to relation lookups (Andres +to allow for more efficient lookups from filenode to relation (Andres Freund) @@ -1543,10 +1552,13 @@ - Allow polymorphic aggregates to have non-polymorphic state data -types ? (Tom Lane) +types (Tom Lane) + + +This allows to declare aggregates like the builtin +array_agg() from SQL. 
@@ -1772,8 +1784,8 @@ -Allow field wrapping to psql's "extended" mode -(Sergey Muraviov) +Add ability to wrap long lines in psql's "expanded" +mode by using \pset format wrapped (Sergey Muraviov) @@ -2218,7 +2230,8 @@ Add pg_prewarm -to preload relation data into the shared buffer cache (Robert Haas) +extension to preload relation data into the shared buffer cache +(Robert Haas) @@ -2243,7 +2256,7 @@ -Add logging of trigger execution to auto_explain (Horiguchi Kyotaro) @@ -2279,9 +2292,10 @@ -Improve indexing of pg_trgm values to -discourage indexing whitespace (Alexander Korotkov) +Improve pg_trgm's +generation of trigrams for indexed regular expression searches by +discouraging the use of trigrams containing whitespace (Alexander +Korotkov) @@ -2328,7 +2342,7 @@ -Allow pgbench to process script files of any line length (Sawada +Allow pgbench to process script files of any line length (Sawada Masahiko) @@ -2339,20 +2353,20 @@ -Add pg_bench option (--rate) to control +Add pgbench option (--rate) to control the transaction rate (Fabien Coelho) -Add --progress output option to pgbench (Fabien Coelho) +Add --progress output option t
[HACKERS] Trigger concurrent execution
Hi, I'm thinking of an extension to trigger functionality like this: CREATE TRIGGER trigger_name AFTER event ON table CONCURRENTLY EXECUTE PROCEDURE trigger_fc This would call the trigger after the end of the transaction. The following is a use-case, please tell me if I'm doing it wrong. I have a big table with big text column article and a nullable tsvector column fts_article. On each insert or update that changes the article, I trigger-issue 'NOTIFY article_changed row_id', then, with a daemon listener, I catch the notification and update fts_article accordingly with my_fts_fc(article). The reason I don't do this directly in my trigger is because my_fts_fc is slow for big articles, fts_article has a gin index, and also, on heavy load, my listener can do these updates concurrently. Now, with a concurrent execution of triggers, I can just call my_fts_fc inside the trigger instead of the notify roundtrip. Kind regards, Blagoj Petrushev -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
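For reference while reading the replies, a minimal sketch of the NOTIFY-based
setup described above; the table and column names (articles, id, article,
fts_article) are stand-ins for the real schema, and my_fts_fc is the user's
own function:

CREATE FUNCTION notify_article_changed() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Push only the row id; the slow my_fts_fc() call happens in the daemon.
    PERFORM pg_notify('article_changed', NEW.id::text);
    RETURN NEW;
END;
$$;

CREATE TRIGGER articles_fts_notify
    AFTER INSERT OR UPDATE OF article ON articles
    FOR EACH ROW
    EXECUTE PROCEDURE notify_article_changed();

-- The daemon runs LISTEN article_changed; and, for each notification:
--   UPDATE articles SET fts_article = my_fts_fc(article) WHERE id = <payload>;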
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 15.5.2014 22:07, Andrew Dunstan wrote:
>
> Yes, I've seen that. Frankly, a test that takes something like 500
> hours is a bit crazy.

Maybe. It certainly is not a test people will use during development. But if
it can detect some hard-to-find errors in the code that might possibly lead
to serious problems, then +1 from me to run them at least on one animal.

500 hours is ~3 weeks, which is not that bad IMHO. Also, once you know where
it fails, the developer can run just that single test (which might take
minutes/hours, but not days).

> If we really want to run this in the buildfarm we should probably
> try to create a massively cut down test schedule for use in this
> case.

If we can cut this down in a meaningful way (i.e. without sacrificing most of
the benefits) then sure - let's do that. But I think that's what
CLOBBER_CACHE_ALWAYS is about.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 15.5.2014 22:56, Andrew Dunstan wrote:
>
> On 05/15/2014 04:30 PM, Stefan Kaltenbrunner wrote:
>
>> well I'm not sure about "misconfigured" but both my personal
>> buildfarm members and pginfra run ones (like gaibasaurus) got errors
>> complaining about "snapshot too old" in the past for long running tests
>> so I'm not sure it is really a "we never had machine with runs that
>> long". So maybe we should not reject those submissions at submission
>> time but rather mark them clearly on the dashboard and leave the final
>> interpretation to a human...
>>
>
> That's a LOT harder and more work to arrange. Frankly, there are more
> important things to do.
>
> I would like to know the circumstances of these very long runs. I drive
> some of my VMs pretty hard on pretty modest hardware, and they don't
> come close to running 24 hours.
>
> The current behaviour goes back to this commit from December 2011:
>
>    commit a8b5049e64f9cb08f8e165d0737139dab74e3bce
>    Author: Andrew Dunstan
>    Date:   Wed Dec 14 14:38:44 2011 -0800
>
>        Use git snapshot instead of fixed 10 day timeout.
>
>        The sanity checks made sure that an animal wasn't submitting a
>        snapshot that was too old. But sometimes an old branch doesn't
>        get any changes for more than 10 days. So accept a snapshot that
>        is not more than 1 day older than the last known snapshot. Per
>        complaint from Stefan.
>
>
> I'm prepared to increase the sanity check time if there is a serious
> demand for it, but I'd like to know what to increase it to.

I doubt there's a "one size fits all" limit. If the machines running
"recursive clobber" tests need tens of days to complete the tests, then that
limit is pretty useless to most regular animals.

So what about keeping the current value for most animals, but allowing an
override for some selected ones? I'd expect this to be much simpler to
implement, and it shouldn't require any human intervention.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error in running DBT2
On Thu, May 15, 2014 at 3:47 PM, Andrew Dunstan wrote: > Do these questions about running dbt2 even belong on pgsql-hackers? They > seem to me to be usage questions that belong on pgsql-general. I agree. Anyway, perhaps the OP will have more luck with OLTPBenchmark, which has some kind of TPC-C support: http://oltpbenchmark.com/wiki/index.php?title=Main_Page -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error in running DBT2
On 05/15/2014 06:37 PM, Rohit Goyal wrote:

Hi All,

I am using centOS6 and after all configuration, I run the below command

*dbt2-run-workload -a pgsql -d 120 -w 1 -o /tmp/result -c 10 *

*Error:*
Stage 3. Processing of results...
Killing client...
waiting for server to shut down done
server stopped
Traceback (most recent call last):
File "/tmp/dbt2/bin/dbt2-post-process", line 14, in <module>
import rpy2.robjects as robjects
ImportError: No module named rpy2.robjects
Test completed.
Results are in: /tmp/result

Please guide me !! :)


Do these questions about running dbt2 even belong on pgsql-hackers? They seem
to me to be usage questions that belong on pgsql-general.

They are also woefully inadequate in the detail they provide. I don't see how
anyone could take the above report and give any sort of opinion, other than
that it is a python error and not apparently a postgres error at all.

cheers

andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Error in running DBT2
Hi All,

I am using centOS6 and after all configuration, I run the below command

*dbt2-run-workload -a pgsql -d 120 -w 1 -o /tmp/result -c 10*

*Error:*
Stage 3. Processing of results...
Killing client...
waiting for server to shut down done
server stopped
Traceback (most recent call last):
File "/tmp/dbt2/bin/dbt2-post-process", line 14, in <module>
import rpy2.robjects as robjects
ImportError: No module named rpy2.robjects
Test completed.
Results are in: /tmp/result

Please guide me !! :)

Regards,
Rohit Goyal

-- 
Regards,
Rohit Goyal
Re: [HACKERS] CTE that result in repeated sorting of the data
On Thu, May 15, 2014 at 4:50 PM, David G Johnston wrote: > Jon Nelson-14 wrote >> I was watching a very large recursive CTE get built today and this CTE >> involves on the order of a dozen or so "loops" joining the initial >> table against existing tables. It struck me that - every time through >> the loop the tables were sorted and then joined and that it would be >> much more efficient if the tables remained in a sorted state and could >> avoid being re-sorted each time through the loop. Am I missing >> something here? I am using PG 8.4 if that matters. > > I'm not sure what you mean by "watching" but maybe this is a simple as > changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"? In fact, I'm using UNION ALL. > If you really think it could be improved upon maybe you can help and provide > a minimal self-contained example query and data that exhibits the behavior > you describe so others can see it and test changes? It would be nice to > know if other versions than one that is basically no longer supported > exhibits the same behavior. Pretty much any CTE that looks like this: with cte AS ( select stuff from A UNION ALL select more_stuff from B, cte WHERE ) SELECT * FROM cte; *and* where the planner chooses to join B and cte by sorting and doing a merge join. I'll see if I can come up with a self-contained example. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
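Until a real test case is posted, here is a minimal, hypothetical sketch of
the query shape being discussed; the parts table and the seed condition are
invented, and whether each iteration is joined by sorting both inputs and
merge-joining depends entirely on the planner and statistics:

CREATE TABLE parts (part_id int, child_id int);

WITH RECURSIVE cte AS (
    SELECT part_id, child_id, 1 AS depth
    FROM parts
    WHERE part_id = 1            -- seed rows
  UNION ALL
    SELECT p.part_id, p.child_id, c.depth + 1
    FROM parts p, cte c
    WHERE p.part_id = c.child_id
)
SELECT * FROM cte;

EXPLAIN shows the plan chosen for the recursive term; that subplan is
re-executed on every iteration of the loop, so a Sort / Merge Join there
corresponds to the repeated sorting described above.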
Re: [HACKERS] sepgsql: label regression test failed
Sorry, I had forgotten about this report.

The failures in the label test come from a specification change in the mcs
policy. Previously it was applied to all domains, including unconfined_t;
now it is applied only to domains with the "mcsconstrained" attribute.

This regression test runs sepgsql_setcon() on the system "unconfined_t"
domain and checks the behavior when a process tries to move to wider or
narrower ranges of categories, so it was affected by the system policy
change, even though sepgsql itself behaves as intended. The attached patch
adds the "mcsconstrained" attribute to the domain used by this regression
test, if the attribute exists, so it will work on both F20 and older systems.

Regarding the regression tests on ddl and alter, it looks to me as though
the hook invocations around recomputeNamespacePath() are gone, because a
schema that is already allowed to be searched has already been checked. Has
the behavior around recomputeNamespacePath() been updated recently? At
least, it is not a problem, since the {search} permission towards
"regtest_schema_2" is checked in this test scenario.

Thanks,

2014-05-14 13:33 GMT+09:00 Sergey Muraviov :
> Hi.
>
> Some regression tests for sepgsql still not work on Fedora 20:
>
> == running regression test queries==
> test label... FAILED
> test dml ... ok
> test ddl ... FAILED
> test alter... FAILED
> test misc ... ok
>
> ==
> 3 of 5 tests failed.
> ==
>
> $ sestatus
> SELinux status: enabled
> SELinuxfs mount:/sys/fs/selinux
> SELinux root directory: /etc/selinux
> Loaded policy name: targeted
> Current mode: enforcing
> Mode from config file: enforcing
> Policy MLS status: enabled
> Policy deny_unknown status: allowed
> Max kernel policy version: 29
>
> $ uname -i -o -r
> 3.14.3-200.fc20.x86_64 x86_64 GNU/Linux
>
> $ /usr/local/pgsql/bin/postgres --version
> postgres (PostgreSQL) 9.4beta1
>
> PS
> I've got this compiler warning:
> relation.c: In function ‘sepgsql_relation_drop’:
> relation.c:472:25: warning: ‘tclass’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
> sepgsql_avc_check_perms(&object,
> ^
>
>
> 2013-12-25 0:34 GMT+04:00 Kohei KaiGai :
>
>> Hello,
>>
>> It seems to me changes in the base security policy on Fedora affected to
>> the regression test. Our test cases for sepgsql_setcon() utilizes the MCS
>> rules, that prevents domain transition from narrow categories to wider
>> ones,
>> to control the success cases and failure cases.
>>
>> However, its coverage was changed. It was applied all the domains in the
>> system, thus "unconfined_t" domain had been enforced by MCS rules.
>> But now, it shall be applied only domains with "mcs_constrained_type"
>> attribute.
>>
>> [kaigai@vmlinux tmp]$ diff -up old/policy/mcs new/policy/mcs
>> :
>>
>> :
>> mlsconstrain process { transition dyntransition }
>> - (( h1 dom h2 ) or ( t1 == mcssetcats ));
>> + (( h1 dom h2 ) or ( t1 != mcs_constrained_type ));
>>
>> Probably, we need to define a domain by ourselves for regression test to
>> ensure
>> the test stability, not using the system "unconfined" domain that has
>> different
>> meaning by release.
>>
>> I'll make a patch. Please wait for a while.
>>
>> Thanks for your test & reports.
>>
>> 2013/12/18 Sergey Muraviov :
>> > # semodule -l | grep sepgslq
>> > sepgsql-regtest 1.07
>> >
>> > Full list of modules is in attachment.
>> >
>> >
>> > 2013/12/18 Kohei KaiGai
>> >>
>> >> Could you show me semodule -l on your environment?
>> >> I believe security policy has not been changed between F19 and F20... 
>> >> >> >> Thanks, >> >> >> >> 2013/12/18 Sergey Muraviov : >> >> > Hi >> >> > >> >> > I've tried to test postgres 9.3.2 and 9.4devel with selinux on Fedora >> >> > 20 >> >> > and >> >> > met with a label regression test failure. >> >> > >> >> > PS >> >> > I've got some warning during build process. >> >> > >> >> > -- >> >> > Best regards, >> >> > Sergey Muraviov >> >> > >> >> > >> >> > -- >> >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> >> > To make changes to your subscription: >> >> > http://www.postgresql.org/mailpref/pgsql-hackers >> >> > >> >> >> >> >> >> >> >> -- >> >> KaiGai Kohei >> > >> > >> > >> > >> > -- >> > Best regards, >> > Sergey Muraviov >> >> >> >> -- >> KaiGai Kohei > > > > > -- > Best regards, > Sergey Muraviov -- KaiGai Kohei sepgsql-fixup-regtest-policy.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
Andres Freund wrote: > On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote: > > Andres Freund wrote: > > > On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > > > > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > > > > > If the larger clog size is a show-stopper (and I'm not sure I have an > > > > intelligent opinion on that just yet), one way to get around the > > > > problem would be to summarize CLOG entries after-the-fact. Once an > > > > XID precedes the xmin of every snapshot, we don't need to know the > > > > commit LSN any more. So we could read the old pg_clog files and write > > > > new summary files. Since we don't need to care about subcommitted > > > > transactions either, we could get by with just 1 bit per transaction, > > > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > > > summary files, we could throw away the original files. That might > > > > leave us with a smaller pg_clog than what we have today. > > > > > > I think the easiest way for now would be to have pg_clog with the same > > > format as today and a rangewise much smaller pg_csn storing the lsns > > > that are needed. That'll leave us with pg_upgrade'ability without > > > needing to rewrite pg_clog during the upgrade. > > > > Err, we're proposing a patch to add timestamps to each commit, > > http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org > > which does so in precisely this way. > > I am not sure where my statements above conflict with committs? I didn't say it did ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical tests
On 2014-05-15 18:52:45 -0300, Euler Taveira wrote:
> On 15-05-2014 18:09, Andres Freund wrote:
> > I have some preliminary tests for the pg_recvlogical binary using the
> > infrastructure Peter added. I am wondering if somebody has a good idea
> > about how to make the tests more meaningful. Currently all that's tested
> > are simple commands. Not the main functionality namely the actual
> > streaming of changes.
> >
> Could you post your preliminary patch?

It's not particularly interesting yet, but attached.

> > I wonder if somebody has a great idea for cancelling pg_recvlogical
> > after a while. Right now my best idea is to add a new
> > --stop-after-messages parameter. Does anybody have a better idea?
> >
> pgbench uses two stop conditions: (i) number of transactions and (ii)
> time. These could be applied to pg_recvlogical too.
>
> --transactions=NUM
> --time=NUM

Sounds too complicated for what I need it for. The former isn't easily
implementable because pg_recvlogical doesn't know about transactions, and
the latter isn't that interesting for the tests because it'll have timing
issues...

> > If that's the way, would somebody object to tests and the parameter
> > being added now? The potential harm seems pretty low and the additional
> > tests would cover the walsender interface that's not covered by any
> > tests right now...
> >
> beta is not the right time to add even a *minor* functionality. Let's do
> it for 9.5.

I am not interested in the feature itself at all. I am interested in adding
more tests for new functionality that's currently not tested in an automated
fashion. And I do think beta isn't a bad time for that. We didn't *have* the
infrastructure for tests of binaries until the end of the last CF, so I
couldn't have added it during development.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

030_pg_recvlogical.pl
Description: Perl program


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical tests
On 15-05-2014 18:09, Andres Freund wrote: > I have some preliminary tests for the pg_recvlogical binary using the > infrastructure Peter added. I am wondering if somebody has a good idea > about how to make the tests more meaningful. Currently all that's tested > are simple commands. Not the main functionality namely the actual > streaming of changes. > Could you post your preliminary patch? > I wonder if somebody has a great idea for cancelling pg_recvlogical > after a while. Right now my best idea is to add a new > --stop-after-messages parameter. Does anybody have a better idea? > pgbench uses two stop conditions: (i) number of transactions and (ii) time. These could be applied to pg_recvlogical too. --transactions=NUM --time=NUM > If that's the way, would somebody object to tests and the parameter > being added now? The potential harm seems pretty low and the additional > tests would cover the walsender interface that's not covered by any > tests right now... > beta is not the right time to add even a *minor* functionality. Let's do it for 9.5. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CTE that result in repeated sorting of the data
Jon Nelson-14 wrote > I was watching a very large recursive CTE get built today and this CTE > involves on the order of a dozen or so "loops" joining the initial > table against existing tables. It struck me that - every time through > the loop the tables were sorted and then joined and that it would be > much more efficient if the tables remained in a sorted state and could > avoid being re-sorted each time through the loop. Am I missing > something here? I am using PG 8.4 if that matters. I'm not sure what you mean by "watching" but maybe this is a simple as changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"? If you really think it could be improved upon maybe you can help and provide a minimal self-contained example query and data that exhibits the behavior you describe so others can see it and test changes? It would be nice to know if other versions than one that is basically no longer supported exhibits the same behavior. Or maybe someone more familiar with the implementation of recursive CTE will chime in - my knowledge in this area is fairly limited. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/CTE-that-result-in-repeated-sorting-of-the-data-tp5804136p5804140.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..
On Wed, May 14, 2014 at 8:26 AM, Robert Haas wrote: > On Sun, May 11, 2014 at 12:47 PM, Tom Lane wrote: > > Simon Riggs writes: > >> On 11 May 2014 11:18, Andres Freund wrote: > >>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather > >>> useful. > > > >> Perhaps if an index exists to provide an ordering that makes it clear > >> what this means, then yes. > > > > The $64 question is whether we'd accept an implementation that fails > > if the target table has children (ie, is partitioned). > > I'd say "no". Partitioning is important, and we need to make it more > seamless and better-integrated, not add new warts. > I think the importance of partitioning argues the other way on this issue. Where I most wanted a LIMIT clause on DELETE is where I was moving tuples from one partition to a different one in a transactional way using bite-size chunks that wouldn't choke the live system with locks or with IO. So the DELETE was always running against either a child by name, or against ONLY parent, not against the whole inheritance tree. Not being able to do this on single partitions makes partitioning harder, not easier. Sure, I can select the nth smallest ctid and then "WITH T AS (DELETE FROM ONLY foo WHERE ctid < :that RETURNING *) INSERT INTO bar SELECT * from T", but how annoying. > > That seems > > to me to not be up to the project's usual quality expectations, but > > maybe if there's enough demand for a partial solution we should do so. > > I like this feature, but if I were searching for places where it makes > sense to loosen our project's usual quality expectations, this isn't > where I'd start. > In this case I'd much rather have half a loaf rather than none at all. We wouldn't be adding warts to partitioning, but removing warts from the simpler case. Cheers, Jeff
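A hedged sketch of the ctid-based workaround alluded to above, assuming
PostgreSQL 9.1 or later for the writable CTE; foo, bar, and the page-number
bound are illustrative, bar is assumed to have the same column layout as foo,
and no concurrent writers are assumed while the chunk is moved:

-- Move the rows stored in roughly the first 1000 heap pages of foo into bar;
-- repeat with an increasing bound until the partition is empty.
WITH moved AS (
    DELETE FROM ONLY foo
    WHERE ctid < '(1000,1)'::tid
    RETURNING *
)
INSERT INTO bar
SELECT * FROM moved;

A DELETE ... LIMIT, run against ONLY the parent or a single child, would
replace this kind of two-step dance with one straightforward statement.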
Re: [HACKERS] Proposal for CSN based snapshots
On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > > > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > > > If the larger clog size is a show-stopper (and I'm not sure I have an > > > intelligent opinion on that just yet), one way to get around the > > > problem would be to summarize CLOG entries after-the-fact. Once an > > > XID precedes the xmin of every snapshot, we don't need to know the > > > commit LSN any more. So we could read the old pg_clog files and write > > > new summary files. Since we don't need to care about subcommitted > > > transactions either, we could get by with just 1 bit per transaction, > > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > > summary files, we could throw away the original files. That might > > > leave us with a smaller pg_clog than what we have today. > > > > I think the easiest way for now would be to have pg_clog with the same > > format as today and a rangewise much smaller pg_csn storing the lsns > > that are needed. That'll leave us with pg_upgrade'ability without > > needing to rewrite pg_clog during the upgrade. > > Err, we're proposing a patch to add timestamps to each commit, > http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org > which does so in precisely this way. I am not sure where my statements above conflict with committs? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
Andres Freund wrote: > On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > If the larger clog size is a show-stopper (and I'm not sure I have an > > intelligent opinion on that just yet), one way to get around the > > problem would be to summarize CLOG entries after-the-fact. Once an > > XID precedes the xmin of every snapshot, we don't need to know the > > commit LSN any more. So we could read the old pg_clog files and write > > new summary files. Since we don't need to care about subcommitted > > transactions either, we could get by with just 1 bit per transaction, > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > summary files, we could throw away the original files. That might > > leave us with a smaller pg_clog than what we have today. > > I think the easiest way for now would be to have pg_clog with the same > format as today and a rangewise much smaller pg_csn storing the lsns > that are needed. That'll leave us with pg_upgrade'ability without > needing to rewrite pg_clog during the upgrade. Err, we're proposing a patch to add timestamps to each commit, http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org which does so in precisely this way. The idea that pg_csn or pg_committs can be truncated much earlier than pg_clog has its merit, no doubt. If we can make sure that the atomicity is sane, +1 from me. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CTE that result in repeated sorting of the data
I was watching a very large recursive CTE get built today and this CTE involves on the order of a dozen or so "loops" joining the initial table against existing tables. It struck me that - every time through the loop the tables were sorted and then joined and that it would be much more efficient if the tables remained in a sorted state and could avoid being re-sorted each time through the loop. Am I missing something here? I am using PG 8.4 if that matters. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_recvlogical tests
Hi, I have some preliminary tests for the pg_recvlogical binary using the infrastructure Peter added. I am wondering if somebody has a good idea about how to make the tests more meaningful. Currently all that's tested are simple commands. Not the main functionality namely the actual streaming of changes. I wonder if somebody has a great idea for cancelling pg_recvlogical after a while. Right now my best idea is to add a new --stop-after-messages parameter. Does anybody have a better idea? If that's the way, would somebody object to tests and the parameter being added now? The potential harm seems pretty low and the additional tests would cover the walsender interface that's not covered by any tests right now... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote: > >Attached patch fixes things, but I want to add some regression tests > >before commit. > > Looks good to me. Attached are two patches. One for the unitialized dbId/tsId issue; one for the decoding bug. The former should be backpatched. Should you wonder about the slight reordering of the assignments in RecordTransactionCommitPrepared() - I've made it more similar to RecordTransactionCommit() to make it easier to see they are equivalent. Thanks for the testing! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 07d0f4330f7c25bcec9e356527b0dc86372d2886 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 15 May 2014 22:23:12 +0200 Subject: [PATCH 1/2] Initialize all members of xl_xact_commit during prepared transaction commits. Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed that prepared transaction commits reuse that struct. Fix that. Because those fields were used WAL logged unitialized a hot standby node could miss relcache init file invalidations leading to errors like ERROR: could not open file "...": No such file or directory on the standby. A restart of the database is sufficient to fix the problem. As problems can only be triggered when a system table/index has been rewritten in a transaction using two phase commit the problem is unlikely to have affected many installations. Found while investigating a logical decoding bugreport from Heikki. Backpatch to 9.0 where the bug was introduced. --- src/backend/access/transam/twophase.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 2cefa08..d5409a6 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2082,9 +2082,13 @@ RecordTransactionCommitPrepared(TransactionId xid, /* Emit the XLOG commit record */ xlrec.xid = xid; - xlrec.crec.xact_time = GetCurrentTimestamp(); + xlrec.crec.xinfo = initfileinval ? XACT_COMPLETION_UPDATE_RELCACHE_FILE : 0; - xlrec.crec.nmsgs = 0; + + xlrec.crec.dbId = MyDatabaseId; + xlrec.crec.tsId = MyDatabaseTableSpace; + + xlrec.crec.xact_time = GetCurrentTimestamp(); xlrec.crec.nrels = nrels; xlrec.crec.nsubxacts = nchildren; xlrec.crec.nmsgs = ninvalmsgs; -- 2.0.0.rc2.4.g1dc51c6.dirty >From de22d8e5308b1c4c509d8566317d9aea0956c2bc Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 15 May 2014 22:42:27 +0200 Subject: [PATCH 2/2] Fix typo-induced bug in decoding of prepared transactions. The decoding of prepared transaction commits accidentally used the xid of the transaction performing the COMMIT PREPARED. Before bb38fb0d43c8d that lead to those transactions not being decoded, afterwards to a assertion failure. Add tests for the decoding of prepared transactions. Bug found and fixed by Heikki Linnakangas; new regression test by Andres Freund. 
--- contrib/test_decoding/Makefile | 2 +- contrib/test_decoding/expected/ddl.out | 6 +-- contrib/test_decoding/expected/prepared.out | 82 + contrib/test_decoding/sql/ddl.sql | 2 +- contrib/test_decoding/sql/prepared.sql | 50 ++ src/backend/replication/logical/decode.c| 2 +- 6 files changed, 138 insertions(+), 6 deletions(-) create mode 100644 contrib/test_decoding/expected/prepared.out create mode 100644 contrib/test_decoding/sql/prepared.sql diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile index 685986c..58e0f38 100644 --- a/contrib/test_decoding/Makefile +++ b/contrib/test_decoding/Makefile @@ -37,7 +37,7 @@ submake-isolation: submake-test_decoding: $(MAKE) -C $(top_builddir)/contrib/test_decoding -REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact binary +REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact binary prepared regresscheck: all | submake-regress submake-test_decoding $(MKDIR_P) regression_output diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out index 05a4bd3..37ff8b7 100644 --- a/contrib/test_decoding/expected/ddl.out +++ b/contrib/test_decoding/expected/ddl.out @@ -640,8 +640,8 @@ SELECT pg_drop_replication_slot('regression_slot'); (1 row) /* check that we aren't visible anymore now */ -SELECT * FROM pg_stat_replication; - pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state --+--+-+--+-+-+-+---+--+---+---+++-+---+ +SELECT * FROM pg_replic
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 04:30 PM, Stefan Kaltenbrunner wrote: On 05/15/2014 07:46 PM, Andrew Dunstan wrote: On 05/15/2014 12:43 PM, Tomas Vondra wrote: Hi all, today I got a few of errors like these (this one is from last week, though): Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT Content: snapshot to old: Wed May 7 04:36:57 2014 GMT on the new buildfarm animals. I believe it was my mistake (incorrectly configured local git mirror), but it got me thinking about how this will behave with the animals running CLOBBER_CACHE_RECURSIVELY. If I understand the Perl code correctly, it does this: (1) update the repository (2) run the tests (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) (4) fail if older Now, imagine that the test runs for days/weeks. This pretty much means it's wasted, because the results will be thrown away anyway, no? The 24 hours runs from the time of the latest commit on the branch in question, not the current time, but basically yes. We've never had machines with runs that long. The longest in recent times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes around 4.5 hours. But we have had misconfigured machines reporting unbelievable snapshot times. I'll take a look and see if we can tighten up the sanity check. It's worth noting that one thing friarbird does is skip the install-check stage - it's almost certainly not going to have terribly much interesting to tell us from that, given it has already run a plain "make check". well I'm not sure about about "misconfigured" but both my personal buildfarm members and pginfra run ones (like gaibasaurus) got errors complaining about "snapshot too old" in the past for long running tests so I'm not sure it is really a "we never had machine with runs that long". So maybe we should not reject those submissions at submission time but rather mark them clearly on the dashboard and leave the final interpretation to a human... That's a LOT harder and more work to arrange. Frankly, there are more important things to do. I would like to know the circumstances of these very long runs. I drive some of my VMs pretty hard on pretty modest hardware, and they don't come close to running 24 hours. The current behaviour goes back to this commit from December 2011: commit a8b5049e64f9cb08f8e165d0737139dab74e3bce Author: Andrew Dunstan Date: Wed Dec 14 14:38:44 2011 -0800 Use git snapshot instead of fixed 10 day timeout. The sanity checks made sure that an animal wasn't submitting a snapshot that was too old. But sometimes an old branch doesn't get any changes for more than 10 days. So accept a snapshot that is not more than 1 day older than the last known snapshot. Per complaint from Stefan. I'm prepared to increase the sanity check time if there is a serious demand for it, but I'd like to know what to increase it to. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 07:46 PM, Andrew Dunstan wrote: > > On 05/15/2014 12:43 PM, Tomas Vondra wrote: >> Hi all, >> >> today I got a few of errors like these (this one is from last week, >> though): >> >> Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT >> Content: >> snapshot to old: Wed May 7 04:36:57 2014 GMT >> >> on the new buildfarm animals. I believe it was my mistake (incorrectly >> configured local git mirror), but it got me thinking about how this will >> behave with the animals running CLOBBER_CACHE_RECURSIVELY. >> >> If I understand the Perl code correctly, it does this: >> >> (1) update the repository >> (2) run the tests >> (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) >> (4) fail if older >> >> Now, imagine that the test runs for days/weeks. This pretty much means >> it's wasted, because the results will be thrown away anyway, no? >> > > > The 24 hours runs from the time of the latest commit on the branch in > question, not the current time, but basically yes. > > We've never had machines with runs that long. The longest in recent > times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes > around 4.5 hours. But we have had misconfigured machines reporting > unbelievable snapshot times. I'll take a look and see if we can tighten > up the sanity check. It's worth noting that one thing friarbird does is > skip the install-check stage - it's almost certainly not going to have > terribly much interesting to tell us from that, given it has already run > a plain "make check". Well, I'm not sure about "misconfigured", but both my personal buildfarm members and the pginfra-run ones (like gaibasaurus) got errors complaining about "snapshot too old" in the past for long-running tests, so I'm not sure it is really a case of "we never had machines with runs that long". So maybe we should not reject those submissions at submission time, but rather mark them clearly on the dashboard and leave the final interpretation to a human... Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 2014-05-15 16:13:49 -0400, Bruce Momjian wrote: > On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote: > > > If the larger clog size is a show-stopper (and I'm not sure I have an > > > intelligent opinion on that just yet), one way to get around the > > > problem would be to summarize CLOG entries after-the-fact. Once an > > > XID precedes the xmin of every snapshot, we don't need to know the > > > commit LSN any more. So we could read the old pg_clog files and write > > > new summary files. Since we don't need to care about subcommitted > > > transactions either, we could get by with just 1 bit per transaction, > > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > > summary files, we could throw away the original files. That might > > > leave us with a smaller pg_clog than what we have today. > > > > I think the easiest way for now would be to have pg_clog with the same > > format as today and a rangewise much smaller pg_csn storing the lsns > > that are needed. That'll leave us with pg_upgrade'ability without > > needing to rewrite pg_clog during the upgrade. > > Yes, I like the idea of storing the CSN separately. One reason the > 2-bit clog is so good is that we know we have atomic 1-byte writes on > all platforms. I don't think we rely on that anywhere. And in fact we don't have the ability to do so for arbitrary bytes - lots of platforms can do that only on specifically aligned bytes. We rely on being able to atomically (as in either before/after no torn value) write/read TransactionIds, but that's it I think? > Can we assume atomic 64-bit writes? Not on 32bit platforms. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
Andrew Dunstan writes: > Incidentally, should the CLOBBER_CACHE_ALWAYS machines also be defining > CLOBBER_FREED_MEMORY? The former does need the latter or it's not very thorough. However, CLOBBER_FREED_MEMORY is defined automatically by --enable-cassert, so you shouldn't need to use a -D switch for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote: > > If the larger clog size is a show-stopper (and I'm not sure I have an > > intelligent opinion on that just yet), one way to get around the > > problem would be to summarize CLOG entries after-the-fact. Once an > > XID precedes the xmin of every snapshot, we don't need to know the > > commit LSN any more. So we could read the old pg_clog files and write > > new summary files. Since we don't need to care about subcommitted > > transactions either, we could get by with just 1 bit per transaction, > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > summary files, we could throw away the original files. That might > > leave us with a smaller pg_clog than what we have today. > > I think the easiest way for now would be to have pg_clog with the same > format as today and a rangewise much smaller pg_csn storing the lsns > that are needed. That'll leave us with pg_upgrade'ability without > needing to rewrite pg_clog during the upgrade. Yes, I like the idea of storing the CSN separately. One reason the 2-bit clog is so good is that we know we have atomic 1-byte writes on all platforms. Can we assume atomic 64-bit writes? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 03:57 PM, Tomas Vondra wrote: How long does a CLOBBER_CACHE_RECURSIVELY run take? days or weeks seems kinda nuts. I don't know. According to this comment from cache/inval.c, it's expected to be way slower (~100x) compared to CLOBBER_CACHE_ALWAYS. /* * Test code to force cache flushes anytime a flush could happen. * * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a * fairly thorough test that the system contains no cache-flush hazards. * However, it also makes the system unbelievably slow --- the regression * tests take about 100 times longer than normal. * * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY. This * slows things by at least a factor of 10000, so I wouldn't suggest * trying to run the entire regression tests that way. It's useful to try * a few simple tests, to make sure that cache reload isn't subject to * internal cache-flush hazards, but after you've done a few thousand * recursive reloads it's unlikely you'll learn more. */ Yes, I've seen that. Frankly, a test that takes something like 500 hours is a bit crazy. If we really want to run this in the buildfarm we should probably try to create a massively cut down test schedule for use in this case. Incidentally, should the CLOBBER_CACHE_ALWAYS machines also be defining CLOBBER_FREED_MEMORY? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote: > >> >Some of the stuff in here will be influence whether your freezing > >> >replacement patch gets in. Do you plan to further pursue that one? > >> > >> Not sure. I got to the point where it seemed to work, but I got a > >> bit of a cold feet proceeding with it. I used the page header's LSN > >> field to define the "epoch" of the page, but I started to feel > >> uneasy about it. I would be much more comfortable with an extra > >> field in the page header, even though that uses more disk space. And > >> requires dealing with pg_upgrade. > > > > FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a > > pg_upgrade issue anyway. > > > > I am not excited about a 32x increase in clog size, especially since we > > already do freezing at 200M transactions to allow for more aggressive > > clog trimming. Extrapolating that out, it means we would freeze every > > 6.25M transactions. The default setting imo is far too low for a database of any relevant activity. If I had the stomach for the fight around it I'd suggest increasing it significantly by default. People with small databases won't be hurt significantly because they simply don't have that many transactions and autovacuum will get around to cleanup long before normally. > It seems better to allow clog to grow larger than to force > more-frequent freezing. Yes. > If the larger clog size is a show-stopper (and I'm not sure I have an > intelligent opinion on that just yet), one way to get around the > problem would be to summarize CLOG entries after-the-fact. Once an > XID precedes the xmin of every snapshot, we don't need to know the > commit LSN any more. So we could read the old pg_clog files and write > new summary files. Since we don't need to care about subcommitted > transactions either, we could get by with just 1 bit per transaction, > 1 = committed, 0 = aborted. Once we've written and fsync'd the > summary files, we could throw away the original files. That might > leave us with a smaller pg_clog than what we have today. I think the easiest way for now would be to have pg_clog with the same format as today and a rangewise much smaller pg_csn storing the lsns that are needed. That'll leave us with pg_upgrade'ability without needing to rewrite pg_clog during the upgrade. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote: > On 05/15/2014 08:46 PM, Andres Freund wrote: > >On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: > >How very wierd. The reason for this is that > >RecordTransactionCommitPrepared() forgets to fill a couple of fields in > >xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly > >needed because afaics they're only looked at for relcache invalidations > >which prepared xacts don't support, but still? > > Seems like an oversight in commit dd428c79, which added the fields to > xl_xact_commit. They are needed. A prepared xact can indeed cause relcache > invalidations, and removal of the init file. For example: Hm, so that part has to be backpatched to 9.0. Ick, I wonder if that's been hit in production. Seems like it could cause pretty random looking errors. It's easy enough to cause errors like: ERROR: could not open file "base/12753/12613": No such file or directory I guess not many people will do relevant stuff in prepared xacts tho. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 15 May 2014, 19:46, Andrew Dunstan wrote: > > On 05/15/2014 12:43 PM, Tomas Vondra wrote: >> Hi all, >> >> today I got a few of errors like these (this one is from last week, >> though): >> >> Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT >> Content: >> snapshot to old: Wed May 7 04:36:57 2014 GMT >> >> on the new buildfarm animals. I believe it was my mistake (incorrectly >> configured local git mirror), but it got me thinking about how this will >> behave with the animals running CLOBBER_CACHE_RECURSIVELY. >> >> If I understand the Perl code correctly, it does this: >> >> (1) update the repository >> (2) run the tests >> (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) >> (4) fail if older >> >> Now, imagine that the test runs for days/weeks. This pretty much means >> it's wasted, because the results will be thrown away anyway, no? >> > > > The 24 hours runs from the time of the latest commit on the branch in > question, not the current time, but basically yes. > > We've never had machines with runs that long. The longest in recent > times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes > around 4.5 hours. But we have had misconfigured machines reporting > unbelievable snapshot times. I'll take a look and see if we can tighten > up the sanity check. It's worth noting that one thing friarbird does is > skip the install-check stage - it's almost certainly not going to have > terribly much interesting to tell us from that, given it has already run > a plain "make check". > > How long does a CLOBBER_CACHE_RECURSIVELY run take? days or weeks seems > kinda nuts. I don't know. According to this comment from cache/inval.c, it's expected to be way slower (~100x) compared to CLOBBER_CACHE_ALWAYS. /* * Test code to force cache flushes anytime a flush could happen. * * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a * fairly thorough test that the system contains no cache-flush hazards. * However, it also makes the system unbelievably slow --- the regression * tests take about 100 times longer than normal. * * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY. This * slows things by at least a factor of 10000, so I wouldn't suggest * trying to run the entire regression tests that way. It's useful to try * a few simple tests, to make sure that cache reload isn't subject to * internal cache-flush hazards, but after you've done a few thousand * recursive reloads it's unlikely you'll learn more. */ regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC13 proposal - extend RETURNING syntax
Andres Freund writes: > > Hi, > > Some comments about the patch: > * Coding Style: > * multiline comments have both /* and */ on their own lines. > * I think several places indent by two tabs. > * Spaces around operators > * ... > * Many of the new comments would enjoy a bit TLC by a native speaker. > > * The way RTE_ALIAS creeps in many place where it doesn't seem to belong > seems to indicate that the design isn't yet ready. I share Robert's > suspicion that this would be better solved by referring to a special > range table entry. > > Based on the lack of activity around this and the fact that this needs a > *significant* chunk of work before being committable, I am going to mark > this as returned with feedback. I'm actively working towards converting our software at work to use Pg instead of SQL Server, and before we switch we'll need this feature to be merged. I'll do what I can to get the verbiage and style whipped into shape, though I doubt I can do much with the actual code. Hopefully I can get something in soon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum scheduling starvation and frenzy
Jeff Janes wrote: > If you have a database with a large table in it that has just passed > autovacuum_freeze_max_age, all future workers will be funnelled into that > database until the wrap-around completes. But only one of those workers > can actually vacuum the one table which is holding back the frozenxid. > Maybe the 2nd worker to come along will find other useful work to do, but > eventually all the vacuuming that needs doing is already in progress, and > so each worker starts up, gets directed to this database, finds it can't > help, and exits. So all other databases are entirely starved of > autovacuuming for the entire duration of the wrap-around vacuuming of this > one large table. Bah. Of course :-( Note that if you have two databases in danger of wraparound, the oldest will always be chosen until it's no longer in danger. Ignoring the second one past freeze_max_age seems bad also. This code is in autovacuum.c, do_start_worker(). Not sure what your proposal looks like in terms of code. I think that instead of trying to get a single target database in that foreach loop, we could try to build a prioritized list (in-wraparound-danger first, then in-multixid-wraparound danger, then the one with the oldest autovac time of all the ones that remain); then recheck the wrap-around condition by seeing whether there are other workers in that database that started after the wraparound condition appeared. If there are, move down the list. The first in the list not skipped is chosen for vacuuming. (Do we need to consider the situation that all databases were skipped by the above logic, and if so then perhaps pick up the first DB in the list?) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
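To make the scheduling idea above concrete, here is a rough standalone sketch of that priority order. The struct and function names are invented for illustration; they are not the real autovacuum.c / do_start_worker() data structures.

/*
 * Hypothetical sketch: rank candidate databases as (1) xid-wraparound
 * danger, (2) multixact-wraparound danger, (3) everything else by oldest
 * last-autovacuum time, and skip a wraparound candidate if some worker
 * already started on it after the danger appeared.
 */
#include <stdio.h>
#include <stdbool.h>
#include <time.h>

typedef struct CandidateDB
{
    const char *name;
    bool        xid_wraparound_danger;
    bool        mxid_wraparound_danger;
    time_t      last_autovac;           /* older = more overdue */
    time_t      danger_detected_at;     /* 0 if not in danger */
    time_t      newest_worker_start;    /* 0 if no worker active there */
} CandidateDB;

static bool
already_being_handled(const CandidateDB *db)
{
    /* a worker started after the danger appeared; adding more won't help */
    return db->danger_detected_at != 0 &&
           db->newest_worker_start > db->danger_detected_at;
}

static const CandidateDB *
choose_autovac_target(const CandidateDB *dbs, int ndbs)
{
    const CandidateDB *best = NULL;
    int         best_rank = 4;  /* 1 = xid danger, 2 = mxid danger, 3 = ordinary */

    for (int i = 0; i < ndbs; i++)
    {
        const CandidateDB *db = &dbs[i];
        int         rank = db->xid_wraparound_danger ? 1 :
                           db->mxid_wraparound_danger ? 2 : 3;

        if (rank < 3 && already_being_handled(db))
            continue;           /* "move down the list" */

        if (best == NULL ||
            rank < best_rank ||
            (rank == best_rank && db->last_autovac < best->last_autovac))
        {
            best = db;
            best_rank = rank;
        }
    }
    /* if everything was skipped, fall back to the first database in the list */
    if (best == NULL && ndbs > 0)
        best = &dbs[0];
    return best;
}

int
main(void)
{
    CandidateDB dbs[] = {
        {"big_wraparound_db", true, false, 1000, 5000, 6000},  /* already handled */
        {"quiet_db", false, false, 500, 0, 0},
        {"busy_db", false, false, 2000, 0, 0},
    };

    printf("would vacuum: %s\n", choose_autovac_target(dbs, 3)->name);
    return 0;
}

With data like the example in main(), the wraparound database that already has a post-danger worker is skipped, so an ordinary overdue database gets the worker instead of feeding the restart frenzy Jeff describes.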
Re: [HACKERS] Proposal for CSN based snapshots
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote: >> >Some of the stuff in here will be influence whether your freezing >> >replacement patch gets in. Do you plan to further pursue that one? >> >> Not sure. I got to the point where it seemed to work, but I got a >> bit of a cold feet proceeding with it. I used the page header's LSN >> field to define the "epoch" of the page, but I started to feel >> uneasy about it. I would be much more comfortable with an extra >> field in the page header, even though that uses more disk space. And >> requires dealing with pg_upgrade. > > FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a > pg_upgrade issue anyway. > > I am not excited about a 32x increase in clog size, especially since we > already do freezing at 200M transactions to allow for more aggressive > clog trimming. Extrapolating that out, it means we would freeze every > 6.25M transactions. It seems better to allow clog to grow larger than to force more-frequent freezing. If the larger clog size is a show-stopper (and I'm not sure I have an intelligent opinion on that just yet), one way to get around the problem would be to summarize CLOG entries after-the-fact. Once an XID precedes the xmin of every snapshot, we don't need to know the commit LSN any more. So we could read the old pg_clog files and write new summary files. Since we don't need to care about subcommitted transactions either, we could get by with just 1 bit per transaction, 1 = committed, 0 = aborted. Once we've written and fsync'd the summary files, we could throw away the original files. That might leave us with a smaller pg_clog than what we have today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
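As a rough illustration of the summarization idea (not actual pg_clog code; the status constants and helper names below are made up, though the two-bits-per-xact packing mirrors how clog stores statuses), repacking committed/aborted into one bit per transaction could look like this:

/*
 * Standalone sketch: repack 2-bit transaction statuses (4 per byte) into a
 * 1-bit committed/aborted summary (8 per byte).  Only "committed" sets a
 * bit; in this sketch anything else maps to 0, so real code would have to
 * resolve in-progress and sub-committed entries before summarizing.
 */
#include <stdio.h>
#include <stdint.h>

#define XSTAT_IN_PROGRESS   0x00
#define XSTAT_COMMITTED     0x01
#define XSTAT_ABORTED       0x02
#define XSTAT_SUB_COMMITTED 0x03

/* fetch the 2-bit status of transaction number "xact" from a clog-like buffer */
static int
two_bit_status(const uint8_t *clog, uint64_t xact)
{
    return (clog[xact / 4] >> ((xact % 4) * 2)) & 0x03;
}

/* "summary" must hold (nxacts + 7) / 8 bytes and start zeroed */
static void
summarize_clog(const uint8_t *clog, uint8_t *summary, uint64_t nxacts)
{
    for (uint64_t x = 0; x < nxacts; x++)
    {
        if (two_bit_status(clog, x) == XSTAT_COMMITTED)
            summary[x / 8] |= (uint8_t) (1 << (x % 8));
    }
}

int
main(void)
{
    uint8_t clog[2] = {0, 0};
    uint8_t summary[1] = {0};

    /* mark xacts 1 and 6 committed, xact 3 aborted, the rest untouched */
    clog[0] |= XSTAT_COMMITTED << (1 * 2);
    clog[0] |= XSTAT_ABORTED << (3 * 2);
    clog[1] |= XSTAT_COMMITTED << ((6 % 4) * 2);

    summarize_clog(clog, summary, 8);
    printf("summary byte: 0x%02x\n", summary[0]);   /* expect 0x42 */
    return 0;
}

The parts this sketch leaves out are the policy questions Robert raises: deciding when an XID range is safely behind every snapshot's xmin, and fsyncing the summary files before the original clog files are thrown away.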
Re: [HACKERS] Logical replication woes
On 05/15/2014 08:46 PM, Andres Freund wrote: On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: Ok, so the immediate cause was quick to find: when decoding a commit-prepared WAL record, we have to use the XID from the record content (patch attached). The XID in the record header is the XID of the transaction doing the COMMIT PREPARED, which is always 0 after patch bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it was always wrong. After fixing, it no longer asserts or gives the above "could not map filenode" error. However, it still doesn't seem right. When I do the above as a regular transaction, ie: begin; insert into foo values (6); alter table foo alter column id type text; commit; pg_recvlogical prints this: BEGIN 708 table public.foo: INSERT: id[text]:'6' COMMIT 708 But if I do it as a prepared transaction: begin; insert into foo values (7); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; How very wierd. The reason for this is that RecordTransactionCommitPrepared() forgets to fill a couple of fields in xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly needed because afaics they're only looked at for relcache invalidations which prepared xacts don't support, but still? Seems like an oversight in commit dd428c79, which added the fields to xl_xact_commit. They are needed. A prepared xact can indeed cause relcache invalidations, and removal of the init file. For example: begin; cluster pg_opclass using pg_opclass_oid_index ; prepare transaction 'foo'; commit prepared 'foo'; Attached patch fixes things, but I want to add some regression tests before commit. Looks good to me. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
On Thu, May 15, 2014 at 02:47:21PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Mon, May 12, 2014 at 07:16:48PM -0400, Tom Lane wrote: > >> I agree, that seems an entirely gratuitous choice of zone. It does > >> seem like a good idea to test a zone that has a nonintegral offset > >> from GMT, but we can get that from almost anywhere as long as we're > >> testing a pre-1900 date. There's no need to use any zones that aren't > >> long-established and unlikely to change. > > > If we want a nonintegral offset, why are we not using 'Asia/Calcutta', > > which is +5:30 from UTC? > > I believe there's already one of those tests that considers a zone like > that. No, I meant a really odd offset, like Paris' +0:09:21 before they > adopted standardized time. Wow, OK, got it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autovacuum scheduling starvation and frenzy
In testing 9.4 with some long running tests, I noticed that autovacuum launcher/worker sometimes goes a bit nuts. It vacuums the same database repeatedly without respect to the nap time. As far as I can tell, the behavior is the same in older versions, but I haven't tested that. This is my understanding of what is happening: If you have a database with a large table in it that has just passed autovacuum_freeze_max_age, all future workers will be funnelled into that database until the wrap-around completes. But only one of those workers can actually vacuum the one table which is holding back the frozenxid. Maybe the 2nd worker to come along will find other useful work to do, but eventually all the vacuuming that needs doing is already in progress, and so each worker starts up, gets directed to this database, finds it can't help, and exits. So all other databases are entirely starved of autovacuuming for the entire duration of the wrap-around vacuuming of this one large table. Also, the launcher decides when to launch the next worker by looking at the scheduled time of the least-recently-vacuumed database (with the implicit intention that that is the one that will get chosen to vacuum next). But since the worker gets redirected to the wrap-around database instead of the least-recently-vacuumed database, the least-recently-vacuumed database never gets it schedule updated and always looks like it is chronologically overdue. That means the launcher keeps launching new workers as fast as the previous ones exit, ignoring the nap time. So there is one long running worker actually making progress, plus a frenzy of workers all attacking the same database, finding that there is nothing they can do. I think that a database more than autovacuum_freeze_max_age should get first priority, but only if its next scheduled vacuum time is in the past. If it can beneficially use more than one vacuum worker, they would usually accumulate there naturally within a few naptimes iterations[1]. And if it can't usefully use more than one worker, don't prevent other databases from using them. [1] you could argue that all other max_workers processes could become pinned down in long running vacuums of other nonrisk databases between the time that the database crosses autovacuum_freeze_max_age (and has its first worker started), and the time its nap time expires and so it becomes eligible for a second one. But that seems like a weak argument, as it could just have easily happened that all of them got pinned down in nonrisk databases a few transactions *before* the database crosses autovacuum_freeze_max_age in the first place. Does this analysis and proposal seem sound? Cheers, Jeff
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)
On Thu, May 15, 2014 at 10:38 AM, Andres Freund wrote: >> . async.c and namespace.c does the same, and it hasn't been a >> problem. > > Well, it doesn't seem unreasonable to have C code using > PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP around a 2pc commit > to me. That'll break with this. > Perhaps we should just finally make cancel_before_shmem_exit search the > stack of callbacks. Yes, please. And while we're at it, perhaps we should make it Trap() or fail an Assert() if it doesn't find the callback it was told to remove. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
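A minimal sketch of that suggestion, using a toy callback stack rather than the real ipc.c arrays (the names and types here are invented): the cancel routine searches downward from the top of the stack and fails an assertion if the entry it was asked to remove was never registered.

/* standalone illustration; assert() stands in for an Assert()/Trap() */
#include <stdio.h>
#include <assert.h>

#define MAX_ON_EXITS 20

typedef void (*exit_cb) (int code, void *arg);

typedef struct
{
    exit_cb     function;
    void       *arg;
} OnExitEntry;

static OnExitEntry before_shmem_exit_list[MAX_ON_EXITS];
static int  before_shmem_exit_index = 0;

static void
push_before_shmem_exit(exit_cb function, void *arg)
{
    assert(before_shmem_exit_index < MAX_ON_EXITS);
    before_shmem_exit_list[before_shmem_exit_index].function = function;
    before_shmem_exit_list[before_shmem_exit_index].arg = arg;
    before_shmem_exit_index++;
}

static void
cancel_before_shmem_exit_searching(exit_cb function, void *arg)
{
    for (int i = before_shmem_exit_index - 1; i >= 0; i--)
    {
        if (before_shmem_exit_list[i].function == function &&
            before_shmem_exit_list[i].arg == arg)
        {
            /* close the gap, keeping the order of the remaining entries */
            for (int j = i; j < before_shmem_exit_index - 1; j++)
                before_shmem_exit_list[j] = before_shmem_exit_list[j + 1];
            before_shmem_exit_index--;
            return;
        }
    }
    assert(!"callback was not registered");
}

static void
dummy_cb(int code, void *arg)
{
    (void) code;
    printf("callback %s fired\n", (const char *) arg);
}

int
main(void)
{
    static char outer_tag[] = "outer";
    static char inner_tag[] = "inner";

    push_before_shmem_exit(dummy_cb, outer_tag);
    push_before_shmem_exit(dummy_cb, inner_tag);
    /* callers could now cancel "outer" even though "inner" sits on top */
    cancel_before_shmem_exit_searching(dummy_cb, outer_tag);
    printf("%d entry left\n", before_shmem_exit_index);
    return 0;
}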
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 8:06 AM, Bruce Momjian wrote: > On Tue, May 6, 2014 at 11:15:17PM +0100, Simon Riggs wrote: >> > Well, for what it's worth, I've encountered systems where setting >> > effective_cache_size too low resulted in bad query plans, but I've >> > never encountered the reverse situation. >> >> I agree with that. >> >> Though that misses my point, which is that you can't know that all of >> that memory is truly available on a server with many concurrent users. >> Choosing settings that undercost memory intensive plans are not the >> best choice for a default strategy in a mixed workload when cache may >> be better used elsewhere, even if such settings make sense for some >> individual users. > > This is the same problem we had with auto-tuning work_mem, in that we > didn't know what other concurrent activity was happening. Seems we need > concurrent activity detection before auto-tuning work_mem and > effective_cache_size. I think it's worse than that: we don't even know what else is happening *in the same query*. For example, look at this: http://www.postgresql.org/message-id/16161.1324414...@sss.pgh.pa.us That's pretty awful, and it's just one example of a broader class of problems that we haven't even tried to solve. We really need a way to limit memory usage on a per-query basis rather than a per-node basis. For example, consider a query plan that needs to do four sorts. If work_mem = 64MB, we'll happily use 256MB total, 64MB for each sort. Now, that might cause the system to swap: since there are four sorts, maybe we ought to have used only 16MB per sort, and switched to a heap sort if that wasn't enough. But it's even subtler than that: if we had known when building the query plan that we only had 16MB per sort rather than 64MB per sort, we would potentially have estimated higher costs for those sorts in the first place, which might have led to a different plan that needed fewer sorts to begin with. When you start to try to balance memory usage across multiple backends, things get even more complicated. If the first query that starts up is allowed to use all the available memory, and we respond to that by lowering the effective value of work_mem to something very small, a second query that shows up a bit later might choose a very inefficient plan as a result. That in turn might cause heavy I/O load on the system for a long time, making the first query run very slowly. We might have been better off just letting the first query finish, and the running the second one (with a much better plan) after it was done. Or, maybe we should have only let the first query take a certain fraction (half? 10%?) of the available memory, so that there was more left for the second guy. But that could be wrong too - it might cause the first plan to be unnecessarily inefficient when nobody was planning to run any other queries anyway. Plus, DBAs hate it when plans change on them unexpectedly, so anything that involves a feedback loop between current utilization and query plans will be unpopular with some people for that reason. These are hard problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote: > A recent question from Tim Kane prompted me to measure the overhead > costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things > are far worse than I thought. On my current server (by no means > lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run > at something like 110 nsec per row: I assume you ran pg_test_timing too:
Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.70 nsec
Histogram of timing durations:
< usec   % of total      count
     1     95.83035   68935459
     2      4.16923    2999133
     4      0.00037        268
     8      0.00004         31
    16      0.00000          1
    32      0.00000          1
My overhead of 41.70 nsec matches yours. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
On Tue, May 13, 2014 at 09:55:26AM -0400, Alvaro Herrera wrote: > Christoph Berg wrote: > > > Of course, Wikipedia has something to say about this: > > http://en.wikipedia.org/wiki/Timekeeping_on_Mars > > Nice. > > > I especially like MTC, Mars Time Coordinated. But whatever scheme gets > > chosen, it won't be a standard 24h day, so PostgreSQL has a whole lot > > of different problems to solve than to "fix" that little > > Mars/Mons_Olympus gem now... :) > > Maybe a new type, mars_timestamptz()? Or perhaps the celestial body > name should be part of the typmod for standard timestamptz ...? The latter seems a good bit more extensible. Conversions among the different timestamptzs might be problematic, as we are currently assuming certain approximations about spacetime that don't actually hold when we have time zones in significantly different reference frames. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
Bruce Momjian writes: > On Mon, May 12, 2014 at 07:16:48PM -0400, Tom Lane wrote: >> I agree, that seems an entirely gratuitous choice of zone. It does >> seem like a good idea to test a zone that has a nonintegral offset >> from GMT, but we can get that from almost anywhere as long as we're >> testing a pre-1900 date. There's no need to use any zones that aren't >> long-established and unlikely to change. > If we want a nonintegral offset, why are we not using 'Asia/Calcutta', > which is +5:30 from UTC? I believe there's already one of those tests that considers a zone like that. No, I meant a really odd offset, like Paris' +0:09:21 before they adopted standardized time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On Wed, May 14, 2014 at 8:46 PM, Jeff Janes wrote: >> +1. I can't think of many things we might do that would be more >> important. > > Can anyone guess how likely this approach is to make it into 9.5? I've been > pondering some incremental improvements over what we have now, but if this > revolutionary approach has a high chance of landing then any work on > incremental improvements would be pointless. Well, Heikki was saying on another thread that he had kind of gotten cold feet about this, so I gather he's not planning to pursue it. Not sure if I understood that correctly. If so, I guess it depends on whether someone else can pick it up, but we might first want to establish why he got cold feet and how worrying those problems seem to other people. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
On Mon, May 12, 2014 at 07:16:48PM -0400, Tom Lane wrote: > Christoph Berg writes: > > 84df54b22e8035addc7108abd9ff6995e8c49264 introduced timestamp > > constructors. In the regression tests, various time zones are tested, > > including America/Metlakatla. Now, if you configure using > > --with-system-tzdata, you'll get an error if that zone isn't there. > > Unfortunately, this is what I'm getting now when trying to build beta1 > > on Ubuntu 10.04 (lucid) with tzdata 2010i-1: > > I agree, that seems an entirely gratuitous choice of zone. It does > seem like a good idea to test a zone that has a nonintegral offset > from GMT, but we can get that from almost anywhere as long as we're > testing a pre-1900 date. There's no need to use any zones that aren't > long-established and unlikely to change. If we want a nonintegral offset, why are we not using 'Asia/Calcutta', which is +5:30 from UTC? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote: > >Some of the stuff in here will be influence whether your freezing > >replacement patch gets in. Do you plan to further pursue that one? > > Not sure. I got to the point where it seemed to work, but I got a > bit of a cold feet proceeding with it. I used the page header's LSN > field to define the "epoch" of the page, but I started to feel > uneasy about it. I would be much more comfortable with an extra > field in the page header, even though that uses more disk space. And > requires dealing with pg_upgrade. FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a pg_upgrade issue anyway. I am not excited about a 32x increase in clog size, especially since we already do freezing at 200M transactions to allow for more aggressive clog trimming. Extrapolating that out, it means we would freeze every 6.25M transactions. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump warnings in MinGW build
Now that popen and pclose don't throw thousands of warnings when compiling mingw builds, some other warnings stand out. parallel.c: In function 'pgpipe': parallel.c:1332:2: warning: overflow in implicit constant conversion [-Woverflow] parallel.c:1386:3: warning: overflow in implicit constant conversion [-Woverflow] I think the solution is to use the pgsocket typedef from src/include/port.h, rather than int. Like attached. But I'm far from being a typedef lawyer, so maybe I am all wet. Cheers, Jeff pgpipe.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
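For illustration, a minimal sketch of the typedef swap being suggested. The fallback definitions mirror what src/include/port.h provides; the variable and helper around them are invented, not the actual parallel.c/pgpipe() code.

#include <stdio.h>

#ifdef WIN32
#include <winsock2.h>
typedef SOCKET pgsocket;                /* as in src/include/port.h */
#define PGINVALID_SOCKET INVALID_SOCKET
#else
typedef int pgsocket;
#define PGINVALID_SOCKET (-1)
#endif

/*
 * Before: the handles were plain ints.  On 64-bit Windows a SOCKET (and
 * INVALID_SOCKET) doesn't fit in an int, which is presumably where the
 * "overflow in implicit constant conversion" warnings come from.
 *
 *     int handles[2] = {-1, -1};
 *
 * After: use the portable typedef and sentinel throughout.
 */
static pgsocket handles[2] = {PGINVALID_SOCKET, PGINVALID_SOCKET};

int
main(void)
{
    printf("pipe open: %d\n",
           handles[0] != PGINVALID_SOCKET && handles[1] != PGINVALID_SOCKET);
    return 0;
}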
Re: [HACKERS] wrapping in extended mode doesn't work well with default pager
Hello 2014-05-15 15:04 GMT+02:00 Sergey Muraviov : > Hi. > Please review the new patch. > This version works perfect Regards Pavel > > PS > Issues which were described by Tom and Pavel were relevant to single-line > headers. > So I've added appropriate regression tests to the patch. > > I've also attached complex regression tests for unicode linestyle and > multibyte symbols. > > > 2014-05-14 10:55 GMT+04:00 Pavel Stehule : > > sorry >> >> there is still small issue >> >> I have a plpgsql function: >> >> CREATE OR REPLACE FUNCTION public.foo_update_trg() >> RETURNS trigger >> LANGUAGE plpgsql >> AS $function$ >> DECLARE t text; >> BEGIN >> EXECUTE format('SELECT $1.%I', TG_ARGV[0]) INTO t USING old; >> RAISE NOTICE 'original value of "%" is "%"', TG_ARGV[0], t; >> RETURN NULL; >> END; >> $function$ >> >> Default expanded view of select * from pg_proc where proname = >> 'foo_update_trg'; is little bit broken (screenshoot 1) >> >> After wrap mode, it add useless new line into source code (screenshoot 2) >> >> but border2 fixes it (screenshots 3) >> >> Regards >> >> Pavel >> >> >> >> 2014-05-14 8:32 GMT+02:00 Pavel Stehule : >> >> Hello >>> >>> With this patch it works perfect >>> >>> Thank you >>> >>> Regards >>> >>> Pavel >>> >>> >>> 2014-05-13 21:33 GMT+02:00 Sergey Muraviov >>> : >>> >>> Please check this patch. 2014-05-12 22:56 GMT+04:00 Sergey Muraviov >>> >: Hi. > > I'll try to fix it tomorrow. > > > 2014-05-12 18:42 GMT+04:00 Tom Lane : > > Greg Stark writes: >> > On Mon, May 12, 2014 at 2:12 PM, Greg Stark wrote: >> >> Hm, there was an off by one error earlier in some cases, maybe we >> >> fixed it by breaking other case. Will investigate. >> >> > Those spaces are coming from the ascii wrapping indicators. i.e. >> the periods in: >> >> Ah. I wonder whether anyone will complain that the format changed? >> >> > Apparently we used to print those with border=1 in normal mode but >> in >> > expanded mode we left out the space for those on the outermost edges >> > since there was no need for them. If we put them in for wrapped mode >> > then we'll be inconsistent if we don't for nonwrapped mode though. >> And >> > if we don't put them in for wrapped mode then there's no way to >> > indicate wrapping versus newlines. >> >> Barring anyone complaining that the format changed, I'd say the issue >> is not that you added them but that the accounting for line length >> fails to include them. >> >> regards, tom lane >> > > > > -- > Best regards, > Sergey Muraviov > -- Best regards, Sergey MuraviovH >>> >>> >> > > > -- > Best regards, > Sergey Muraviov >
Re: [HACKERS] Logical replication woes
On 2014-05-15 19:46:57 +0200, Andres Freund wrote: > Attached patch fixes things, but I want to add some regression tests > before commit. And now actually attached. Will send a patch with regression tests later tonight or tomorrow. Need to eat first... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 2cefa08..5d8e195 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2082,9 +2082,10 @@ RecordTransactionCommitPrepared(TransactionId xid, /* Emit the XLOG commit record */ xlrec.xid = xid; + xlrec.crec.dbId = MyDatabaseId; + xlrec.crec.tsId = MyDatabaseTableSpace; xlrec.crec.xact_time = GetCurrentTimestamp(); xlrec.crec.xinfo = initfileinval ? XACT_COMPLETION_UPDATE_RELCACHE_FILE : 0; - xlrec.crec.nmsgs = 0; xlrec.crec.nrels = nrels; xlrec.crec.nsubxacts = nchildren; xlrec.crec.nmsgs = ninvalmsgs; diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 06b99e7..603d083 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -225,7 +225,7 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]); invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]); -DecodeCommit(ctx, buf, r->xl_xid, xlrec->dbId, +DecodeCommit(ctx, buf, prec->xid, xlrec->dbId, xlrec->xact_time, xlrec->nsubxacts, subxacts, xlrec->nmsgs, invals); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: > Ok, so the immediate cause was quick to find: when decoding a > commit-prepared WAL record, we have to use the XID from the record content > (patch attached). The XID in the record header is the XID of the transaction > doing the COMMIT PREPARED, which is always 0 after patch > bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it > was always wrong. After fixing, it no longer asserts or gives the above > "could not map filenode" error. > > However, it still doesn't seem right. When I do the above as a regular > transaction, ie: > > begin; insert into foo values (6); alter table foo alter column id type > text; commit; > > pg_recvlogical prints this: > > BEGIN 708 > table public.foo: INSERT: id[text]:'6' > COMMIT 708 > > But if I do it as a prepared transaction: > > begin; insert into foo values (7); alter table foo alter column id type > text; prepare transaction 'foo'; commit prepared 'foo'; How very wierd. The reason for this is that RecordTransactionCommitPrepared() forgets to fill a couple of fields in xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly needed because afaics they're only looked at for relcache invalidations which prepared xacts don't support, but still? That also explains why decoding for prepared xacts (besides the "typo" you found) didn't work anymore - the filtering at commit was added pretty late... Attached patch fixes things, but I want to add some regression tests before commit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 12:43 PM, Tomas Vondra wrote: Hi all, today I got a few of errors like these (this one is from last week, though): Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT Content: snapshot to old: Wed May 7 04:36:57 2014 GMT on the new buildfarm animals. I believe it was my mistake (incorrectly configured local git mirror), but it got me thinking about how this will behave with the animals running CLOBBER_CACHE_RECURSIVELY. If I understand the Perl code correctly, it does this: (1) update the repository (2) run the tests (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) (4) fail if older Now, imagine that the test runs for days/weeks. This pretty much means it's wasted, because the results will be thrown away anyway, no? The 24 hours runs from the time of the latest commit on the branch in question, not the current time, but basically yes. We've never had machines with runs that long. The longest in recent times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes around 4.5 hours. But we have had misconfigured machines reporting unbelievable snapshot times. I'll take a look and see if we can tighten up the sanity check. It's worth noting that one thing friarbird does is skip the install-check stage - it's almost certainly not going to have terribly much interesting to tell us from that, given it has already run a plain "make check". How long does a CLOBBER_CACHE_RECURSIVELY run take? days or weeks seems kinda nuts. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
Hi, On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: > On 05/15/2014 07:57 PM, Heikki Linnakangas wrote: > >Spotted while testing pg_recvlogical: > > > >1. Set up pg_recvlogical to receive: > > > >./pg_recvlogical -S fooslot -d postgres --create > >./pg_recvlogical -S fooslot -d postgres --start -f - > > > >2. In another terminal, with psql: > > > >create table foo (id int4); > >begin; > > insert into foo values (4); > > alter table foo alter column id type text; > >prepare transaction 'foo'; > >commit prepared 'foo'; > >insert into foo values (1); > > > >3. With current HEAD, after commit > >bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion > >failure: > > > >TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: > >"reorderbuffer.c", Line: 508) > > > >I believe that's we no longer assign another XID to the transaction that > >does the COMMIT PREPARED. Previously, an extra XID, in addition to the > >XID of the prepared transaction, was assigned for use in locking the > >global transaction entry in shared memory, but that's no longer required. > > > >However, even with that patch reverted, it doesn't work correctly: > > > >ERROR: could not map filenode "base/12142/16390" to relation OID > >LOG: starting logical decoding for slot fooslot > >DETAIL: streaming transactions committing after 0/16D1670, reading WAL > >from 0/16BC470 > > Ok, so the immediate cause was quick to find: when decoding a > commit-prepared WAL record, we have to use the XID from the record content > (patch attached). The XID in the record header is the XID of the transaction > doing the COMMIT PREPARED, which is always 0 after patch > bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it > was always wrong. After fixing, it no longer asserts or gives the above > "could not map filenode" error. > > However, it still doesn't seem right. When I do the above as a regular > transaction, ie: > > begin; insert into foo values (6); alter table foo alter column id type > text; commit; > > pg_recvlogical prints this: > > BEGIN 708 > table public.foo: INSERT: id[text]:'6' > COMMIT 708 > > But if I do it as a prepared transaction: > > begin; insert into foo values (7); alter table foo alter column id type > text; prepare transaction 'foo'; commit prepared 'foo'; Looking into it. I at some point dropped the prepared xact tests and that was obviously a mistake. Will re-add them and fix. Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical, stdout and SIGHUP
On 05/15/2014 07:59 PM, Andres Freund wrote: On 2014-05-13 17:43:47 +0300, Heikki Linnakangas wrote: On 05/13/2014 04:35 PM, Andres Freund wrote: On 2014-05-13 16:31:25 +0300, Heikki Linnakangas wrote: Another thing I noticed is that if when the output goes to a file, the file isn't re-opened immediately on SIGHUP. Only after receiving some data from the server. I believe that's also not intentional. Hm. I can't really get excited about that one. Not doing that seems to complicate matters unneccessarily. What's the problem here? Not sure if it matters in any real-world scenario, but I found it pretty surprising while playing with it. It should be trivial to fix; ISTM the problem is that there is a "continue" in the loop when select() is interrupted by signal, but the re-opening is done after the select() in the loop. I think all you need to do is move the check for output_reopen to the beginning of the loop. Thanks for fixing and sorry for being slow :(. Any reason you didn't also move the opening of the output file up? It seems a bit odd to not have an output file existing every now and then... No particular reason. But that would actually be a great idea, because currently you won't get any error you give pg_recvlogical an invalid path, until it receives the first piece of data from the server and tries to write it to the file. I'll go and do that. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 05/15/2014 07:57 PM, Heikki Linnakangas wrote: Spotted while testing pg_recvlogical: 1. Set up pg_recvlogical to receive: ./pg_recvlogical -S fooslot -d postgres --create ./pg_recvlogical -S fooslot -d postgres --start -f - 2. In another terminal, with psql: create table foo (id int4); begin; insert into foo values (4); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; insert into foo values (1); 3. With current HEAD, after commit bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion failure: TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "reorderbuffer.c", Line: 508) I believe that's we no longer assign another XID to the transaction that does the COMMIT PREPARED. Previously, an extra XID, in addition to the XID of the prepared transaction, was assigned for use in locking the global transaction entry in shared memory, but that's no longer required. However, even with that patch reverted, it doesn't work correctly: ERROR: could not map filenode "base/12142/16390" to relation OID LOG: starting logical decoding for slot fooslot DETAIL: streaming transactions committing after 0/16D1670, reading WAL from 0/16BC470 Ok, so the immediate cause was quick to find: when decoding a commit-prepared WAL record, we have to use the XID from the record content (patch attached). The XID in the record header is the XID of the transaction doing the COMMIT PREPARED, which is always 0 after patch bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it was always wrong. After fixing, it no longer asserts or gives the above "could not map filenode" error. However, it still doesn't seem right. When I do the above as a regular transaction, ie: begin; insert into foo values (6); alter table foo alter column id type text; commit; pg_recvlogical prints this: BEGIN 708 table public.foo: INSERT: id[text]:'6' COMMIT 708 But if I do it as a prepared transaction: begin; insert into foo values (7); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; pg_recvlogical prints nothing. - Heikki diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index d6499d5..cc73652 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -225,7 +225,7 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]); invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]); -DecodeCommit(ctx, buf, prec->xl_xid, xlrec->dbId, +DecodeCommit(ctx, buf, prec->xid, xlrec->dbId, xlrec->xact_time, xlrec->nsubxacts, subxacts, xlrec->nmsgs, invals); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical, stdout and SIGHUP
On 2014-05-13 17:43:47 +0300, Heikki Linnakangas wrote: > On 05/13/2014 04:35 PM, Andres Freund wrote: > >On 2014-05-13 16:31:25 +0300, Heikki Linnakangas wrote: > >>Another thing I noticed is that if when the output goes to a file, the file > >>isn't re-opened immediately on SIGHUP. Only after receiving some data from > >>the server. I believe that's also not intentional. > > > >Hm. I can't really get excited about that one. Not doing that seems to > >complicate matters unneccessarily. What's the problem here? > > Not sure if it matters in any real-world scenario, but I found it pretty > surprising while playing with it. It should be trivial to fix; ISTM the > problem is that there is a "continue" in the loop when select() is > interrupted by signal, but the re-opening is done after the select() in the > loop. I think all you need to do is move the check for output_reopen to the > beginning of the loop. Thanks for fixing and sorry for being slow :(. Any reason you didn't also move the opening of the output file up? It seems a bit odd to not have an output file existing every now and then... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Logical replication woes
Spotted while testing pg_recvlogical: 1. Set up pg_recvlogical to receive: ./pg_recvlogical -S fooslot -d postgres --create ./pg_recvlogical -S fooslot -d postgres --start -f - 2. In another terminal, with psql: create table foo (id int4); begin; insert into foo values (4); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; insert into foo values (1); 3. With current HEAD, after commit bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion failure: TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "reorderbuffer.c", Line: 508) I believe that's we no longer assign another XID to the transaction that does the COMMIT PREPARED. Previously, an extra XID, in addition to the XID of the prepared transaction, was assigned for use in locking the global transaction entry in shared memory, but that's no longer required. However, even with that patch reverted, it doesn't work correctly: ERROR: could not map filenode "base/12142/16390" to relation OID LOG: starting logical decoding for slot fooslot DETAIL: streaming transactions committing after 0/16D1670, reading WAL from 0/16BC470 - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] buildfarm animals and 'snapshot too old'
Hi all, today I got a few errors like these (this one is from last week, though): Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT Content: snapshot to old: Wed May 7 04:36:57 2014 GMT on the new buildfarm animals. I believe it was my mistake (incorrectly configured local git mirror), but it got me thinking about how this will behave with the animals running CLOBBER_CACHE_RECURSIVELY. If I understand the Perl code correctly, it does this: (1) update the repository (2) run the tests (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) (4) fail if older Now, imagine that the test runs for days/weeks. This pretty much means it's wasted, because the results will be thrown away anyway, no? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportability of setvbuf()
Alvaro Herrera writes: > Tom Lane wrote: >> It might also be reasonable to create a wrapper macro along the line of >> "PG_STD_IO_BUFFERING()" that would encapsulate the whole sequence >> setvbuf(stdout, NULL, _IOLBF, 0); >> setvbuf(stderr, NULL, _IONBF, 0); >> Or maybe we should have separate macros for those two calls. Or maybe >> this is just a useless layer of abstraction and PG_IOLBF is enough >> to make the calls portable. >> >> Thoughts? > I don't really know all that much about this stuff, but see commits > 6eda3e9c27781dec369542a9b20cba7c3d832a5e and its parent about > isolationtester. Yeah, making them both unbuffered is another scenario that has its use-cases, so maybe it's inappropriate to create a macro that presumes to define the One True Way. For the moment I'll just arrange for initdb to share the logic with syslogger. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 11:36:51PM +0900, Amit Langote wrote: > > No, all memory allocation is per-process, except for shared memory. We > > probably need a way to record our large local memory allocations in > > PGPROC that other backends can see; same for effective cache size > > assumptions we make. > > > > I see. I thought there would be some centralised way to traverse, say, > a linked list of contexts that individual backends create or something > like that. But, I suppose it would not be straightforward to make any > of that work for what we are after here. The problem is locking overhead between sessions. Right now we avoid all of that, and I think if we just put the value in PGPROC, it will be good enough with limited locking required. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
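To make the "put the value in PGPROC" idea concrete, here is a hypothetical sketch. The struct, field, and function names below (HypotheticalProcEntry, localMemReserved, report_local_reservation, and so on) are invented for illustration and are not part of the real PGPROC struct or of any proposed patch; the point is simply that a per-backend word can be updated by its owner without locking and read by other backends, accepting slightly stale values.

#include <stddef.h>

/* Hypothetical extra bookkeeping hung off each backend's PGPROC entry. */
typedef struct HypotheticalProcEntry
{
	/* ... the existing PGPROC fields would live here ... */
	size_t		localMemReserved;		/* bytes of large work_mem-style allocations */
	size_t		effectiveCacheAssumed;	/* cache this backend assumes it can use */
} HypotheticalProcEntry;

/* Writer side: only the owning backend updates its own entry. */
static void
report_local_reservation(HypotheticalProcEntry *myproc, size_t bytes)
{
	myproc->localMemReserved = bytes;	/* single word store, no lock taken */
}

/* Reader side: sum what every backend has advertised, tolerating staleness. */
static size_t
total_reported_reservations(const HypotheticalProcEntry *procs, int nprocs)
{
	size_t		total = 0;

	for (int i = 0; i < nprocs; i++)
		total += procs[i].localMemReserved;
	return total;
}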
Re: [HACKERS] Unportability of setvbuf()
Tom Lane wrote: > It might also be reasonable to create a wrapper macro along the line of > "PG_STD_IO_BUFFERING()" that would encapsulate the whole sequence > setvbuf(stdout, NULL, _IOLBF, 0); > setvbuf(stderr, NULL, _IONBF, 0); > Or maybe we should have separate macros for those two calls. Or maybe > this is just a useless layer of abstraction and PG_IOLBF is enough > to make the calls portable. > > Thoughts? I don't really know all that much about this stuff, but see commits 6eda3e9c27781dec369542a9b20cba7c3d832a5e and its parent about isolationtester. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Problem with txid_snapshot_in/out() functionality
On 04/14/2014 11:55 AM, Marko Kreen wrote: On Sun, Apr 13, 2014 at 05:46:20PM -0400, Jan Wieck wrote: On 04/13/14 14:22, Jan Wieck wrote: On 04/13/14 08:27, Marko Kreen wrote: I think you need to do SET_VARSIZE also here. Alternative is to move SET_VARSIZE after sort_snapshot(). And it seems the drop-double-txid logic should be added also to txid_snapshot_recv(). It seems weird to have it behave differently from txid_snapshot_in(). Thanks, yes on both issues. Will create another patch. New patch attached. New github commit is https://github.com/wieck/postgres/commit/b8fd0d2eb78791e5171e34aecd233fd06218890d Looks OK to me. Ok, committed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
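For readers who don't have the txid code in front of them, the SET_VARSIZE pattern under discussion looks roughly like the generic sketch below. This is illustrative extension-style code, not the patch itself; the helper name make_text_result is an invented example.

#include "postgres.h"

/*
 * Generic varlena-building pattern: allocate, fill, then stamp the final
 * size with SET_VARSIZE once the content length is known -- for example
 * after a sort step may have dropped duplicate entries.
 */
static text *
make_text_result(const char *src, int len)
{
	text	   *result = (text *) palloc(VARHDRSZ + len);

	memcpy(VARDATA(result), src, len);
	SET_VARSIZE(result, VARHDRSZ + len);

	return result;
}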
[HACKERS] Unportability of setvbuf()
From the department of punishing good deeds ... in commit 2dc4f011fd I added setvbuf() calls to initdb to ensure that output to stdout and stderr would appear in a consistent order regardless of whether the output was going to a terminal or a file. The buildfarm shows that on several (but not all) Windows machines, initdb is now failing without printing anything. After a bit of research, I believe what is happening is: (1) On Windows, setvbuf interprets _IOLBF as _IOFBF. (2) If _IOFBF is specified and size is zero, it reports EINVAL. (3) If "parameter validation" is enabled, EINVAL turns into abort(). Thanks Microsoft for your careful attention to compliance with POSIX. I see that syslogger.c encountered this problem long ago and solved it by the expedient of using _IONBF not _IOLBF for the log output file. That's probably what we must do for stdout in initdb as well. It seems likely that we may need the same in other client programs someday. What I'm not totally sure about is how to wrap this up nicely. I'm inclined first of all to move syslogger.c's LBF_MODE symbol to port.h and rename it to, say, PG_IOLBF. It might also be reasonable to create a wrapper macro along the line of "PG_STD_IO_BUFFERING()" that would encapsulate the whole sequence setvbuf(stdout, NULL, _IOLBF, 0); setvbuf(stderr, NULL, _IONBF, 0); Or maybe we should have separate macros for those two calls. Or maybe this is just a useless layer of abstraction and PG_IOLBF is enough to make the calls portable. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
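As a rough illustration of the wrapper being floated here (PG_IOLBF and PG_STD_IO_BUFFERING are the names suggested above; the exact definitions and their placement in port.h are assumptions, not a committed patch), something along these lines would sidestep the Windows _IOLBF-to-_IOFBF problem:

#include <stdio.h>

/*
 * On Windows, _IOLBF is treated as _IOFBF, and a zero buffer size then fails
 * "parameter validation"; fall back to unbuffered output there, as
 * syslogger.c already does for its log file.
 */
#ifdef WIN32
#define PG_IOLBF	_IONBF
#else
#define PG_IOLBF	_IOLBF
#endif

/* Line-buffer stdout (where the platform allows it) and unbuffer stderr. */
#define PG_STD_IO_BUFFERING() \
	do { \
		setvbuf(stdout, NULL, PG_IOLBF, 0); \
		setvbuf(stderr, NULL, _IONBF, 0); \
	} while (0)

Usage would then be a single PG_STD_IO_BUFFERING(); call near the top of main() in initdb and any other client program that needs deterministic output ordering.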
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)
On 2014-05-15 17:21:28 +0300, Heikki Linnakangas wrote: > >Is it guaranteed that all paths have called LWLockReleaseAll() > >before calling the proc exit hooks? Otherwise we might end up waiting > >for ourselves... > > Hmm. AbortTransaction() will release locks before we get here, but the > before_shmem_exit() callpath will not. So an elog(FATAL), while holding > TwoPhaseStateLock would cause us to deadlock with ourself. But there are no > such elogs. > I copied this design from async.c, which is quite similar, so if there's a > problem that ought to be fixed too. And there are other more complicated > before_shmem callbacks that worry me more, like createdb_failure_callback(). > But I think they're all all right. Perhaps we should enforce that LWLockReleaseAll() is called first? E.g. in shmem_exit()? It'll happen in ProcKill() atm, but that's normally pretty much at the bottom of the stack. > >It's not particularly nice to register shmem exit hooks in the middle of > >normal processing because it makes it impossible to use > >cancel_before_shmem_exit() previously registered hooks. I think this > >should be registered at startup, if max_prepared_xacts > 0. > > . async.c and namespace.c does the same, and it hasn't been a > problem. Well, it doesn't seem unreasonable to have C code using PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP around a 2pc commit to me. That'll break with this. Perhaps we should just finally make cancel_before_shmem_exit search the stack of callbacks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
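To see why registering a hook mid-transaction interacts badly with PG_ENSURE_ERROR_CLEANUP, here is a self-contained toy model of the before_shmem_exit stack; it is not the real ipc.c code, and the names and sizes are invented. The cancel operation only matches the most recently registered entry, so a hook pushed lazily in the middle of a protected section leaves the caller's cleanup entry stranded underneath it.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of a LIFO exit-callback stack; not the real implementation. */
typedef void (*exit_cb) (int code, long arg);

#define MAX_CB 32
static struct
{
	exit_cb		fn;
	long		arg;
}			cb_stack[MAX_CB];
static int	cb_count = 0;

static void
register_cb(exit_cb fn, long arg)
{
	cb_stack[cb_count].fn = fn;
	cb_stack[cb_count].arg = arg;
	cb_count++;
}

/* Mirrors the limitation discussed above: only the topmost entry can be cancelled. */
static bool
cancel_cb(exit_cb fn, long arg)
{
	if (cb_count > 0 &&
		cb_stack[cb_count - 1].fn == fn &&
		cb_stack[cb_count - 1].arg == arg)
	{
		cb_count--;
		return true;
	}
	return false;
}

static void
caller_cleanup(int code, long arg)
{
	printf("caller cleanup\n");
}

static void
twophase_exit(int code, long arg)
{
	printf("twophase exit hook\n");
}

int
main(void)
{
	register_cb(caller_cleanup, 0);	/* what PG_ENSURE_ERROR_CLEANUP does */
	register_cb(twophase_exit, 0);	/* hook registered lazily mid-section */

	/* The cancel at PG_END_ENSURE_ERROR_CLEANUP no longer finds its entry. */
	assert(!cancel_cb(caller_cleanup, 0));
	return 0;
}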
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 11:24 PM, Bruce Momjian wrote: > On Thu, May 15, 2014 at 10:23:19PM +0900, Amit Langote wrote: >> On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian wrote: >> > >> > This is the same problem we had with auto-tuning work_mem, in that we >> > didn't know what other concurrent activity was happening. Seems we need >> > concurrent activity detection before auto-tuning work_mem and >> > effective_cache_size. >> > >> >> Perhaps I am missing something obvious here, but would mmgr have any >> useful numbers on this? Like any book-keeping info maintained by >> mcxt.c/aset.c? Would extending that interface help? > > No, all memory allocation is per-process, except for shared memory. We > probably need a way to record our large local memory allocations in > PGPROC that other backends can see; same for effective cache size > assumptions we make. > I see. I thought there would be some centralised way to traverse, say, a linked list of contexts that individual backends create or something like that. But, I suppose it would not be straightforward to make any of that work for what we are after here. -- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 10:23:19PM +0900, Amit Langote wrote: > On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian wrote: > > > > This is the same problem we had with auto-tuning work_mem, in that we > > didn't know what other concurrent activity was happening. Seems we need > > concurrent activity detection before auto-tuning work_mem and > > effective_cache_size. > > > > Perhaps I am missing something obvious here, but would mmgr have any > useful numbers on this? Like any book-keeping info maintained by > mcxt.c/aset.c? Would extending that interface help? No, all memory allocation is per-process, except for shared memory. We probably need a way to record our large local memory allocations in PGPROC that other backends can see; same for effective cache size assumptions we make. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)
On 05/06/2014 02:44 PM, Andres Freund wrote:
On 2014-05-05 13:41:00 +0300, Heikki Linnakangas wrote:

+/*
+ * Exit hook to unlock the global transaction entry we're working on.
+ */
+static void
+AtProcExit_Twophase(int code, Datum arg)
+{
+	/* same logic as abort */
+	AtAbort_Twophase();
+}
+
+/*
+ * Abort hook to unlock the global transaction entry we're working on.
+ */
+void
+AtAbort_Twophase(void)
+{
+	if (MyLockedGxact == NULL)
+		return;
+
+	/*
+	 * If we were in process of preparing the transaction, but haven't
+	 * written the WAL record yet, remove the global transaction entry.
+	 * Same if we are in the process of finishing an already-prepared
+	 * transaction, and fail after having already written the WAL 2nd
+	 * phase commit or rollback record.
+	 *
+	 * After that it's too late to abort, so just unlock the GlobalTransaction
+	 * entry.  We might not have transfered all locks and other state to the
+	 * prepared transaction yet, so this is a bit bogus, but it's the best we
+	 * can do.
+	 */
+	if (!MyLockedGxact->valid)
+	{
+		RemoveGXact(MyLockedGxact);
+	}
+	else
+	{
+		LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+		MyLockedGxact->locking_backend = InvalidBackendId;
+
+		LWLockRelease(TwoPhaseStateLock);
+	}
+	MyLockedGxact = NULL;
+}

Is it guaranteed that all paths have called LWLockReleaseAll() before calling the proc exit hooks? Otherwise we might end up waiting for ourselves...

Hmm. AbortTransaction() will release locks before we get here, but the before_shmem_exit() callpath will not. So an elog(FATAL), while holding TwoPhaseStateLock would cause us to deadlock with ourself. But there are no such elogs. I copied this design from async.c, which is quite similar, so if there's a problem that ought to be fixed too. And there are other more complicated before_shmem callbacks that worry me more, like createdb_failure_callback(). But I think they're all all right.

 /*
  * MarkAsPreparing
@@ -261,29 +329,15 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 				 errmsg("prepared transactions are disabled"),
 				 errhint("Set max_prepared_transactions to a nonzero value.")));

-	LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
-
-	/*
-	 * First, find and recycle any gxacts that failed during prepare. We do
-	 * this partly to ensure we don't mistakenly say their GIDs are still
-	 * reserved, and partly so we don't fail on out-of-slots unnecessarily.
-	 */
-	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	/* on first call, register the exit hook */
+	if (!twophaseExitRegistered)
 	{
-		gxact = TwoPhaseState->prepXacts[i];
-		if (!gxact->valid && !TransactionIdIsActive(gxact->locking_xid))
-		{
-			/* It's dead Jim ... remove from the active array */
-			TwoPhaseState->numPrepXacts--;
-			TwoPhaseState->prepXacts[i] = TwoPhaseState->prepXacts[TwoPhaseState->numPrepXacts];
-			/* and put it back in the freelist */
-			gxact->next = TwoPhaseState->freeGXacts;
-			TwoPhaseState->freeGXacts = gxact;
-			/* Back up index count too, so we don't miss scanning one */
-			i--;
-		}
+		before_shmem_exit(AtProcExit_Twophase, 0);
+		twophaseExitRegistered = true;
 	}

It's not particularly nice to register shmem exit hooks in the middle of normal processing because it makes it impossible to use cancel_before_shmem_exit() previously registered hooks. I think this should be registered at startup, if max_prepared_xacts > 0.

. async.c and namespace.c does the same, and it hasn't been a problem.

I committed this now, but please let me know if you see a concrete problem with the locks.
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
On 05/15/2014 04:15 PM, Michael Paquier wrote: On Thu, May 15, 2014 at 6:20 PM, Heikki Linnakangas wrote: Ok, I committed #undefs. I don't have a Mingw(-w64) environment to test with, so let's see if the buildfarm likes it. There does not seem to be a buildfarm machine using MinGW-w64... Jacana. It has "gcc 4.8.1" listed as the compiler, but if you look at the config in detail, it's mingw-w64. The popen/pclose warnings are there. It hasn't performed a build after I committed the fix yet. Btw, I tested latest master on a Windows box and MinGW-w64 is happier now. Thanks! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian wrote: > > This is the same problem we had with auto-tuning work_mem, in that we > didn't know what other concurrent activity was happening. Seems we need > concurrent activity detection before auto-tuning work_mem and > effective_cache_size. > Perhaps I am missing something obvious here, but would mmgr have any useful numbers on this? Like any book-keeping info maintained by mcxt.c/aset.c? Would extending that interface help? -- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] popen and pclose redefinitions causing many warning in Windows build
On Thu, May 15, 2014 at 6:20 PM, Heikki Linnakangas wrote: > Ok, I committed #undefs. I don't have a Mingw(-w64) environment to test > with, so let's see if the buildfarm likes it. There does not seem to be a buildfarm machine using MinGW-w64... Btw, I tested latest master on a Windows box and MinGW-w64 is happier now. Thanks! -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, May 6, 2014 at 11:15:17PM +0100, Simon Riggs wrote: > > Well, for what it's worth, I've encountered systems where setting > > effective_cache_size too low resulted in bad query plans, but I've > > never encountered the reverse situation. > > I agree with that. > > Though that misses my point, which is that you can't know that all of > that memory is truly available on a server with many concurrent users. > Choosing settings that undercost memory-intensive plans is not the > best choice for a default strategy in a mixed workload when cache may > be better used elsewhere, even if such settings make sense for some > individual users. This is the same problem we had with auto-tuning work_mem, in that we didn't know what other concurrent activity was happening. Seems we need concurrent activity detection before auto-tuning work_mem and effective_cache_size. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Various cosmetic fixes
On 05/14/2014 08:49 PM, Euler Taveira wrote: While updating the pt-br translation I noticed that some sentences could be improved. I also fixed some style glitches. A set of patches is attached. Thanks, applied. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gettimeofday is at the end of its usefulness?
Hi, On 2014-05-13 18:58:11 -0400, Tom Lane wrote: > Anyway it looks like clock_gettime() might be worth using on Linux > just for the more precise output. It doesn't seem to exist on OS X > though, and I have no idea about elsewhere. Agreed that using clock_gettime() would be a good idea. I'd say we should have a wrapper around it that is able to provide nanosecond precision. If only gettimeofday() (and whatever windows is using) is available, we can dynamically fall back to that. > I'm curious if anybody has ideas about other things we might do for > portable high-precision timing. It's far from a solve-it-all, but can we perhaps try to coalesce repeated time measurements? We'll very frequently do a

InstrStopNode();
/* minimal amount of work */
InstrStartNode();

which will measure the time twice. I think there's a fair number of scenarios where once would be enough. I'll freely admit that I haven't looked enough to determine how we could do that API wise. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
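A minimal sketch of the kind of wrapper being suggested follows; the function name (pg_clock_gettime_ns) and the fallback logic are illustrations rather than an existing or proposed API.

#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/*
 * Return a timestamp in nanoseconds.  Uses clock_gettime() where available
 * and silently falls back to gettimeofday() (microsecond resolution)
 * elsewhere.
 */
static uint64_t
pg_clock_gettime_ns(void)
{
#ifdef CLOCK_MONOTONIC
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
		return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
#endif
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		return (uint64_t) tv.tv_sec * 1000000000ULL + (uint64_t) tv.tv_usec * 1000;
	}
}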
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On 2014-05-15 12:04:25 +0100, Benedikt Grundmann wrote: > On Thu, May 15, 2014 at 11:31 AM, Greg Stark wrote: > > > On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann > > wrote: > > > I posted this on this mailing list before: at Jane Street we have > > developed > > > very fast code to get timing information based on TSC if available. It's > > > all ocaml but well documented and mostly just calls to c functions so > should > > > be easy to port to C and we release it under a very liberal license so it > > > should be no problem to take the ideas: > > > > What OS do you run it on though? How fast is your implementation > > compared to the kernel implementation of clock_gettime()? > > > > Are you sure your implementation is actually faster? And are you sure > > you're protected against clocks going backwards? I think you should > > put some i/o in the loop in the test and start several threads running > > it to make it more likely the thread is rescheduled to a different > > processor during the test. I suspect you'll find the rdtsc goes > > backwards sometimes or produces crazy results when switching > > processors. > > > > > There are benchmarks in the link I posted (obtained by a micro benchmarking > library we developed / use internally which takes great care to obtain > reliable numbers). We use posix threads extensively. We internally spend > a lot of time setting up ntp and monitoring systems so that clock backwards > never happens (so, in other words, I wouldn't be surprised if the library > does NOT work correctly when it does -- our protection is outside). I do > not believe we have seen the rdtsc going backwards on the thread context switch > you mention (and as said we use lots of threads). OS? Centos 6.5 > primarily. Did you test it on a server with more than one socket (i.e. not just multiple cores, but separate physical CPUs)? That's where you'd expect differences in the TSC to have funny effects. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Thu, May 15, 2014 at 11:31 AM, Greg Stark wrote: > On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann > wrote: > > I posted this on this mailing list before: at Jane Street we have > developed > > very fast code to get timing information based on TSC if available. It's > > all ocaml but well documented and mostly just calls to c functions so > should > > be easy to port to C and we release it under a very liberal license so it > > should be no problem to take the ideas: > > What OS do you run it on though? How fast is your implementation > compared to the kernel implementation of clock_gettime()? > > Are you sure your implementation is actually faster? And are you sure > you're protected against clocks going backwards? I think you should > put some i/o in the loop in the test and start several threads running > it to make it more likely the thread is rescheduled to a different > processor during the test. I suspect you'll find the rdtsc goes > backwards sometimes or produces crazy results when switching > processors. > > There are benchmarks in the link I posted (obtained by a micro benchmarking library we developed / use internally which takes great care to obtain reliable numbers). We use posix threads extensively. We internally spend a lot of time setting up ntp and monitoring systems so that clock backwards never happens (so, in other words, I wouldn't be surprised if the library does NOT work correctly when it does -- our protection is outside). I do not believe we have seen the rdtsc going backwards on the thread context switch you mention (and as said we use lots of threads). OS? Centos 6.5 primarily. -- > greg >
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann wrote: > I posted this on this mailing list before: at Jane Street we have developed > very fast code to get timing information based on TSC if available. It's > all ocaml but well documented and mostly just calls to c functions so should > be easy to port to C and we release it under a very liberal license so it > should be no problem to take the ideas: What OS do you run it on though? How fast is your implementation compared to the kernel implementation of clock_gettime()? Are you sure your implementation is actually faster? And are you sure you're protected against clocks going backwards? I think you should put some i/o in the loop in the test and start several threads running it to make it more likely the thread is rescheduled to a different processor during the test. I suspect you'll find the rdtsc goes backwards sometimes or produces crazy results when switching processors. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
On 05/14/2014 06:06 PM, Noah Misch wrote: On Wed, May 14, 2014 at 05:51:24PM +0300, Heikki Linnakangas wrote: On 05/14/2014 05:37 PM, Noah Misch wrote: On Wed, May 14, 2014 at 03:15:38PM +0300, Heikki Linnakangas wrote: On 05/09/2014 02:56 AM, Noah Misch wrote: MinGW: http://sourceforge.net/p/mingw/mingw-org-wsl/ci/master/tree/include/stdio.h#l467 MinGW-w64: http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-headers/crt/stdio.h#l496 Building with any recent MinGW-w64, 32-bit or 64-bit, gets the reported warnings; building with MinGW proper does not. Hmm. The MinGW-w64 header does this:

#if !defined(NO_OLDNAMES) && !defined(popen)
#define popen _popen
#define pclose _pclose
#endif

So if we defined popen() before including stdio.h, that would get rid of the warning. But we don't usually do things in that order. True. I have no strong preference between that and use of #undef. I think I would prefer #undef. The risk with that is if some platform has #defined popen() to something else entirely, for some good reason, we would be bypassing that hypothetical wrapper. But I guess we'll cross that bridge if we get there. Works for me. Since "(popen)(x, y)" shall behave the same as "popen(x, y)", such a hypothetical system header would be buggy, anyway. Ok, I committed #undefs. I don't have a Mingw(-w64) environment to test with, so let's see if the buildfarm likes it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
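For illustration, the #undef approach discussed here looks roughly like the following; this is a sketch of the idea, not the text of the committed patch.

#include <stdio.h>

/*
 * MinGW-w64's <stdio.h> defines popen/pclose as macros for _popen/_pclose.
 * Dropping those macros lets our own declarations and uses of popen() and
 * pclose() compile without redefinition warnings.
 */
#ifdef popen
#undef popen
#endif
#ifdef pclose
#undef pclose
#endif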
Re: [HACKERS] Error in running DBT2
Hi All, I am running dbt2 with the latest postgresql kit, pg9.4. I tried everything again after setting up a whole new machine again with ubuntu. Still facing the same error. I ran the *dbt2-pgsql-build-db -w 1 * but, after some time, I faced this error */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 45: [: c: unexpected operator* */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 53: [: c: unexpected operator* *unknown stored function type: c* and the script ends!! Please guide me to a probable solution to get over it. Regards, Rohit On Wed, May 14, 2014 at 12:43 AM, Rohit Goyal wrote: > Hi Peter, > > I tried the solution suggested by you. The problem still persists. > > I ran the *dbt2-pgsql-build-db -w 1 * > > but, after some time, I faced this error > > */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 45: [: c: > unexpected operator* > */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 53: [: c: > unexpected operator* > *unknown stored function type: c* > > and the script ends!! > > moreover, I see that the DB has been created and also 9 tables are there in > Database dbt2. Please suggest how to proceed. > > Regards, > rohit Goyal > > > > On Tue, May 13, 2014 at 9:44 PM, Peter Geoghegan wrote: > >> >> On Tue, May 13, 2014 at 12:36 PM, Rohit Goyal wrote: >> >>> This pattern the above found many times. Please guide me through!!! >>> >> >> IIRC, people have been working around this by setting >> standard_conforming_strings to "off". It really ought to be fixed in a >> principled way, though -- the real issue here is that dbt2 has severe >> bit-rot. >> >> -- >> Peter Geoghegan >> > > > > -- > Regards, > Rohit Goyal > -- Regards, Rohit Goyal
[HACKERS] Selectivity estimation for inet operators
New version of the selectivity estimation patch attached. I am adding it to CommitFest 2014-06. The previous version of it was reviewed by Andreas Karlson on the previous CommitFest together with the GiST support patch. The new version includes join selectivity estimation. Join selectivity is calculated in 4 steps: * matching the first MCV to the second MCV * searching the first MCV in the second histogram * searching the second MCV in the first histogram * searching the boundaries of the first histogram in the second histogram Comparing the lists with each other slows down the function when the statistics target is set to higher values. To avoid this problem I only use log(n) values from the lists: the first log(n) values for the MCVs, and evenly separated values for the histograms. In my tests, this optimization does not affect the planning time when statistics = 100, but it does affect the accuracy of the estimation. I can send the version without this optimization, if slowdown with larger statistics is not a problem which should be solved in the selectivity estimation function. I also attach the script I was using for testing, and I left log statements in the networkjoinsel() function to make testing easier. These statements should be removed before commit. inet-selfuncs-v4.patch Description: Binary data inet-selfuncs-test.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
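To illustrate the log(n) sampling idea described above, here is a rough, self-contained sketch. It is not code from the patch; the function name and the exact spacing rule are invented, and the point is only that comparing O(log n) evenly separated samples keeps the cost of cross-matching two statistics lists down.

#include <math.h>
#include <stdio.h>

/*
 * Pick roughly log2(n) evenly separated sample positions from a list of n
 * histogram values, so comparing two lists costs O(log n * log m) instead
 * of O(n * m).  Purely illustrative.
 */
static int
sample_positions(int n, int *positions, int max_out)
{
	int			k = (n > 1) ? (int) ceil(log2((double) n)) : 1;

	if (k > max_out)
		k = max_out;
	for (int i = 0; i < k; i++)
		positions[i] = (int) ((double) i * (n - 1) / (k > 1 ? k - 1 : 1));
	return k;
}

int
main(void)
{
	int			pos[32];
	int			k = sample_positions(100, pos, 32);

	for (int i = 0; i < k; i++)
		printf("%d ", pos[i]);	/* e.g. 0 16 33 49 66 82 99 */
	printf("\n");
	return 0;
}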
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann < bgrundm...@janestreet.com> wrote: > I posted this on this mailing list before: at Jane Street we have developed > very fast code to get timing information based on TSC if available. It's > all ocaml but well documented and mostly just calls to c functions so > should be easy to port to C and we release it under a very liberal license > so it should be no problem to take the ideas: > > https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli > > Hope this is useful. > > Bene > > Also I'm sorry for top posting. hackers is the only mailing list I'm on that requires this and some others require top posting so this runs counter to my habits and I only realized after sending... > > On Wed, May 14, 2014 at 12:41 PM, Robert Haas wrote: >> On Tue, May 13, 2014 at 11:34 PM, Greg Stark wrote: >> > I always assumed the kernel used rdtsc to implement some of the high >> > performance timers. It can save the current time in a mapped page when >> > it schedules a process and then in the vdso syscall (ie in user-space) >> > it can use rdtsc to calculate the offset needed to adjust that >> > timestamp to the current time. This seems consistent with your >> > calculations that showed the 40ns overhead with +/- 10ns precision. >> >> Crazy idea: Instead of trying to time precisely the amount of time we >> spend in each node, configure a very-high frequency timer interrupt >> (or background thread?) that does: >> >> SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++; >> >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > >
Re: [HACKERS] gettimeofday is at the end of its usefulness?
I posted this on this mailing list before: at Jane Street we have developed very fast code to get timing information based on TSC if available. It's all ocaml but well documented and mostly just calls to c functions, so it should be easy to port to C, and we release it under a very liberal license, so it should be no problem to take the ideas: https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli Hope this is useful. Bene On Wed, May 14, 2014 at 12:41 PM, Robert Haas wrote: > On Tue, May 13, 2014 at 11:34 PM, Greg Stark wrote: > > I always assumed the kernel used rdtsc to implement some of the high > > performance timers. It can save the current time in a mapped page when > > it schedules a process and then in the vdso syscall (ie in user-space) > > it can use rdtsc to calculate the offset needed to adjust that > > timestamp to the current time. This seems consistent with your > > calculations that showed the 40ns overhead with +/- 10ns precision. > > Crazy idea: Instead of trying to time precisely the amount of time we > spend in each node, configure a very-high frequency timer interrupt > (or background thread?) that does: > > SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++; > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
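For readers unfamiliar with the technique, here is a rough, self-contained sketch of TSC-based timing. It is a simplified illustration of the general idea, not the Jane Street library and not proposed PostgreSQL code; the one-shot calibration against CLOCK_MONOTONIC is an assumption, and a real implementation would recalibrate periodically and guard against non-invariant TSCs and cross-socket skew.

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>			/* __rdtsc(), gcc/clang on x86 */

static double ns_per_tick;

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* Measure how many nanoseconds one TSC tick corresponds to. */
static void
calibrate_tsc(void)
{
	struct timespec pause = {0, 50 * 1000 * 1000};	/* 50 ms */
	uint64_t	t0 = now_ns();
	uint64_t	c0 = __rdtsc();

	nanosleep(&pause, NULL);

	uint64_t	t1 = now_ns();
	uint64_t	c1 = __rdtsc();

	ns_per_tick = (double) (t1 - t0) / (double) (c1 - c0);
}

int
main(void)
{
	calibrate_tsc();

	uint64_t	start = __rdtsc();

	/* ... code being timed would go here ... */

	uint64_t	elapsed_ns = (uint64_t) ((__rdtsc() - start) * ns_per_tick);

	printf("elapsed: %llu ns (%.3f ns/tick)\n",
		   (unsigned long long) elapsed_ns, ns_per_tick);
	return 0;
}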