[HACKERS] pg_stat_statements vs escape_string_warning
I happened to notice that if you run the regression tests with pg_stat_statements installed, you will often (not always) get a failure that looks like this:

*** src/test/regress/expected/plpgsql.out	Tue Jan 20 12:01:52 2015
--- src/test/regress/results/plpgsql.out	Wed Jan 21 12:43:19 2015
***************
*** 4672,4677 ****
--- 4672,4683 ----
  HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
  QUERY: SELECT 'foo\\bar\041baz'
  CONTEXT: PL/pgSQL function strtest() line 4 at RETURN
+ WARNING: nonstandard use of \\ in a string literal
+ LINE 1: SELECT 'foo\\bar\041baz'
+                ^
+ HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
+ QUERY: SELECT 'foo\\bar\041baz'
+ CONTEXT: PL/pgSQL function strtest() line 4 at RETURN
   strtest
  -------------
   foo\bar!baz

That is, we're getting an extra copy of the escape string warning message. Investigation disclosed that the reason is that pg_stat_statements' pgss_post_parse_analyze hook re-runs the lexer over the query string (see fill_in_constant_lengths()), so that you get an extra instance of any side-effects in the lexer. This is kind of annoying, especially since it's nondeterministic --- if there's already a pg_stat_statements entry matching SELECT ? then you don't see the extra warning.

What I'm inclined to do about this is add an escape_string_warning bool field to struct core_yy_extra_type, which would be copied from the GUC variable by scanner_init(); then check_string_escape_warning() would consult that field instead of consulting the GUC directly. It would then be possible for fill_in_constant_lengths to reset that field after calling scanner_init so that no warnings appear during its reparse.

As a matter of cleanliness I'm inclined to do the same with backslash_quote and standard_conforming_strings, so that callers of the core lexer are not tied to using the prevailing GUC settings but can control all these things.

This isn't a back-patchable bug fix, but given the lack of prior complaints, maybe it doesn't matter. Alternatively, we could back-patch only the addition of escape_string_warning to the struct: that would fit into padding space in the struct so that there would be no ABI risk.

Comments, objections?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
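For concreteness, the shape of the change being described would presumably look something like this sketch (paraphrased, not a patch; the exact field placement and call sites are assumptions):

	/* in struct core_yy_extra_type (src/include/parser/scanner.h) */
	bool		escape_string_warning;	/* copied from the GUC */

	/* in scanner_init() (src/backend/parser/scan.l) */
	yyext->escape_string_warning = escape_string_warning;

	/* in check_string_escape_warning() (scan.l): consult the field
	 * instead of the global GUC before emitting the warning */
	if (yyextra->escape_string_warning)
		ereport(WARNING, ...);		/* same message as today */

	/* in fill_in_constant_lengths() (pg_stat_statements.c),
	 * immediately after its existing scanner_init() call */
	yyextra.escape_string_warning = false;	/* suppress duplicate warnings */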
Re: [HACKERS] proposal: lock_time for pg_stat_database
Hi

2015-01-16 20:33 GMT+01:00 Jim Nasby jim.na...@bluetreble.com:

On 1/16/15 12:30 PM, Pavel Stehule wrote: 2015-01-16 19:24 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: 2015-01-16 19:06 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/16/15 11:35 AM, Pavel Stehule wrote: 2015-01-16 18:23 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all, some time ago I proposed a lock time measurement related to query. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful. We can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a lock_time_other somewhere).

it can use the current existing stats mechanisms

I'm afraid it isn't possible to assign waiting time to a table - because it depends on order

Huh? Order of what?

when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is locked for t2 -- but if t2 < t1 then t2 is not important -- so what do I have to count as lock time for T1 and T2?

If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has the lock.

Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.

now, when I am thinking about it, lock_time is not a good name - maybe waiting lock time (lock time should not be interesting, waiting is interesting) - it can be divided into some more categories - in GoodData we use heavyweight, pages, and other categories.

So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.

For these global statistics I see as important a common total waiting time for locks - we can use a more detailed granularity but I am not sure whether a common statistic is the best tool.

Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table or a row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn't tell anyone anything.

My motivation is - look at the statistics -- and I can see ... lots of rollbacks -- issue, lots of deadlocks -- issue, lots of waiting time -- issue too. It is a tool for people without the possibility to use dtrace and similar tools, and for everyday usage - a simple check whether locks are an issue (or whether locking is stable).

Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging. If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granular locking later), but I can't really get excited about it either.

and this proposal makes sense only for heavyweight locks - because other locks are everywhere

So what if they're everywhere?
Right now if you're spending a lot of time waiting for LWLocks you have no way to know what's going on unless you happen to have dtrace. Obviously we're not going to do something like issue a stats update every time we attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically report that.

I was wrong - it probably is possible to attach lock waiting time per table.

Regards

Pavel

-- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
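For reference, the measurement being discussed could be as cheap as wrapping the existing heavyweight-lock wait in a timer and handing the elapsed time to the stats machinery; a rough sketch only (the reporting function here is invented for illustration and does not exist today):

	instr_time	wait_start,
				wait_time;

	INSTR_TIME_SET_CURRENT(wait_start);
	WaitOnLock(locallock, owner);		/* existing wait path in lock.c */
	INSTR_TIME_SET_CURRENT(wait_time);
	INSTR_TIME_SUBTRACT(wait_time, wait_start);

	/* hypothetical call: accumulate per-database lock waiting time */
	pgstat_report_lock_wait_time(INSTR_TIME_GET_MICROSEC(wait_time));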
Re: [HACKERS] moving from contrib to bin
On Sat, Jan 17, 2015 at 02:08:34PM +0100, Andres Freund wrote:

7) Are we sure that the authors in the affected contrib modules are ok with their authorship notice being removed? I don't think Ants, Bruce or Simon have a problem with that, but ...

I am fine. It means others can be blamed. ;-)

-- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements vs escape_string_warning
Peter Geoghegan p...@heroku.com writes: On Wed, Jan 21, 2015 at 10:14 AM, Tom Lane t...@sss.pgh.pa.us wrote: This isn't a back-patchable bug fix, but given the lack of prior complaints, maybe it doesn't matter. Alternatively, we could back-patch only the addition of escape_string_warning to the struct: that would fit into padding space in the struct so that there would be no ABI risk. I think that this is a good idea, but I see very little reason to back-patch. I'm not aware that the padding space argument has been used for something like this before. Oh, we definitely *have* done that kind of thing in the past, when there was sufficient motivation. But I'm not sure there's sufficient motivation here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements vs escape_string_warning
On Wed, Jan 21, 2015 at 10:14 AM, Tom Lane t...@sss.pgh.pa.us wrote: What I'm inclined to do about this is add an escape_string_warning bool field to struct core_yy_extra_type, which would be copied from the GUC variable by scanner_init(); then check_string_escape_warning() would consult that field instead of consulting the GUC directly. It would then be possible for fill_in_constant_lengths to reset that field after calling scanner_init so that no warnings appear during its reparse. I've noticed this too, and found it annoying. As a matter of cleanliness I'm inclined to do the same with backslash_quote and standard_conforming_strings, so that callers of the core lexer are not tied to using the prevailing GUC settings but can control all these things. +1 This isn't a back-patchable bug fix, but given the lack of prior complaints, maybe it doesn't matter. Alternatively, we could back-patch only the addition of escape_string_warning to the struct: that would fit into padding space in the struct so that there would be no ABI risk. Comments, objections? I think that this is a good idea, but I see very little reason to back-patch. I'm not aware that the padding space argument has been used for something like this before. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] moving from contrib to bin
On Sat, Jan 17, 2015 at 01:16:18PM +0100, Andres Freund wrote:

Hi, FWIW, I find it rather annoying if people attach patchsets as tarballs. That makes it impossible to look at them in the mailreader since I really don't have anything reasonable to go on to teach it to treat it as a set of patches. I'd also like to see patches that primarily move code around as git diff -M -C style diffs (can also be passed to format-patch). That will show the file move and then additionally the changes that have been made in addition to the rename. There's no sane way the current diffs can be reviewed without applying them to a tree.

FYI, the .gitconfig setting is 'renames':

	[diff]
		renames = copies

-- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
Even following Robert's disabling of abbreviated keys on Windows, buildfarm animals hamerkop, brolga, currawong and bowerbird remain unhappy, with failing regression tests for collate and sometimes (but not always) aggregates. Some of these only use the C locale. I think that aggregates does not fail for brolga because it's built without locale_t support (not sure how to interpret MSVC configure output, but I think that the other animals do have such support). So maybe this code being executed within btsortsupport_worker(), for the C locale on Windows, is at issue (note that abbreviation is still disabled):

	tss->locale = pg_newlocale_from_collation(collid);

That doesn't explain the other problems, though.

-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, 21 Jan 2015, Andrew Dunstan wrote:

On 01/21/2015 09:27 AM, Arne Scheffer wrote:

Sorry, corrected second try because of copy&paste mistakes: VlG-Arne

Comments appreciated.

Definition: var_samp = sum of squared differences / (n-1)
Definition: stddev_samp = sqrt(var_samp)

Example N=4 (XMk denotes the mean of the first k values):

1.) Sum of squared differences:
    Sum_{i=1..4}(Xi-XM4)²
2.) adding nothing:
    Sum_{i=1..4}(Xi-XM4)² + 0 + 0 + 0
3.) nothing changed:
    Sum_{i=1..4}(Xi-XM4)²
    + (-Sum_{i=1..3}(Xi-XM3)² + Sum_{i=1..3}(Xi-XM3)²)
    + (-Sum_{i=1..2}(Xi-XM2)² + Sum_{i=1..2}(Xi-XM2)²)
    + (-Sum_{i=1..1}(Xi-XM1)² + Sum_{i=1..1}(Xi-XM1)²)
4.) parts reordered:
    (Sum_{i=1..4}(Xi-XM4)² - Sum_{i=1..3}(Xi-XM3)²)
    + (Sum_{i=1..3}(Xi-XM3)² - Sum_{i=1..2}(Xi-XM2)²)
    + (Sum_{i=1..2}(Xi-XM2)² - Sum_{i=1..1}(Xi-XM1)²)
    + Sum_{i=1..1}(Xi-XM1)²
5.) which collapses to:
    (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)²
6.) and with XM1 = X1:
    (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0

There it is - the iteration part of Welford's algorithm (in reverse order). The missing piece is 4.) to 5.); it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/

I have no idea what you are saying here.

I'm sorry for that statistics stuff; my attempt was only to visualize in detail the mathematical reason why the iterating part of Welford's algorithm computes the current sum of squared differences in every step - therefore it's in my opinion better to call the variable sum_of_squared_diffs (every statistician will be confused by sum_of_variances, because sample variance = sum_of_squared_diffs / (n-1); have a look at Mr. Cook's explanation) - therefore dividing by n-1 is the unbiased estimator by definition (have a look at Mr. Cook's explanation) - therefore I suggested (as a minor nomenclature issue) to call the column/description stddev_samp (PostgreSQL nomenclature) / sample_ to indicate that information (have a look at the PostgreSQL aggregate functions; they do it the same way).

Here are comments in email to me from the author of http://www.johndcook.com/blog/standard_deviation regarding the divisor used: My code is using the unbiased form of the sample variance, dividing by n-1.

I am relieved, now we are at least two persons saying that. :-) Insert that into the commonly known definition stddev_samp = sqrt(var_samp) from above, and it's exactly my point.

Maybe I should add that in the code comments. Otherwise, I don't think we need a change.

Huh? Why is it a bad thing to call the column stddev_samp analogous to the aggregate function, or to make a note in the documentation that the sample stddev is used to compute the result? I really think it is not a good strategy to make the user run a test or dive into the source code to determine the divisor used. E.g. David expected stddev_pop, so there is a need for documentation for cases with a small sample.

VlG-Arne

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
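For readers who prefer code to algebra, the iteration being described (Welford's method) is only a few lines; a stand-alone sketch in C, using the sum_of_squared_diffs naming suggested above (illustrative only, not the pg_stat_statements code):

	#include <math.h>
	#include <stdint.h>

	typedef struct
	{
		uint64_t	n;
		double		mean;
		double		sum_of_squared_diffs;	/* running Sum(Xi - mean)² */
	} welford_state;		/* initialize all fields to zero */

	static void
	welford_add(welford_state *st, double x)
	{
		double		delta = x - st->mean;	/* uses the old mean */

		st->n++;
		st->mean += delta / st->n;
		/* (x - old_mean) * (x - new_mean): the step derived above */
		st->sum_of_squared_diffs += delta * (x - st->mean);
	}

	static double
	welford_stddev_samp(const welford_state *st)
	{
		if (st->n < 2)
			return 0.0;
		/* sample standard deviation: divide by n - 1 */
		return sqrt(st->sum_of_squared_diffs / (st->n - 1));
	}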
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/21/2015 11:21 AM, Arne Scheffer wrote: Why is it a bad thing to call the column stddev_samp analog to the aggregate function or make a note in the documentation, that the sample stddev is used to compute the solution? I think you are making a mountain out of a molehill, frankly. These stats are not intended as anything other than a pretty indication of the shape, to see if they are significantly influenced by outliers. For any significantly large sample size the difference will be negligible. But I will add a note to the documentation, that seems reasonable. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan schrieb am 2015-01-21: On 01/21/2015 11:21 AM, Arne Scheffer wrote: Why is it a bad thing to call the column stddev_samp analog to the aggregate function or make a note in the documentation, that the sample stddev is used to compute the solution? I think you are making a mountain out of a molehill, frankly. These stats are not intended as anything other than a pretty indication of the shape, to see if they are significantly influenced by outliers. For any significantly large sample size the difference will be negligible. You're right, I maybe exaggerated the statistics part a bit. I wanted to help, because the patch is of interest for us. I will try to keep focus in the future. But I will add a note to the documentation, that seems reasonable. *happy* Thx Arne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Wed, Jan 21, 2015 at 3:38 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:38 PM, Robert Haas robertmh...@gmail.com wrote: Okay and I was also not in favour of this approach. Okay I agree with this. The reason why sourcefile and sourceline are not sufficient is that they can only give the information about the setting in last file it is present. Assume max_connections (or any other setting) is available in both postgresql.conf and postgresql.auto.conf, then it will display the information about the setting in postgresql.auto.conf, so now user might not be able to decide whether that is the setting he want to retain unless he knows the information about setting in postgresql.conf. Now as I have suggested upthread, that we can have a new view pg_file_settings which will display information about settings even when there exists multiple entries for the same in different files. I think adding such information to existing view pg_settings would be difficult as the same code is used for show commands which we don't want to change. I think this new view is updated only when postmaster received SIGHUP or is started. And we can have new function like pg_update_file_setting() which updates this view. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
All, I'm slightly mystified as to how including the word online helps here. It's unlikely that there will be an offline_backup permission, because if the system is off-line, SQL-level permissions are irrelevant. After re-reading through this thread is seems like EXCLUSIVEBACKUP (proposed by Magnus) seemed to be a potentially acceptable alternative. Relevant post: http://www.postgresql.org/message-id/cabuevez7bz0r85vut4rvxx0jkpih8hp8toqzgvpufl0kvcv...@mail.gmail.com We need to separate the logical backups (pg_dump) from the physical ones (start/stop+filesystem and pg_basebackup). We might also need to separate the two different ways of doing physical backups. Personalyl I think using the DUMP name makes that a lot more clear. Maybe we need to avoid using BACKUP alone as well, to make sure it doesn't go the other way - using BASEBACKUP and EXCLUSIVEBACKUP for those two different ones perhaps? Relevant post: http://www.postgresql.org/message-id/20141231144610.gs3...@tamriel.snowman.net I think I'm coming around to the EXCLUSIVEBACKUP option, per the discussion with Magnus. I don't particularly like LOGBACKUP and don't think it really makes sense, while PHYSBACKUP seems like it'd cover what REPLICATION already does. Thoughts? -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] New CF app: changing email sender
On Tue, Jan 20, 2015 at 1:44 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

Hello, the new app's clean looks get my favor and the integrated operation will do good for me :) By the way, I got the following notice when I tried to Add comment on the new app. Note!: This form will generate an email to the public mailinglist pgsql-hackers, with sender set to horiguchi.kyot...@oss.ntt.co.jp! Hmm. The mail address indeed *was* mine but is now obsolete, so the email might bounce. But I haven't found how to change it within the app itself, or on the PostgreSQL community account page. https://www.postgresql.org/account/ Could you tell me how I can change the sender address?

Hi! I've just pushed a change to the main website which now lets you change the email address there. That means you can go to https://www.postgresql.org/account/profile/ and change the email. After you have confirmed the change (an email will be sent to your new address when you change it), the new one should be valid on the cf app (if it doesn't show up, log out of the cf app and back in, and it should show up). I will look into Andrew's issue of being able to have multiple email addresses soon as well - but one feature per evening :)

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] proposal: plpgsql - Assert statement
Hi all

Is there some agreement on this behavior of the ASSERT statement? I would like to assign the last patch to the next commitfest.

Possible changes:

* I would like to simplify the behavior of evaluating the message expression - probably I will disallow NULL there.
* The GUC enable_asserts will be supported.
* An assert exception should not be handled by the PL/pgSQL exception handler -- like the CANCEL exception.

Regards Pavel

2014-12-14 19:54 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:

2014-12-14 18:57 GMT+01:00 Alvaro Herrera alvhe...@2ndquadrant.com:

Pavel Stehule wrote: Hi any comments to the last proposal and patch?

WTH is that pstrdup("hint is null") thing? Debugging leftover?

No, only a not-so-good error message. I propose the syntax ASSERT boolean_expression [, text expression ] -- the text expression is a hint for assertion debugging. This text expression should not be null when it is used. I am not sure what the right behavior is - so when the assertion fails and the text expression is null, then I use the message "hint is null" as the hint.

Regards Pavel

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Jim Nasby (jim.na...@bluetreble.com) wrote: On 1/21/15 5:38 PM, Stephen Frost wrote: Being startup-only won't help if the user is a superuser. Crap, I thought postgresql.auto.conf was handled as an #include and therefore you could still preempt it via postgresql.conf It's not just that.. Having superuser access should really be considered equivilant to having a shell as the unix user that postgres is running as. If this is being done for every execution of a query then I agree- SQL or plpgsql probably wouldn't be fast enough. That doesn't mean it makes sense to have pgaudit support calling a C function, it simply means that we need to find another way to configure auditing (which is what I think I've done...). I'm still nervous about overloading this onto the roles system; I think it will end up being very easy to accidentally break. But if others think it'll work then I guess I'm just being paranoid. Break in which way..? If you're saying it'll be easy for a user to misconfigure then I might agree with you- but documentation and examples can help to address that. If you're worried that future PG hacking will break it, well, I tend to doubt the GRANT piece is the area of concern there- the recent development work is really around event triggers and adding new object classes; the GRANT components have been reasonably stable for the past few years. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 21, 2015 at 4:31 PM, Amit Langote amitlangot...@gmail.com wrote: On Wednesday, January 21, 2015, Amit Kapila amit.kapil...@gmail.com wrote: Does it happen only when parallel_seqscan_degree max_worker_processes? I have max_worker_processes set to the default of 8 while parallel_seqscan_degree is 4. So, this may be a case different from Thom's. I think this is due to reason that memory for forming tuple in master backend is retained for longer time which is causing this statement to take much longer time than required. I have fixed the other issue as well reported by you in attached patch. I think this patch is still not completely ready for general purpose testing, however it could be helpful if we can run some tests to see in what kind of scenario's it gives benefit like in the test you are doing if rather than increasing seq_page_cost, you should add an expensive WHERE condition so that it should automatically select parallel plan. I think it is better to change one of the new parameter's (parallel_setup_cost, parallel_startup_cost and cpu_tuple_comm_cost) if you want your statement to use parallel plan, like in your example if you would have reduced cpu_tuple_comm_cost, it would have selected parallel plan, that way we can get some feedback about what should be the appropriate default values for the newly added parameters. I am already planing to do some tests in that regard, however if I get some feedback from other's that would be helpful. (Please point out me if my understanding is incorrect.) What happen if dynamic background worker process tries to reference temporary tables? Because buffer of temporary table blocks are allocated on private address space, its recent status is not visible to other process unless it is not flushed to the storage every time. Do we need to prohibit create_parallelscan_paths() to generate a path when target relation is temporary one? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 21-01-2015 PM 09:43, Amit Kapila wrote:

On Wed, Jan 21, 2015 at 4:31 PM, Amit Langote amitlangot...@gmail.com wrote: On Wednesday, January 21, 2015, Amit Kapila amit.kapil...@gmail.com wrote: Does it happen only when parallel_seqscan_degree > max_worker_processes? I have max_worker_processes set to the default of 8 while parallel_seqscan_degree is 4. So, this may be a case different from Thom's.

I think this is due to reason that memory for forming tuple in master backend is retained for longer time which is causing this statement to take much longer time than required. I have fixed the other issue as well reported by you in attached patch.

Thanks for fixing.

I think this patch is still not completely ready for general purpose testing, however it could be helpful if we can run some tests to see in what kind of scenario's it gives benefit like in the test you are doing if rather than increasing seq_page_cost, you should add an expensive WHERE condition so that it should automatically select parallel plan. I think it is better to change one of the new parameter's (parallel_setup_cost, parallel_startup_cost and cpu_tuple_comm_cost) if you want your statement to use parallel plan, like in your example if you would have reduced cpu_tuple_comm_cost, it would have selected parallel plan, that way we can get some feedback about what should be the appropriate default values for the newly added parameters. I am already planing to do some tests in that regard, however if I get some feedback from other's that would be helpful.

Perhaps you are aware or you've postponed working on it, but I see that a plan executing in a worker does not know about instrumentation. It results in the EXPLAIN ANALYZE showing incorrect figures. For example compare the normal seqscan and parallel seqscan below:

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE sqrt(a) < 3456 AND md5(a::text) LIKE 'ac%';
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..310228.52 rows=16120 width=4) (actual time=0.497..17062.436 rows=39028 loops=1)
   Filter: ((sqrt((a)::double precision) < 3456::double precision) AND (md5((a)::text) ~~ 'ac%'::text))
   Rows Removed by Filter: 9960972
 Planning time: 0.206 ms
 Execution time: 17378.413 ms
(5 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE sqrt(a) < 3456 AND md5(a::text) LIKE 'ac%';
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Parallel Seq Scan on test  (cost=0.00..255486.08 rows=16120 width=4) (actual time=7.329..4906.981 rows=39028 loops=1)
   Filter: ((sqrt((a)::double precision) < 3456::double precision) AND (md5((a)::text) ~~ 'ac%'::text))
   Rows Removed by Filter: 1992710
   Number of Workers: 4
   Number of Blocks Per Worker: 8849
 Planning time: 0.137 ms
 Execution time: 6077.782 ms
(7 rows)

Note the Rows Removed by Filter. I guess the difference may be because all the rows filtered by workers were not accounted for. I'm not quite sure, but since exec_worker_stmt goes the Portal way, QueryDesc.instrument_options remains unset and hence no instrumentation opportunities in a worker backend. One option may be to pass instrument_options down to worker_stmt?

By the way, 17s and 6s compare really well in favor of parallel seqscan above, :)

Thanks, Amit

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 22-01-2015 PM 02:30, Amit Kapila wrote: Perhaps you are aware or you've postponed working on it, but I see that a plan executing in a worker does not know about instrumentation. I have deferred it until other main parts are stabilised/reviewed. Once that is done, we can take a call what is best we can do for instrumentation. Thom has reported the same as well upthread. Ah, I missed Thom's report. Note the Rows Removed by Filter. I guess the difference may be because, all the rows filtered by workers were not accounted for. I'm not quite sure, but since exec_worker_stmt goes the Portal way, QueryDesc.instrument_options remains unset and hence no instrumentation opportunities in a worker backend. One option may be to pass instrument_options down to worker_stmt? I think there is more to it, master backend need to process that information as well. I see. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Thu, Jan 22, 2015 at 6:37 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: (Please point out me if my understanding is incorrect.) What happen if dynamic background worker process tries to reference temporary tables? Because buffer of temporary table blocks are allocated on private address space, its recent status is not visible to other process unless it is not flushed to the storage every time. Do we need to prohibit create_parallelscan_paths() to generate a path when target relation is temporary one? Yes, we need to prohibit parallel scans on temporary relations. Will fix. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
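The check could presumably be as small as the following sketch inside the patch's create_parallelscan_paths() (variable names are assumed, not taken from the patch); temp tables keep their pages in backend-local buffers, which the workers cannot see:

	Relation	heaprel = heap_open(rte->relid, NoLock);
	bool		is_temp = RelationUsesLocalBuffers(heaprel);

	heap_close(heaprel, NoLock);

	if (is_temp)
		return;		/* no parallel path; the plain seq scan path remains */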
Re: [HACKERS] Parallel Seq Scan
On Thu, Jan 22, 2015 at 10:44 AM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: On 21-01-2015 PM 09:43, Amit Kapila wrote: On Wed, Jan 21, 2015 at 4:31 PM, Amit Langote amitlangot...@gmail.com wrote: On Wednesday, January 21, 2015, Amit Kapila amit.kapil...@gmail.com wrote: Does it happen only when parallel_seqscan_degree max_worker_processes? I have max_worker_processes set to the default of 8 while parallel_seqscan_degree is 4. So, this may be a case different from Thom's. I think this is due to reason that memory for forming tuple in master backend is retained for longer time which is causing this statement to take much longer time than required. I have fixed the other issue as well reported by you in attached patch. Thanks for fixing. I think this patch is still not completely ready for general purpose testing, however it could be helpful if we can run some tests to see in what kind of scenario's it gives benefit like in the test you are doing if rather than increasing seq_page_cost, you should add an expensive WHERE condition so that it should automatically select parallel plan. I think it is better to change one of the new parameter's (parallel_setup_cost, parallel_startup_cost and cpu_tuple_comm_cost) if you want your statement to use parallel plan, like in your example if you would have reduced cpu_tuple_comm_cost, it would have selected parallel plan, that way we can get some feedback about what should be the appropriate default values for the newly added parameters. I am already planing to do some tests in that regard, however if I get some feedback from other's that would be helpful. Perhaps you are aware or you've postponed working on it, but I see that a plan executing in a worker does not know about instrumentation. I have deferred it until other main parts are stabilised/reviewed. Once that is done, we can take a call what is best we can do for instrumentation. Thom has reported the same as well upthread. Note the Rows Removed by Filter. I guess the difference may be because, all the rows filtered by workers were not accounted for. I'm not quite sure, but since exec_worker_stmt goes the Portal way, QueryDesc.instrument_options remains unset and hence no instrumentation opportunities in a worker backend. One option may be to pass instrument_options down to worker_stmt? I think there is more to it, master backend need to process that information as well. By the way, 17s and 6s compare really well in favor of parallel seqscan above, :) That sounds interesting. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] New CF app deployment
Magnus, Now that I'm back on this side of the Pacific, is there any additional data entry/cleanup which needs doing? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Wed, Jan 21, 2015 at 9:43 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Wed, Jan 21, 2015 at 3:38 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:38 PM, Robert Haas robertmh...@gmail.com wrote: The reason why sourcefile and sourceline are not sufficient is that they can only give the information about the setting in last file it is present. Assume max_connections (or any other setting) is available in both postgresql.conf and postgresql.auto.conf, then it will display the information about the setting in postgresql.auto.conf, so now user might not be able to decide whether that is the setting he want to retain unless he knows the information about setting in postgresql.conf. Now as I have suggested upthread, that we can have a new view pg_file_settings which will display information about settings even when there exists multiple entries for the same in different files. I think adding such information to existing view pg_settings would be difficult as the same code is used for show commands which we don't want to change. I think this new view is updated only when postmaster received SIGHUP or is started. And we can have new function like pg_update_file_setting() which updates this view. If that is doable without much complication, then it might not be bad idea to just add additional columns to existing view (pg_settings). I think you can once evaluate the details like what additional columns (other than what pg_settings has) are required and how you want to update them. After doing so further discussion could be more meaningful. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Jan 22, 2015 at 8:22 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Amit Kapila wrote: On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I didn't understand the coding in GetQueryResult(); why do we check the result status of the last returned result only? It seems simpler to me to check it inside the loop, but maybe there's a reason you didn't do it like that? Also, what is the reason we were ignoring those errors only in completedb mode? It doesn't seem like it would cause any harm if we did it in all cases. That means we can just not have the completeDb flag at all. IIRC it is done to match the existing behaviour where such errors are ignored we use this utility to vacuum database. I think that's fine, but we should do it always, not just in whole-database mode. I've been hacking this patch today BTW; hope to have something to show tomorrow. Great! With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Amit Kapila wrote: On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I didn't understand the coding in GetQueryResult(); why do we check the result status of the last returned result only? It seems simpler to me to check it inside the loop, but maybe there's a reason you didn't do it like that? Also, what is the reason we were ignoring those errors only in completedb mode? It doesn't seem like it would cause any harm if we did it in all cases. That means we can just not have the completeDb flag at all. IIRC it is done to match the existing behaviour where such errors are ignored we use this utility to vacuum database. I think that's fine, but we should do it always, not just in whole-database mode. I've been hacking this patch today BTW; hope to have something to show tomorrow. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
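For what it's worth, the reworked GetQueryResult() might look roughly like this sketch (not the final code): every result is checked inside the loop, and only the "relation does not exist" case is ignored, regardless of whether we are in whole-database mode:

	static bool
	GetQueryResult(PGconn *conn, const char *progname)
	{
		PGresult   *result;
		bool		ok = true;

		while ((result = PQgetResult(conn)) != NULL)
		{
			if (PQresultStatus(result) != PGRES_COMMAND_OK)
			{
				const char *sqlstate =
					PQresultErrorField(result, PG_DIAG_SQLSTATE);

				fprintf(stderr, "%s: vacuuming of database \"%s\" failed: %s",
						progname, PQdb(conn), PQerrorMessage(conn));

				/* "42P01" (undefined_table): table vanished, keep going */
				if (sqlstate == NULL || strcmp(sqlstate, "42P01") != 0)
					ok = false;
			}
			PQclear(result);
		}
		return ok;
	}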
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
Sure, but nobody who is not a developer is going to care about that. A typical user who sees pgstat wait timeout, or doesn't, isn't going to be able to make anything at all out of that.

As a user, I wholeheartedly disagree. That warning helped me massively in diagnosing an unhealthy database server in the past at TripAdvisor (i.e. a high-end server-class box, not a Raspberry Pi). I have realtime monitoring that looks at pg_stat_database at regular intervals, particularly the velocity of change of the xact_commit and xact_rollback columns, similar to how check_postgres does it. https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl#L4234

When one of those servers was unhealthy, it stopped reporting statistics for 30 seconds+ at a time. My dashboard, which polled far more frequently than that, indicated the server was normally processing 0 tps with intermittent spikes. I went directly onto the server and sampled pg_stat_database. That warning was the only thing that directly indicated that the statistics collector was not to be trusted. It obviously was a victim of what was going on in the server, but it's pretty important to know when your methods for measuring server health are lying to you. The spiky TPS at first glance appears like some sort of live lock, not just that the server is overloaded. Now, I know: 0 change in stats = collector broken.

Rereading the docs: Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building the server). Without context this merely reads: We sleep for 500ms, plus the time to write the file, plus whenever the OS decides to enforce the timer interrupt... so like 550-650ms. It doesn't read: When the server is unhealthy, but _still_ serving queries, the stats collector might not be able to keep up and will just stop reporting stats altogether.

I think the warning is incredibly valuable. Along those lines I'd also love to see a pg_stat_snapshot_timestamp() for monitoring code to use to determine if it's using a stale snapshot, as well as helping to smooth graphs of the statistics that are based on highly granular snapshotting.

- Matt Kelly
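Such a function would presumably be a thin wrapper over the timestamp the backend's stats snapshot already carries; a hedged sketch (the global-stats struct does record stats_timestamp today, but the pg_proc entry and fmgr boilerplate are omitted, and the function name is only a suggestion):

	Datum
	pg_stat_get_snapshot_timestamp(PG_FUNCTION_ARGS)
	{
		/* timestamp of the stats file the current snapshot was read from */
		PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
	}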
Re: [HACKERS] New CF app deployment
On Thu, Jan 22, 2015 at 10:47 AM, Josh Berkus j...@agliodbs.com wrote: Now that I'm back on this side of the Pacific, is there any additional data entry/cleanup which needs doing? An extra look would be worth it. Magnus or I may have missed patch entries between the old and new apps. My2c. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
On 2015-01-21 22:43:03 -0500, Matt Kelly wrote: Sure, but nobody who is not a developer is going to care about that. A typical user who sees pgstat wait timeout, or doesn't, isn't going to be able to make anything at all out of that. As a user, I wholeheartedly disagree. Note that that the change we discussed wasn't removing the message. It was changing the log level from WARNING to LOG. Which means the change is not going to the client anymore, but still to the server log (perhaps surprisingly, the likelihood for the latter actually increases). I think the warning is incredibly valuable. Along those lines I'd also love to see a pg_stat_snapshot_timestamp() for monitoring code to use to determine if its using a stale snapshot, as well as helping to smooth graphs of the statistics that are based on highly granular snapshotting. I can see that being useful. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
On 01/21/2015 07:14 AM, Michael Paquier wrote: Also, looking at the code of gist, gbt_var_same is called through gistKeyIsEQ, which is used for an insertion or for a split. The point is that when doing an insertion, a call to gistgetadjusted is done and we look if one of the keys is NULL. If one of them is, code does not go through gistKeyIsEQ, so the NULL checks do not seem necessary in gbt_var_same. I think you are confusing NULL pointers with an SQL NULL. gistgetadjusted checks that it's not an SQL NULL (!oldisnull[i]), but it does not check if it's a NULL pointer (DatumGetPointer(oldentries[i].key) != NULL). The case we're worried about is that the value is not an SQL NULL, i.e. isnull flag is false, but the Datum is a NULL pointer. Nevertheless, looking at the code, I don't that a NULL pointer can ever be passed to the same-method, for any of the built-in or contrib opclasses, but it's very confusing because some functions check for that. My proof goes like this: 1. The key value passed as argument must've been originally formed by the compress, union, or picksplit methods. 1.1. Some compress methods do nothing, and just return the passed-in key, which comes from the table and cannot be a NULL pointer (the compress method is never called for SQL NULLs). Other compress methods don't check for a NULL pointer, and would crash if there was one. gist_poly_compress() and gist_circle_compress() do check for a NULL, but they only return a NULL key if the input key is NULL, which cannot happen because the input comes from a table. So I believe the NULL-checks in those functions are dead code, and none of the compress methods ever return a NULL key. 1.2. None of the union methods return a NULL pointer (nor expect one as input). 1.3. None of the picksplit methods return a NULL pointer. They all return one of the original values, or one formed with the union method. The picksplit method can return a NULL pointer in the spl_ldatum or spl_rdatum field, in the degenerate case that it puts all keys on the same halve. However, the caller, gistUserPickSplit checks for that and immediately overwrites spl_ldatum and spl_rdatum with sane values in that case. I don't understand what this sentence in the documentation on the compress method means: Depending on your needs, you could also need to care about compressing NULL values in there, storing for example (Datum) 0 like gist_circle_compress does. The compress method is never called for NULLs, so the above is nonsense. I think it should be removed, as well as the checks in gist_circle_compress and gist_poly_compress. According to git history, the check in gist_circle_compress been there ever since the module was imported into contrib/rtree_gist, in 2001. The documentation was added later: commit a0a3883dd977d6618899ccd14258a0696912a9d2 Author: Tom Lane t...@sss.pgh.pa.us Date: Fri Jun 12 19:48:53 2009 + Improve documentation about GiST opclass support functions. Dimitri Fontaine I'm guessing that Tom added that sentence (it was not in the patch that Dimitri submitted originally) just because there was that check in the existing function, without realizing that the check was in fact dead code. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On 21-01-2015 AM 01:42, Robert Haas wrote: On Mon, Jan 19, 2015 at 8:48 PM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: Specifically, do we regard a partitions as pg_inherits children of its partitioning parent? I don't think this is totally an all-or-nothing decision. I think everyone is agreed that we need to not break things that work today -- e.g. Merge Append. What that implies for pg_inherits isn't altogether clear. One point is that an implementation may end up establishing the parent-partition hierarchy somewhere other than (or in addition to) pg_inherits. One intention would be to avoid tying partitioning scheme to certain inheritance features that use pg_inherits. For example, consider call sites of find_all_inheritors(). One notable example is Append/MergeAppend which would be of interest to partitioning. We would want to reuse that part of the infrastructure but we could might as well write an equivalent, say find_all_partitions() which scans something other than pg_inherits to get all partitions. IMHO, there's little reason to avoid putting pg_inherits entries in for the partitions, and then this just works. We can find other ways to make it work if that turns out to be better, but if we don't have one, there's no reason to complicate things. Ok, I will go forward and stick to pg_inherits approach for now. Perhaps the concerns I am expressing have other solutions that don't require abandoning pg_inherits approach altogether. Agree that some concrete idea of internal representation should help guide the catalog structure. If we are going to cache the partitioning info in relcache (which we most definitely will), then we should try to make sure to consider the scenario of having a lot of partitioned tables with a lot of individual partitions. It looks like it would be similar to a scenarios where there are a lot of inheritance hierarchies. But, availability of partitioning feature would definitely cause these numbers to grow larger. Perhaps this is an important point driving this discussion. Yeah, it would be good if the costs of supporting, say, 1000 partitions were negligible. A primary question for me about partition-pruning is when do we do it? Should we model it after relation_excluded_by_constraints() and hence totally plan-time? But, the tone of the discussion is that we postpone partition-pruning to execution-time and hence my perhaps misdirected attempts to inject it into some executor machinery. It's useful to prune partitions at plan time, because then you only have to do the work once. But sometimes you don't know enough to do it at plan time, so it's useful to do it at execution time, too. Then, you can do it differently for every tuple based on the actual value you have. There's no point in doing 999 unnecessary relation scans if we can tell which partition the actual run-time value must be in. But I think execution-time pruning can be a follow-on patch. If you don't restrict the scope of the first patch as much as possible, you're not going to have much luck getting this committed. 
Ok, I will limit myself to focusing on the following things at the moment:

* Provide syntax in CREATE TABLE to declare the partition key
* Provide syntax in CREATE TABLE to declare a table as a partition of a partitioned table and the values it contains
* Arrange to have the partition key and values stored in appropriate catalogs (existing or new)
* Arrange to cache partitioning info of partitioned tables in the relcache

Thanks, Amit

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Wednesday, January 21, 2015, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Jan 21, 2015 at 12:47 PM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: On 20-01-2015 PM 11:29, Amit Kapila wrote: Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. I got an assertion failure: In src/backend/executor/execTuples.c: ExecStoreTuple() /* passing shouldFree=true for a tuple on a disk page is not sane */ Assert(BufferIsValid(buffer) ? (!shouldFree) : true); Good Catch! The reason is that while master backend is scanning from a heap page, if it finds another tuple/tuples's from shared memory message queue it will process those tuples first and in such a scenario, the scan descriptor will still have reference to buffer which it is using from scanning from heap. Your proposed fix will work. After fixing this, the assertion failure seems to be gone though I observed the blocked (CPU maxed out) state as reported elsewhere by Thom Brown. Does it happen only when parallel_seqscan_degree > max_worker_processes? I have max_worker_processes set to the default of 8 while parallel_seqscan_degree is 4. So, this may be a case different from Thom's. Thanks, Amit
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
--On 20. Januar 2015 17:15:01 +0100 Andres Freund and...@2ndquadrant.com wrote: I personally think that being able to at least compile/make check old versions a bit longer is a good idea. +1 from me for this idea. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 21, 2015 at 12:47 PM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: On 20-01-2015 PM 11:29, Amit Kapila wrote: Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. I got an assertion failure: In src/backend/executor/execTuples.c: ExecStoreTuple() /* passing shouldFree=true for a tuple on a disk page is not sane */ Assert(BufferIsValid(buffer) ? (!shouldFree) : true); Good Catch! The reason is that while master backend is scanning from a heap page, if it finds another tuple/tuples's from shared memory message queue it will process those tuples first and in such a scenario, the scan descriptor will still have reference to buffer which it is using from scanning from heap. Your proposed fix will work. After fixing this, the assertion failure seems to be gone though I observed the blocked (CPU maxed out) state as reported elsewhere by Thom Brown. Does it happen only when parallel_seqscan_degree max_worker_processes? Thanks for checking the patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Error check always bypassed in tablefunc.c
On Tue, Jan 20, 2015 at 4:05 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Jan 20, 2015 at 8:47 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Jan 19, 2015 at 11:06 PM, Joe Conway m...@joeconway.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/19/2015 08:16 AM, Alvaro Herrera wrote: Haven't looked at this patch, but I wonder if it would be better to replace the innards of connectby with a rewrite of the query to use standard WITH queries. Maybe we can remove a couple hundred lines from tablefunc.c? Seems like a good idea -- connectby is really obsolete for quite a while now other than as an SRF example. I guess we only keep it around for backwards compatibility? For master, yes we could brush up things a bit. Now do we really do the same for back-branches? I would think that the answer there is something close to the patch I sent. So, using a WITH RECURSIVE, here is a query equivalent to what connectby does: [...] Thoughts? Looking at this stuff, actually I do not think that it is possible for now to support orderby_fld at the same level with WITH RECURSIVE in connectby because we need to reorder the items of the base table within the 2nd query of the UNION ALL to give something like that and WITH RECURSIVE does not support ORDER BY (and mutual recursion between WITH items). Another thing to note is that WITH RECURSIVE weakens the infinite recursion detection. I don't think it would be good to weaken that... Attached is a result of this random hacking, resulting in some cleanup btw: 1 file changed, 110 insertions(+), 264 deletions(-) Regards, -- Michael diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c index 3388fab..e7c3674 100644 --- a/contrib/tablefunc/tablefunc.c +++ b/contrib/tablefunc/tablefunc.c @@ -54,7 +54,6 @@ static Tuplestorestate *get_crosstab_tuplestore(char *sql, bool randomAccess); static void validateConnectbyTupleDesc(TupleDesc tupdesc, bool show_branch, bool show_serial); static bool compatCrosstabTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2); -static bool compatConnectbyTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2); static void get_normal_pair(float8 *x1, float8 *x2); static Tuplestorestate *connectby(char *relname, char *key_fld, @@ -68,21 +67,6 @@ static Tuplestorestate *connectby(char *relname, MemoryContext per_query_ctx, bool randomAccess, AttInMetadata *attinmeta); -static Tuplestorestate *build_tuplestore_recursively(char *key_fld, - char *parent_key_fld, - char *relname, - char *orderby_fld, - char *branch_delim, - char *start_with, - char *branch, - int level, - int *serial, - int max_depth, - bool show_branch, - bool show_serial, - MemoryContext per_query_ctx, - AttInMetadata *attinmeta, - Tuplestorestate *tupstore); typedef struct { @@ -1161,14 +1145,16 @@ connectby(char *relname, Tuplestorestate *tupstore = NULL; int ret; MemoryContext oldcontext; - - int serial = 1; + StringInfoData inner_sql, outer_sql, orderby_sql; + char **values; + int num_cols; /* Connect to SPI manager */ if ((ret = SPI_connect()) 0) /* internal error */ elog(ERROR, connectby: SPI_connect returned %d, ret); + /* switch to longer term context to create the tuple store */ oldcontext = MemoryContextSwitchTo(per_query_ctx); @@ -1177,244 +1163,137 @@ connectby(char *relname, MemoryContextSwitchTo(oldcontext); - /* now go get the whole tree */ - tupstore = build_tuplestore_recursively(key_fld, - parent_key_fld, - relname, - orderby_fld, - branch_delim, - start_with, - start_with, /* current_branch */ - 0, /* initial level 
is 0 */ - serial, /* initial serial is 1 */ - max_depth, - show_branch, - show_serial, - per_query_ctx, - attinmeta, - tupstore); - - SPI_finish(); - - return tupstore; -} - -static Tuplestorestate * -build_tuplestore_recursively(char *key_fld, - char *parent_key_fld, - char *relname, - char *orderby_fld, - char *branch_delim, - char *start_with, - char *branch, - int level, - int *serial, - int max_depth, - bool show_branch, - bool show_serial, - MemoryContext per_query_ctx, - AttInMetadata *attinmeta, - Tuplestorestate *tupstore) -{ - TupleDesc tupdesc = attinmeta-tupdesc; - int ret; - int proc; - int serial_column; - StringInfoData sql; - char **values; - char *current_key; - char *current_key_parent; - char current_level[INT32_STRLEN]; - char serial_str[INT32_STRLEN]; - char *current_branch; - HeapTuple tuple; - - if (max_depth 0 level max_depth) - return tupstore; - - initStringInfo(sql); - - /* Build initial sql statement */ - if (!show_serial) - { -
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
[After a discussion on IRC with Stephen…] At 2015-01-20 21:47:02 -0500, sfr...@snowman.net wrote: Review the opening of this email though and consider that we could look at what privileges has the audit role granted to the current role? as defining what is to be audited. Right, I understand now how that would work. I'll try to find time to (a) implement this, (b) remove the backwards-compatibility code, and (c) split up the USE_DEPARSE_FUNCTIONS stuff. For example, what if I want to see all the tables created and dropped by a particular user? I hadn't been intending to address that with this scheme, but I think we have that by looking for privilege grants where the audit role is the grantee and the role-to-be-audited the grantor. For CREATE, yes, with a bit of extra ACL-checking code in the utility hook; but I don't think we'll get very far without the ability to log ALTER/DROP too. :-) So there has to be some alternative mechanism for that, and I'm hoping Robert (or anyone!) has something in mind. Well, I was primairly digging for someone to say yes, the object access stuff will be reworked to be based on event triggers, thus removing the need for the object access bits. (This doesn't entirely make sense to me, but it's tangential anyway, so I won't comment on it for now.) -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Peter == Peter Geoghegan p...@heroku.com writes: Peter You'll probably prefer the attached. This patch works by Peter disabling abbreviation, but only after writing out runs, with Peter the final merge left to go. That way, it doesn't matter when Peter abbreviated keys are not read back from disk (or regenerated). This seems tolerable to me for a quick fix. The merits of storing the abbreviation vs. re-abbreviating on input can be studied later. Peter I believe this bug was missed because it only occurs when there Peter are multiple runs, and not in the common case where there is one Peter big initial run that is found already sorted when we reach Peter mergeruns(). Ah, yes, there is an optimization for the one-run case which bypasses all further comparisons, hiding the problem. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
On Tue, 2015-01-20 at 23:37 +0100, Tomas Vondra wrote: Tom's message where he points that out is here: http://www.postgresql.org/message-id/20707.1396372...@sss.pgh.pa.us That message also says: I think a patch that stood a chance of getting committed would need to detect whether the aggregate was being called in simple or grouped contexts, and apply different behaviors in the two cases. I take that as an objection to any patch which does not distinguish between the grouped and ungrouped aggregate cases, which includes your patch. I don't agree with that objection (or perhaps I don't understand it); but given the strong words above, I need to get some kind of response before I can consider committing your patch. I generally agree that having two API 'facets' with different behavior is slightly awkward and assymetric, but I wouldn't call that ugly. Right, your words are more precise (and polite). My apologies. I actually modified both APIs initially, but I think Ali is right that not breaking the existing API (and keeping the original behavior in that case) is better. We can break it any time we want in the future, but it's impossible to unbreak it ;-) We can't break the old API, and I'm not suggesting that we do. I was hoping to find some alternative. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: At 2015-01-20 21:47:02 -0500, sfr...@snowman.net wrote: Review the opening of this email though and consider that we could look at what privileges has the audit role granted to the current role? as defining what is to be audited. Right, I understand now how that would work. I'll try to find time to (a) implement this, (b) remove the backwards-compatibility code, and (c) split up the USE_DEPARSE_FUNCTIONS stuff. Great! Thanks! For example, what if I want to see all the tables created and dropped by a particular user? I hadn't been intending to address that with this scheme, but I think we have that by looking for privilege grants where the audit role is the grantee and the role-to-be-audited the grantor. For CREATE, yes, with a bit of extra ACL-checking code in the utility hook; but I don't think we'll get very far without the ability to log ALTER/DROP too. :-) So there has to be some alternative mechanism for that, and I'm hoping Robert (or anyone!) has something in mind. ALTER/DROP can be logged based on the USAGE privilege for the schema. We can't differentiate those cases but we can at least handle them as a group. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
Bernd Helmle wrote: --On 20. Januar 2015 17:15:01 +0100 Andres Freund and...@2ndquadrant.com wrote: I personally think that being able to at least compile/make check old versions a bit longer is a good idea. +1 from me for this idea. Already done yesterday :-) Thanks, -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Wed, Jan 21, 2015 at 2:11 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:52 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: It seems [WaitForBackgroundWorkerShutdown] has possibility to wait forever. Assume one of the worker is not able to start (not able to attach to shared memory or some other reason), then status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED and after that it can wait forever in WaitLatch. I don't think that's right. The status only remains BGWH_NOT_YET_STARTED until the postmaster forks the child process. I think the control flow can reach the above location before postmaster could fork all the workers. Consider a case that we have launched all workers during ExecutorStart phase and then before postmaster could start all workers an error occurs in master backend and then it try to Abort the transaction and destroy the parallel context, at that moment it will get the above status and wait forever in above code. I am able to reproduce above scenario with debugger by using parallel_seqscan patch. Hmm. Well, if you can reproduce it, there clearly must be a bug. But I'm not quite sure where. What should happen in that case is that the process that started the worker has to wait for the postmaster to actually start it, and then after that for the new process to die, and then it should return. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Peter == Peter Geoghegan p...@heroku.com writes: Peter Basically, the intersection of the datum sort case with Peter abbreviated keys seems complicated. Not to me. To me it seems completely trivial. Now, I follow this general principle that someone who is not doing the work should never say X is easy to someone who _is_ doing it, unless they're prepared to at least outline the solution on request or otherwise contribute. So see the attached patch (which I will concede could probably do with more comments, it's a quick hack intended for illustration) and tell me what you think is missing that would make it a complicated problem. Peter I tended to think that the solution was to force a heaptuple Peter sort instead (where abbreviation naturally can be used), This seems completely wrong - why should the caller have to worry about this implementation detail? The caller shouldn't have to know about what types or what circumstances might or might not benefit from abbreviation. -- Andrew (irc:RhodiumToad) diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index b6e302f..0dbb277 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -901,30 +901,34 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation, state-copytup = copytup_datum; state-writetup = writetup_datum; state-readtup = readtup_datum; + state-abbrevNext = 10; state-datumType = datumType; - /* Prepare SortSupport data */ - state-onlyKey = (SortSupport) palloc0(sizeof(SortSupportData)); - - state-onlyKey-ssup_cxt = CurrentMemoryContext; - state-onlyKey-ssup_collation = sortCollation; - state-onlyKey-ssup_nulls_first = nullsFirstFlag; - /* - * Conversion to abbreviated representation infeasible in the Datum case. - * It must be possible to subsequently fetch original datum values within - * tuplesort_getdatum(), which would require special-case preservation of - * original values. - */ - state-onlyKey-abbreviate = false; - - PrepareSortSupportFromOrderingOp(sortOperator, state-onlyKey); - /* lookup necessary attributes of the datum type */ get_typlenbyval(datumType, typlen, typbyval); state-datumTypeLen = typlen; state-datumTypeByVal = typbyval; + /* Prepare SortSupport data */ + state-sortKeys = (SortSupport) palloc0(sizeof(SortSupportData)); + + state-sortKeys-ssup_cxt = CurrentMemoryContext; + state-sortKeys-ssup_collation = sortCollation; + state-sortKeys-ssup_nulls_first = nullsFirstFlag; + state-sortKeys-abbreviate = !typbyval; + + PrepareSortSupportFromOrderingOp(sortOperator, state-sortKeys); + + /* + * The onlyKey optimization cannot be used with abbreviated keys, since + * tie-breaker comparisons may be required. Typically, the optimization is + * only of value to pass-by-value types anyway, whereas abbreviated keys + * are typically only of value to pass-by-reference types. 
+ */ + if (!state-sortKeys-abbrev_converter) + state-onlyKey = state-sortKeys; + MemoryContextSwitchTo(oldcontext); return state; @@ -1318,10 +1322,43 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull) } else { - stup.datum1 = datumCopy(val, false, state-datumTypeLen); + Datum original = datumCopy(val, false, state-datumTypeLen); stup.isnull1 = false; - stup.tuple = DatumGetPointer(stup.datum1); + stup.tuple = DatumGetPointer(original); USEMEM(state, GetMemoryChunkSpace(stup.tuple)); + + if (!state-sortKeys-abbrev_converter) + { + stup.datum1 = original; + } + else if (!consider_abort_common(state)) + { + /* Store abbreviated key representation */ + stup.datum1 = state-sortKeys-abbrev_converter(original, + state-sortKeys); + } + else + { + /* Abort abbreviation */ + int i; + + stup.datum1 = original; + + /* + * Set state to be consistent with never trying abbreviation. + * + * Alter datum1 representation in already-copied tuples, so as to + * ensure a consistent representation (current tuple was just handled). + * Note that we rely on all tuples copied so far actually being + * contained within memtuples array. + */ + for (i = 0; i state-memtupcount; i++) + { +SortTuple *mtup = state-memtuples[i]; + +mtup-datum1 = PointerGetDatum(mtup-tuple); + } + } } puttuple_common(state, stup); @@ -1887,9 +1924,9 @@ tuplesort_getdatum(Tuplesortstate *state, bool forward, else { if (should_free) - *val = stup.datum1; + *val = PointerGetDatum(stup.tuple); else - *val = datumCopy(stup.datum1, false, state-datumTypeLen); + *val = datumCopy(PointerGetDatum(stup.tuple), false, state-datumTypeLen); *isNull = false; } @@ -3712,9 +3749,22 @@ readtup_index(Tuplesortstate *state, SortTuple *stup, static int comparetup_datum(const SortTuple *a, const SortTuple *b, Tuplesortstate *state) { - return ApplySortComparator(a-datum1, a-isnull1, - b-datum1, b-isnull1, - state-onlyKey); + int compare; + +
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 21, 2015 at 4:31 PM, Amit Langote amitlangot...@gmail.com wrote: On Wednesday, January 21, 2015, Amit Kapila amit.kapil...@gmail.com wrote: Does it happen only when parallel_seqscan_degree max_worker_processes? I have max_worker_processes set to the default of 8 while parallel_seqscan_degree is 4. So, this may be a case different from Thom's. I think this is due to reason that memory for forming tuple in master backend is retained for longer time which is causing this statement to take much longer time than required. I have fixed the other issue as well reported by you in attached patch. I think this patch is still not completely ready for general purpose testing, however it could be helpful if we can run some tests to see in what kind of scenario's it gives benefit like in the test you are doing if rather than increasing seq_page_cost, you should add an expensive WHERE condition so that it should automatically select parallel plan. I think it is better to change one of the new parameter's (parallel_setup_cost, parallel_startup_cost and cpu_tuple_comm_cost) if you want your statement to use parallel plan, like in your example if you would have reduced cpu_tuple_comm_cost, it would have selected parallel plan, that way we can get some feedback about what should be the appropriate default values for the newly added parameters. I am already planing to do some tests in that regard, however if I get some feedback from other's that would be helpful. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com parallel_seqscan_v5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
David G Johnston schrieb am 2015-01-21:
Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at <http://www.johndcook.com/blog/standard_deviation/> I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph.
I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction
Yes, it is.
but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance.
That's wrong; it's applied at the end to the sum of squared differences and is therefore, by definition, the corrected sample standard deviation estimator.
In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary.
That would probably be an exotic assumption in a working database, and it is not what is computed here!
I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance.
Yes, indeed correct. And exactly to avoid that misunderstanding, I suggested using the sample term. To speak in PostgreSQL terms: what is applied in Andrew's/Welford's algorithm is stddev_samp(le), not stddev_pop(ulation). Therefore stddev in Postgres is only kept for historical reasons; look at http://www.postgresql.org/docs/9.4/static/functions-aggregate.html Table 9-43.
VlG-Arne
Note point 3 in the linked Wikipedia article. David J.
-- View this message in context: http://postgresql.nabble.com/Add-min-and-max-execute-statement-time-in-pg-stat-statement-tp5774989p5834805.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
I don't understand. I'm following pretty exactly the calculations stated at lt;http://www.johndcook.com/blog/standard_deviation/gt; I'm not a statistician. Perhaps others who are more literate in Maybe I'm mistaken here, but I think, the algorithm is not that complicated. I try to explain it further: Comments appreciated. Definition var_samp = Sum of squared differences /n-1 Definition stddev_samp = sqrt(var_samp) Example N=4 1.) Sum of squared differences 1_4Sum(Xi-XM4)² = 2.) adding nothing 1_4Sum(Xi-XM4)² +0 +0 +0 = 3.) nothing changed 1_4Sum(Xi-XM4)² +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²) +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM3)²) +(-1_1Sum(Xi-XM2)²+1_1Sum(Xi-XM3)²) = 4.) parts reordered (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²) +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²) +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM2)²) +1_1Sum(X1-XM1)² = 5.) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)² = 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in reverse order) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0 The missing piece is 4.) to 5.) it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ Thanks. Still not quite sure what to do, though :-) I guess in the end we want the answer to come up with similar results to the builtin stddev SQL function. I'll try to set up a test program, to see if we do. If you want to go this way: Maybe this is one of the very few times, you have to use a small sample ;-) VlG-Arne cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Sorry, corrected second try because of copypaste mistakes: VlG-Arne Comments appreciated. Definition var_samp = Sum of squared differences /n-1 Definition stddev_samp = sqrt(var_samp) Example N=4 1.) Sum of squared differences 1_4Sum(Xi-XM4)² = 2.) adding nothing 1_4Sum(Xi-XM4)² +0 +0 +0 = 3.) nothing changed 1_4Sum(Xi-XM4)² +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²) +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²) +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²) = 4.) parts reordered (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²) +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²) +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²) +1_1Sum(X1-XM1)² = 5.) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)² = 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in reverse order) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0 The missing piece is 4.) to 5.) it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/21/2015 09:27 AM, Arne Scheffer wrote: Sorry, corrected second try because of copypaste mistakes: VlG-Arne Comments appreciated. Definition var_samp = Sum of squared differences /n-1 Definition stddev_samp = sqrt(var_samp) Example N=4 1.) Sum of squared differences 1_4Sum(Xi-XM4)² = 2.) adding nothing 1_4Sum(Xi-XM4)² +0 +0 +0 = 3.) nothing changed 1_4Sum(Xi-XM4)² +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²) +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²) +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²) = 4.) parts reordered (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²) +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²) +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²) +1_1Sum(X1-XM1)² = 5.) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)² = 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in reverse order) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0 The missing piece is 4.) to 5.) it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ I have no idea what you are saying here. Here are comments in email to me from the author of http://www.johndcook.com/blog/standard_deviation regarding the divisor used: My code is using the unbiased form of the sample variance, dividing by n-1. It's usually not worthwhile to make a distinction between a sample and a population because the population is often itself a sample. For example, if you could measure the height of everyone on earth at one instance, that's the entire population, but it's still a sample from all who have lived and who ever will live. Also, for large n, there's hardly any difference between 1/n and 1/(n-1). Maybe I should add that in the code comments. Otherwise, I don't think we need a change. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
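For illustration only (this is not the pg_stat_statements patch code), a small standalone C sketch of the Welford accumulation being discussed: it keeps a running mean and a running sum of squared differences (M2), and divides by n-1 only at the end, so its output should agree with stddev_samp over the same values. The example data is made up.

#include <math.h>
#include <stdio.h>

/* Running state for Welford's method: count, mean, sum of squared diffs. */
typedef struct
{
    long        n;
    double      mean;
    double      m2;
} welford_state;

static void
welford_add(welford_state *st, double x)
{
    double      delta = x - st->mean;

    st->n += 1;
    st->mean += delta / st->n;
    st->m2 += delta * (x - st->mean);   /* (x - old mean) * (x - new mean) */
}

static double
welford_stddev_samp(const welford_state *st)
{
    /* Bessel's correction: divide the accumulated M2 by n - 1, not n. */
    return (st->n > 1) ? sqrt(st->m2 / (st->n - 1)) : 0.0;
}

int
main(void)
{
    double      xs[] = {2, 4, 4, 4, 5, 5, 7, 9};
    welford_state st = {0, 0.0, 0.0};
    int         i;

    for (i = 0; i < 8; i++)
        welford_add(&st, xs[i]);

    /* Compare with: SELECT stddev_samp(x) FROM unnest(array[2,4,4,4,5,5,7,9]) x; */
    printf("mean=%g stddev_samp=%g\n", st.mean, welford_stddev_samp(&st));
    return 0;
}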
Re: [HACKERS] parallel mode and parallel contexts
On Wed, Jan 21, 2015 at 6:35 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 21, 2015 at 2:11 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:52 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: It seems [WaitForBackgroundWorkerShutdown] has possibility to wait forever. Assume one of the worker is not able to start (not able to attach to shared memory or some other reason), then status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED and after that it can wait forever in WaitLatch. I don't think that's right. The status only remains BGWH_NOT_YET_STARTED until the postmaster forks the child process. I think the control flow can reach the above location before postmaster could fork all the workers. Consider a case that we have launched all workers during ExecutorStart phase and then before postmaster could start all workers an error occurs in master backend and then it try to Abort the transaction and destroy the parallel context, at that moment it will get the above status and wait forever in above code. I am able to reproduce above scenario with debugger by using parallel_seqscan patch. Hmm. Well, if you can reproduce it, there clearly must be a bug. But I'm not quite sure where. What should happen in that case is that the process that started the worker has to wait for the postmaster to actually start it, Okay, I think this should solve the issue, also it should be done before call of TerminateBackgroundWorker(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
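To make the suggested ordering concrete, here is a rough sketch, not taken from the actual patch, of cleanup that first blocks until the postmaster has really forked each worker and only then asks it to stop and waits for it to exit. WaitForBackgroundWorkerStartup, TerminateBackgroundWorker and WaitForBackgroundWorkerShutdown are the existing bgworker APIs; the ParallelContext fields (worker[i].bgwhandle, nworkers) follow the patch under discussion and the exact names are assumptions here.

#include "postgres.h"
#include "postmaster/bgworker.h"
#include "access/parallel.h"        /* added by the patch under discussion */

/*
 * Hypothetical helper: avoid waiting forever on a worker the postmaster
 * has not forked yet by waiting for startup before waiting for shutdown.
 */
static void
WaitForAndTerminateWorkers(ParallelContext *pcxt)
{
    int         i;

    for (i = 0; i < pcxt->nworkers; ++i)
    {
        pid_t       pid;
        BgwHandleStatus status;

        if (pcxt->worker[i].bgwhandle == NULL)
            continue;

        /* Block until the postmaster has actually forked the worker. */
        status = WaitForBackgroundWorkerStartup(pcxt->worker[i].bgwhandle, &pid);

        if (status != BGWH_POSTMASTER_DIED)
        {
            /* The status can no longer be BGWH_NOT_YET_STARTED here. */
            TerminateBackgroundWorker(pcxt->worker[i].bgwhandle);
            WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle);
        }
    }
}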
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
I didn't understand the coding in GetQueryResult(); why do we check the result status of the last returned result only? It seems simpler to me to check it inside the loop, but maybe there's a reason you didn't do it like that? Also, what is the reason we were ignoring those errors only in completedb mode? It doesn't seem like it would cause any harm if we did it in all cases. That means we can just not have the completeDb flag at all. Finally, I think it's better to report the missing relation error, even if we're going to return true to continue processing other tables. That makes the situation clearer to the user. So the function would end up looking like this:

/*
 * GetQueryResult
 *
 * Process the query result.  Returns true if there's no error, false
 * otherwise -- but errors about trying to vacuum a missing relation are
 * ignored.
 */
static bool
GetQueryResult(PGconn *conn, errorOptions *erropts)
{
    PGresult   *result = NULL;

    SetCancelConn(conn);
    while ((result = PQgetResult(conn)) != NULL)
    {
        /*
         * If errors are found, report them.  Errors about a missing table
         * are harmless so we continue processing, but die for other errors.
         */
        if (PQresultStatus(result) != PGRES_COMMAND_OK)
        {
            char       *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);

            fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
                    erropts->progname, erropts->dbname, PQerrorMessage(conn));

            if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
            {
                PQclear(result);
                return false;
            }
        }

        PQclear(result);
    }

    ResetCancelConn();
    return true;
}

-- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
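For anyone reading the function above outside the patch, a few supporting definitions it relies on, as a sketch: the SQLSTATE constant is the standard code for undefined_table, the gettext stand-in is only for illustration, and the errorOptions shape is inferred from the fields used above (the real patch may differ); SetCancelConn()/ResetCancelConn() come from the scripts' common.c.

#include <stdio.h>
#include <string.h>
#include "libpq-fe.h"

/* SQLSTATE reported when the relation has gone away (undefined_table). */
#define ERRCODE_UNDEFINED_TABLE  "42P01"

/* Gettext shorthand used in the scripts; a no-op stand-in here. */
#ifndef _
#define _(x) (x)
#endif

/* Shape inferred from the usage in GetQueryResult(); assumed for the example. */
typedef struct errorOptions
{
    const char *progname;
    const char *dbname;
} errorOptions;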