Re: [HACKERS] tablespaces inside $PGDATA considered harmful
On 26/09/17 20:44, Mark Kirkwood wrote: $ pg_basebackup -D . WARNING: could not read symbolic link "pg_tblspc/space1": Invalid argument pg_basebackup: directory "/data0/pgdata/11/pg_tblspc/space1" exists but is not empty pg_basebackup: removing contents of data directory "." Err - actually this example is wrong - sorry. In fact pg_basebackup is complaining because it does not want to overwrite the contents of the tablespace (need to use the -T option as I'm on the same host)! A correct example of pg_basebackup failing due to tablespaces inside $PGDATA/pg_tblspc can be easily demonstrated by trying to set up streaming replication on another host: $ pg_basebackup -h 10.0.119.100 -P -D . WARNING: could not read symbolic link "pg_tblspc/space1": Invalid argument pg_basebackup: could not create directory "./pg_tblspc": File exists Fortunately this can be worked around by changing to tar format: $ pg_basebackup -h 10.0.119.100 -Ft -P -D . WARNING: could not read symbolic link "pg_tblspc/space1": Invalid argument 1560632/1560632 kB (100%), 2/2 tablespaces ...however, not that great that the plain mode is busted. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace
On 30/09/17 06:43, Robert Haas wrote: On Fri, Sep 29, 2017 at 2:06 AM, Michael Paquierwrote: My tendency about this patch is still that it should be rejected. This is presenting additional handling for no real gain. I vehemently disagree. If the server lets you create a tablespace, then everything that happens after that ought to work. On another thread, there is the issue that if you create a tablespace inside $PGDATA, things break. We should either unbreak those things or not allow creating the tablespace in the first place. On this thread, there is the issue that if you create two tablespaces for different PG versions in the same directory, things break. We should either unbreak those things or not allow creating the tablespace in the first place. It is completely awful behavior to let users do things and then punish them later for having done them. Users are not obliged to read the minds of the developers and guess what things the developers consider "reasonable". They should be able to count on the principle that if they do something that we consider wrong, they'll get an error when they try to do it -- not have it superficially appear to work and then explode later. To put that another way, there should be ONE rule about what is or is not allowable in a particular situation, and all commands, utilities, etc. that deal with that situation should handle it in a uniform fashion. Each .c file shouldn't get to make up its own notion of what is or is not supported. +1 It seems simply wrong (and potentially dangerous) to allow users to arrange a system state that cannot be backed up (easily/without surgery etc etc). Also the customer concerned that did exactly that started the conversation about it with me like this (paraphrasing) 'So this pg_basebackup thing is a bit temperamental...'. I'm thinking we do not want to be giving users this impression. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tablespaces inside $PGDATA considered harmful
On 29/04/15 09:35, Bruce Momjian wrote: On Fri, Apr 24, 2015 at 01:05:03PM -0400, Bruce Momjian wrote: This way, both pg_dump and pg_upgrade will issue warnings, though, of course, those warnings can be ignored. I am hopeful these two warnings will be sufficient and we will not need make these errors, with the possible inconvenience it will cause. I am still afraid that someone will ignore the new errors pg_dump would generate and lose data. I just don't remember enough cases where we threw new errors on _data_ restore. Frankly, those using pg_upgrade already will have to move the old tablespaces out of the old cluster if they ever want to delete those clusters, so I am hopeful these additional warnings will help eliminate this practice, which is already cumbersome and useless. I am not planning to revisit this for 9.6. (resurrecting an old thread) I encountered this the other day, a customer had created tablespaces with directories inside $PGDATA/pg_tblspc. This is just pathalogical - e.g (v11 checkout with PGDATA=/data0/pgdata/11): bench=# CREATE TABLESPACE space1 LOCATION '/data0/pgdata/11/pg_tblspc/space1'; WARNING: tablespace location should not be inside the data directory CREATE TABLESPACE bench=# ALTER TABLE pgbench_accounts SET TABLESPACE space1; ALTER TABLE Ok, so I've been warned: $ pg_basebackup -D . WARNING: could not read symbolic link "pg_tblspc/space1": Invalid argument pg_basebackup: directory "/data0/pgdata/11/pg_tblspc/space1" exists but is not empty pg_basebackup: removing contents of data directory "." So pg_basebackup is completely broken by this construction - should we not prohibit the creation of tablespace directories under $PGDATA (or at least $PGDATA/pg_tblspc) at this point? regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MAIN, Uncompressed?
On 26/08/17 12:18, Simon Riggs wrote: On 25 August 2017 at 20:53, Tom Lanewrote: Greg Stark writes: I think this is a particularly old piece of code and we're lucky the default heuristics have served well for all this time because I doubt many people fiddle with these storage attributes. The time may have come to come up with a better UI for the storage attributes because people are doing new things (like json) and wanting more control over this heuristic. Yeah, I could get behind a basic rethinking of the tuptoaster control knobs. I'm just not in love with Simon's specific proposal, especially not if we can't think of a better name for it than "MAINU". Extended/External would be just fine if you could set the toast target, so I think a better suggestion would be to make "toast_target" a per-attribute option . +1, have thought about this myself previouslythank you for bringing it up! regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 21/07/17 15:58, Joshua D. Drake wrote: On 07/19/2017 07:57 PM, Tom Lane wrote: Peter Geogheganwrites: My argument for the importance of index bloat to the more general bloat problem is simple: any bloat that accumulates, that cannot be cleaned up, will probably accumulate until it impacts performance quite noticeably. But that just begs the question: *does* it accumulate indefinitely, or does it eventually reach a more-or-less steady state? The traditional wisdom about btrees, for instance, is that no matter how full you pack them to start with, the steady state is going to involve something like 1/3rd free space. You can call that bloat if you want, but it's not likely that you'll be able to reduce the number significantly without paying exorbitant costs. I'm not claiming that we don't have any problems, but I do think it's important to draw a distinction between bloat and normal operating overhead. Agreed but we aren't talking about 30% I don't think. Here is where I am at. It took until 30 minutes ago for the tests to finish: name | setting -+--- autovacuum | on autovacuum_analyze_scale_factor | 0.1 autovacuum_analyze_threshold| 50 autovacuum_freeze_max_age | 2 autovacuum_max_workers | 3 autovacuum_multixact_freeze_max_age | 4 autovacuum_naptime | 60 autovacuum_vacuum_cost_delay| 20 autovacuum_vacuum_cost_limit| -1 autovacuum_vacuum_scale_factor | 0.2 autovacuum_vacuum_threshold | 50 autovacuum_work_mem | -1 log_autovacuum_min_duration | -1 Test 1: 55G/srv/main TPS:955 Test 2: 112G/srv/main TPS:531 (Not sure what happened here, long checkpoint?) Test 3: 109G/srv/main TPS:868 Test 4: 143G TPS:840 Test 5: 154G TPS: 722 I am running the query here: https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index And will post a followup. Once the query finishes I am going to launch the tests with autovacuum_vacuum_cost_limit of 5000. Is there anything else you folks would like me to change? I usually advise setting autovacuum_naptime = 10s (or even 5s) for workloads that do a lot of updates (or inserts + deletes) - as on modern HW a lot of churn can happen in 1 minute, and that just makes vacuum's job harder. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New partitioning - some feedback
On 16/07/17 05:24, David Fetter wrote: On Fri, Jul 14, 2017 at 09:49:25PM -0500, Robert Haas wrote: On Mon, Jul 10, 2017 at 5:46 PM, David Fetterwrote: With utmost respect, it's less messy than adding '!' to the already way too random and mysterious syntax of psql's \ commands. What should '\det!' mean? What about '\dT!'? Since \det lists foreign tables, \det! would list foreign tables even if they are partitions. Plain \det would show only the ones that are not partitions. \dT! wouldn't be meaningful, since \dT lists data types and data types can't be partitions. If you're trying to conjure up a rule that every \d command must accept the same set of modifiers, a quick look at the output of \? and a little experimentation will quickly show you that neither S nor + apply to all command types, so I see no reason why that would need to be true for a new modifier either. TBH, I think we should just leave this well enough alone. We're post-beta2 now, there's no clear consensus on what to do here, and there will be very little opportunity for users to give us feedback if we stick a change into an August beta3 before a September final release. I think a new modifier would be too rushed at this stage, but there's no reason to throw out the progress on \d vs \dt. +1 And similarly, there seemed to be a reasonably clear push to label the 'partitions' as such. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New partitioning - some feedback
On 07/07/17 19:54, Michael Banck wrote: On Fri, Jul 07, 2017 at 07:40:55PM +1200, Mark Kirkwood wrote: On 07/07/17 13:29, Amit Langote wrote: Someone complained about this awhile back [1]. And then it came up again [2], where Noah appeared to take a stance that partitions should be visible in views / output of commands that list "tables". Although I too tend to prefer not filling up the \d output space by listing partitions (pg_class.relispartition = true relations), there wasn't perhaps enough push for creating a patch for that. If some committer is willing to consider such a patch, I can make one. Yeah, me too (clearly). However if the consensus is that all these partition tables *must* be shown in \d output, then I'd be happy if they were identified as such rather than just 'table' (e.g 'partition table'). +1. Or maybe just 'partition' is enough if 'partition table' would widen the column output unnecessarily. Yeah, that is probably better (and 'partition table' is potentially confusing as Robert pointed out). Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New partitioning - some feedback
On 07/07/17 13:29, Amit Langote wrote: Someone complained about this awhile back [1]. And then it came up again [2], where Noah appeared to take a stance that partitions should be visible in views / output of commands that list "tables". Although I too tend to prefer not filling up the \d output space by listing partitions (pg_class.relispartition = true relations), there wasn't perhaps enough push for creating a patch for that. If some committer is willing to consider such a patch, I can make one. Yeah, me too (clearly). However if the consensus is that all these partition tables *must* be shown in \d output, then I'd be happy if they were identified as such rather than just 'table' (e.g 'partition table'). regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] New partitioning - some feedback
I've been trying out the new partitioning in version 10. Firstly, I must say this is excellent - so much nicer than the old inheritance based method! My only niggle is the display of partitioned tables via \d etc. e.g: part=# \d List of relations Schema | Name | Type | Owner +--+---+-- public | date_fact| table | postgres public | date_fact_201705 | table | postgres public | date_fact_201706 | table | postgres public | date_fact_20170601 | table | postgres public | date_fact_2017060100 | table | postgres public | date_fact_201707 | table | postgres public | date_fact_rest | table | postgres (7 rows) Now it can be inferred from the names that date_fact is a partitioned table and the various date_fact_ are its partitions - but \d is not providing any hints of this. The more detailed individual describe is fine: part=# \d date_fact Table "public.date_fact" Column | Type | Collation | Nullable | Default +--+---+--+- id | integer | | not null | dte| timestamp with time zone | | not null | val| integer | | not null | Partition key: RANGE (dte) Number of partitions: 6 (Use \d+ to list them.) I'd prefer *not* to see a table and its partitions all intermixed in the same display (especially with nothing indicating which are partitions) - as this will make for unwieldy long lists when tables have many partitions. Also it would be good if the 'main' partitioned table and its 'partitions' showed up as a different type in some way. I note the they do in pg_class: part=# SELECT relname,relkind,relispartition FROM pg_class WHERE relname LIKE 'date_fact%'; relname| relkind | relispartition --+-+ date_fact| p | f date_fact_201705 | r | t date_fact_201706 | r | t date_fact_20170601 | r | t date_fact_2017060100 | r | t date_fact_201707 | r | t date_fact_rest | r | t (7 rows) ...so it looks to be possible to hide the partitions from the main display and/or mark them as such. Now I realize that making this comment now that beta is out is a bit annoying - apologies, but I think seeing a huge list of 'tables' is going to make \d frustrating for folk doing partitioning. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication: \dRp+ and "for all tables"
On 15/06/17 11:10, Tom Lane wrote: Jeff Janeswrites: On Sat, Jun 10, 2017 at 7:42 AM, Tom Lane wrote: In the second place, this really fails to respond to what I'd call the main usability problem with \dRp+, which is that the all-tables property is likely to lead to an unreadably bulky list of affected tables. What I'd say the patch ought to do is *replace* \dRp+'s list of affected tables with a notation like "(all tables)" when puballtables is true. I'd considered that, but I find the pager does a fine job of dealing with the bulkiness of the list. Have you tried it with a few tens of thousands of tables? Even if your pager makes it work comfortably, others might find it less satisfactory. I thought it might be a good idea to not only point out that it is all tables, but also remind people of exactly what tables those are currently (in case it had slipped their mind that all tables will include table from other schemas not in their search_path, for example) I'm not really buying that. If they don't know what "all tables" means, a voluminous list isn't likely to help much. I was hoping we'd get some more votes in this thread, but it seems like we've only got three, and by my count two of them are for just printing "all tables". I'd certainly prefer to see 'all tables' - in addition to being more compact, it also reflects more correctly how the publication was defined. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?
On 06/06/17 10:12, Gavin Flower wrote: On 06/06/17 09:41, Mark Kirkwood wrote: On 05/06/17 09:30, Tom Lane wrote: I've been thinking about the behavior discussed in https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org and it seems to me that there are a couple of things we ought to do about it. First, I think we need a larger hard floor on the number of occurrences of a value that're required to make ANALYZE decide it is a "most common value". The existing coding is willing to believe that anything that appears at least twice in the sample is a potential MCV, but that design originated when we were envisioning stats samples of just a few thousand rows --- specifically, default_statistics_target was originally just 10, leading to a 3000-row sample size. So accepting two-appearance values as MCVs would lead to a minimum MCV frequency estimate of 1/1500. Now it could be a tenth or a hundredth of that. As a round number, I'm thinking that a good floor would be a frequency estimate of 1/1000. With today's typical sample size of 3 rows, a value would have to appear at least 30 times in the sample to be believed to be an MCV. That seems like it gives us a reasonable margin of error against the kind of sampling noise seen in the above-cited thread. Second, the code also has a rule that potential MCVs need to have an estimated frequency at least 25% larger than what it thinks the "average" value's frequency is. A rule of that general form seems like a good idea, but I now think the 25% threshold is far too small to do anything useful. In particular, in any case like this where there are more distinct values than there are sample rows, the "average frequency" estimate will correspond to less than one occurrence in the sample, so that this rule is totally useless to filter anything that we would otherwise consider as an MCV. I wonder if we shouldn't make it be "at least double the estimated average frequency". Or possibly calculate the sample standard deviation and make use of that to help decide on a more flexible cutoff than twice the avg frequency? Are there any research papers that might help us here (I'm drowning in a sea of barely relevant search results for most phrases I've tried so far)? I recall there were some that Tom referenced when this stuff was originally written. On the other hand I do have access to some mathematicians specializing in statistics - so can get their thoughts on this issue if you feel it would be worthwhile. Cheers Mark The standard deviation (sd) is proportional to the square root of the number in the sample in a Normal Distribution. In a Normal Distribution, about 2/3 the values will be within plus or minus one sd of the mean. There seems to be an implicit assumption that the distribution of values follows the Normal Distribution - has this been verified? I suspect that real data will have a skewed distribution of values, and may even be multi modal (multiple peaks) The Normal Distribution has one central peak with 2 tails of the same shape & size. So in a sample of 100 the sd is proportional to 10%, for 10,000 the sd is proportional to 1%. So essentially, the larger the sample size the more reliable is knowledge of the most common values (ignoring pathologically extreme distributions!) - the measure of reliability depends on the distribution. How about selecting the cut off as the mean plus one sd, or something of that nature? Note that the cut off point may result in no mcv's being selected - especially for small samples. If practicable, it would be good to sample real datasets. Suggest looking at datasets were the current mechanism looks reasonable, and ones were the estimates are too far off. Also, if possible, try any new selection method on the datasets and see what the difference is. The above is based on what I remember from my university statistics papers, I took it up to 4th year level many moons ago. With respect to the assumption of Normal distribution - it is easy to come up with an example that shows it need not be: consider our our Pgbench 'accounts' table. The distribution of 'bid' values is not Normal (it is Uniform). Now I realized I made people think about Normal distributions by mentioning computing the standard deviation of the sample (and I probably should have said 'standard deviation of the *sample frequencies*') - but this was only a suggestion for working out how to (maybe) be less arbitrary about how we decide what values to put in the MCV list (currently 25% different from the mean frequency). regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?
On 05/06/17 09:30, Tom Lane wrote: I've been thinking about the behavior discussed in https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org and it seems to me that there are a couple of things we ought to do about it. First, I think we need a larger hard floor on the number of occurrences of a value that're required to make ANALYZE decide it is a "most common value". The existing coding is willing to believe that anything that appears at least twice in the sample is a potential MCV, but that design originated when we were envisioning stats samples of just a few thousand rows --- specifically, default_statistics_target was originally just 10, leading to a 3000-row sample size. So accepting two-appearance values as MCVs would lead to a minimum MCV frequency estimate of 1/1500. Now it could be a tenth or a hundredth of that. As a round number, I'm thinking that a good floor would be a frequency estimate of 1/1000. With today's typical sample size of 3 rows, a value would have to appear at least 30 times in the sample to be believed to be an MCV. That seems like it gives us a reasonable margin of error against the kind of sampling noise seen in the above-cited thread. Second, the code also has a rule that potential MCVs need to have an estimated frequency at least 25% larger than what it thinks the "average" value's frequency is. A rule of that general form seems like a good idea, but I now think the 25% threshold is far too small to do anything useful. In particular, in any case like this where there are more distinct values than there are sample rows, the "average frequency" estimate will correspond to less than one occurrence in the sample, so that this rule is totally useless to filter anything that we would otherwise consider as an MCV. I wonder if we shouldn't make it be "at least double the estimated average frequency". Or possibly calculate the sample standard deviation and make use of that to help decide on a more flexible cutoff than twice the avg frequency? Are there any research papers that might help us here (I'm drowning in a sea of barely relevant search results for most phrases I've tried so far)? I recall there were some that Tom referenced when this stuff was originally written. On the other hand I do have access to some mathematicians specializing in statistics - so can get their thoughts on this issue if you feel it would be worthwhile. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 05/06/17 13:08, Mark Kirkwood wrote: On 05/06/17 00:04, Erik Rijkers wrote: On 2017-05-31 16:20, Erik Rijkers wrote: On 2017-05-31 11:16, Petr Jelinek wrote: [...] Thanks to Mark's offer I was able to study the issue as it happened and found the cause of this. [0001-Improve-handover-logic-between-sync-and-apply-worker.patch] This looks good: -- out_20170531_1141.txt 100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n -- scale 25 100 -- All is well. So this is 100x a 1-minute test with 100x success. (This on the most fastidious machine (slow disks, meagre specs) that used to give 15% failures) [Improve-handover-logic-between-sync-and-apply-worker-v2.patch] No errors after (several days of) running variants of this. (2500x 1 minute runs; 12x 1-hour runs) Same here, no errors with the v2 patch applied (approx 2 days - all 1 minute runs) Further, reapplying the v1 patch (with a bit of editing as I wanted to apply it to my current master), gets a failure with missing rows in the history table quite quickly. I'll put back the v2 patch and resume runs with that, but I'm cautiously optimistic that the v2 patch solves the issue. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 05/06/17 00:04, Erik Rijkers wrote: On 2017-05-31 16:20, Erik Rijkers wrote: On 2017-05-31 11:16, Petr Jelinek wrote: [...] Thanks to Mark's offer I was able to study the issue as it happened and found the cause of this. [0001-Improve-handover-logic-between-sync-and-apply-worker.patch] This looks good: -- out_20170531_1141.txt 100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n -- scale 25 100 -- All is well. So this is 100x a 1-minute test with 100x success. (This on the most fastidious machine (slow disks, meagre specs) that used to give 15% failures) [Improve-handover-logic-between-sync-and-apply-worker-v2.patch] No errors after (several days of) running variants of this. (2500x 1 minute runs; 12x 1-hour runs) Same here, no errors with the v2 patch applied (approx 2 days - all 1 minute runs) regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 03/06/17 11:10, Petr Jelinek wrote: On 02/06/17 22:29, Petr Jelinek wrote: On 02/06/17 08:55, Mark Kirkwood wrote: On 02/06/17 17:11, Erik Rijkers wrote: On 2017-06-02 00:46, Mark Kirkwood wrote: On 31/05/17 21:16, Petr Jelinek wrote: I'm seeing a new failure with the patch applied - this time the history table has missing rows. Petr, I'll put back your access :-) Is this error during 1-minute runs? I'm asking because I've moved back to longer (1-hour) runs (no errors so far), and I'd like to keep track of what the most 'vulnerable' parameters are. Yeah, still using your test config (with my minor modifications). When I got the error the 1st time, I did a complete make clean and rebuildbut it is still possible I've 'done it wrong' - so independent confirmation would be good! Well, I've seen this issue as well while I was developing the fix, but the patch I proposed fixed it for me as well as the original issue. While I was testing something for different thread I noticed that I manage transactions incorrectly in this patch. Fixed here, I didn't test it much yet (it takes a while as you know :) ). Not sure if it's related to the issue you've seen though. Ok - I've applied this version, and running tests again. I needed to do a git pull to apply the patch, so getting some other changes too! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 02/06/17 17:11, Erik Rijkers wrote: On 2017-06-02 00:46, Mark Kirkwood wrote: On 31/05/17 21:16, Petr Jelinek wrote: I'm seeing a new failure with the patch applied - this time the history table has missing rows. Petr, I'll put back your access :-) Is this error during 1-minute runs? I'm asking because I've moved back to longer (1-hour) runs (no errors so far), and I'd like to keep track of what the most 'vulnerable' parameters are. Yeah, still using your test config (with my minor modifications). When I got the error the 1st time, I did a complete make clean and rebuildbut it is still possible I've 'done it wrong' - so independent confirmation would be good! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 31/05/17 21:16, Petr Jelinek wrote: On 29/05/17 23:06, Mark Kirkwood wrote: On 29/05/17 23:14, Petr Jelinek wrote: On 29/05/17 03:33, Jeff Janes wrote: What would you want to look at? Would saving the WAL from the master be helpful? Useful info is, logs from provider (mainly the logical decoding logs that mention LSNs), logs from subscriber (the lines about when sync workers finished), contents of the pg_subscription_rel (with srrelid casted to regclass so we know which table is which), and pg_waldump output around the LSNs found in the logs and in the pg_subscription_rel (+ few lines before and some after to get context). It's enough to only care about LSNs for the table(s) that are out of sync. I have a run that aborted with failure (accounts table md5 mismatch). Petr - would you like to have access to the machine ? If so send me you public key and I'll set it up. Thanks to Mark's offer I was able to study the issue as it happened and found the cause of this. The busy loop in apply stops at the point when worker shmem state indicates that table synchronization was finished, but that might not be visible in the next transaction if it takes long to flush the final commit to disk so we might ignore couple of transactions for given table in the main apply because we think it's still being synchronized. This also explains why I could not reproduce it on my testing machine (fast ssd disk array where flushes were always fast) and why it happens relatively rarely because it's one specific commit during the whole synchronization process that needs to be slow. So as solution I changed the busy loop in the apply to wait for in-catalog status rather than in-memory status to make sure things are really there and flushed. While working on this I realized that the handover itself is bit more complex than necessary (especially for debugging and for other people understanding it) so I did some small changes as part of this patch to make the sequences of states table goes during synchronization process to always be the same. This might cause unnecessary update per one table synchronization process in some cases but that seems like small enough price to pay for clearer logic. And it also fixes another potential bug that I identified where we might write wrong state to catalog if main apply crashed while sync worker was waiting for status update. I've been running tests on this overnight on another machine where I was able to reproduce the original issue within few runs (once I found what causes it) and so far looks good. I'm seeing a new failure with the patch applied - this time the history table has missing rows. Petr, I'll put back your access :-) regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 29/05/17 23:14, Petr Jelinek wrote: On 29/05/17 03:33, Jeff Janes wrote: What would you want to look at? Would saving the WAL from the master be helpful? Useful info is, logs from provider (mainly the logical decoding logs that mention LSNs), logs from subscriber (the lines about when sync workers finished), contents of the pg_subscription_rel (with srrelid casted to regclass so we know which table is which), and pg_waldump output around the LSNs found in the logs and in the pg_subscription_rel (+ few lines before and some after to get context). It's enough to only care about LSNs for the table(s) that are out of sync. I have a run that aborted with failure (accounts table md5 mismatch). Petr - would you like to have access to the machine ? If so send me you public key and I'll set it up. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 29/05/17 13:33, Jeff Janes wrote: On Sun, May 28, 2017 at 3:17 PM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz <mailto:mark.kirkw...@catalyst.net.nz>> wrote: On 28/05/17 19:01, Mark Kirkwood wrote: So running in cloud land now...so for no errors - will update. The framework ran 600 tests last night, and I see 3 'NOK' results, i.e 3 failed test runs (all scale 25 and 8 pgbench clients). Given the way the test decides on failure (gets tired of waiting for the table md5's to match) - it begs the question 'What if it had waited a bit longer'? However from what I can see in all cases: - the rowcounts were the same in master and replica - the md5 of pgbench_accounts was different All four tables should be wrong if there is still a transaction it is waiting for, as all the changes happen in a single transaction. Yeah, my thought exactly. I also got a failure, after 87 iterations of a similar test case. It waited for hours, as mine requires manual intervention to stop waiting. On the subscriber, one account still had a zero balance, while the history table on the subscriber agreed with both history and accounts on the publisher and the account should not have been zero, so definitely a transaction atomicity got busted. Interesting, great that we are seeing the same thing. I altered the script to also save the tellers and branches tables and repeated the runs, but so far it hasn't failed again in over 800 iterations using the altered script. ...so does seem possible that there is some bug being tickled here. Unfortunately the test framework blasts away the failed tables and subscription and continues on...I'm going to amend it to stop on failure so I can have a closer look at what happened. What would you want to look at? Would saving the WAL from the master be helpful? Good question - I initially wanted to see if anything changed if I waited longer (which you have already figured out) and what was actually wrong with the accounts record (which you have gotten to as well)! Nice. After that, I'd thought of doing another transaction on an accounts record that lives in the same page as the 'gammy' one on the master - generally poke about and see what is happening :-) regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 29/05/17 16:26, Erik Rijkers wrote: On 2017-05-29 00:17, Mark Kirkwood wrote: On 28/05/17 19:01, Mark Kirkwood wrote: So running in cloud land now...so for no errors - will update. The framework ran 600 tests last night, and I see 3 'NOK' results, i.e 3 failed test runs (all scale 25 and 8 pgbench clients). Given the way Could you also give the params for the successful runs? Scale 25, clients 90 and 64, scale 5 clients 90, 64 and 8 (i.e the defaults in the driver script). Can you say anything about hardware? (My experience is that older, slower, 'worse' hardware makes for more fails.) It's running in a openstack cloud (so is a libvirt guest): 4 cpus, 4 GB ram and 2 disks: one for each Postgres instance, both formatted xfs. Hmm so maybe I should run a VM on my workstation and crank the IOPS limit way down...in the meantime I'll just let it run :-) Many thanks, by the way. I'm glad that it turns out I'm probably not doing something uniquely stupid (although I'm not glad that there seems to be a bug, and an elusive one at that) Yeah looks like something subtle :-( Hopefully now its out in the open we'll all figure it together! regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 28/05/17 19:01, Mark Kirkwood wrote: So running in cloud land now...so for no errors - will update. The framework ran 600 tests last night, and I see 3 'NOK' results, i.e 3 failed test runs (all scale 25 and 8 pgbench clients). Given the way the test decides on failure (gets tired of waiting for the table md5's to match) - it begs the question 'What if it had waited a bit longer'? However from what I can see in all cases: - the rowcounts were the same in master and replica - the md5 of pgbench_accounts was different ...so does seem possible that there is some bug being tickled here. Unfortunately the test framework blasts away the failed tables and subscription and continues on...I'm going to amend it to stop on failure so I can have a closer look at what happened. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 27/05/17 20:30, Erik Rijkers wrote: Here is what I have: instances.sh: starts up 2 assert enabled sessions instances_fast.sh: alternative to instances.sh starts up 2 assert disabled 'fast' sessions testset.sh loop to call pgbench_derail2.sh with varying params pgbench_derail2.sh main test program can be called 'standalone' ./pgbench_derail2.sh $scale $clients $duration $date_str so for instance this should work: ./pgbench_derail2.sh 25 64 60 20170527_1019 to remove publication and subscription from sessions, add a 5th parameter 'clean' ./pgbench_derail2.sh 1 1 1 1 'clean' pubsub.sh displays replication state. also called by pgbench_derail2.sh must be in path result.sh display results I keep this in a screen-session as: watch -n 20 './result.sh 201705' Peculiar to my setup also: server version at compile time stamped with date + commit hash I misuse information_schema.sql_packages at compile time to store patch information instances are in $pg_stuff_dir/pg_installations/pgsql. So you'll have to outcomment a line here and there, and adapt paths, ports, and things like that. It's a bit messy, I should have used perl from the beginning... Considering it is all shell - pretty nice! I spent a bit of time today getting this working in a vanilla Ubuntu 16.04 cloud server. I found a few things that didn't work (suspect Erik has some default env variables set for ports and databases). These were sufficient to stop logical replication working for me at all - due to no dbname specified in the subscription connection. Given I had to make some changes anyway, I moved all the config into one place (new file config.sh) - made all the programs use /bin/bash as interpreter (/bin/sh just does not work for scripts on Ubuntu), added ports and databases as reqd and fixed the need to mess too much with PATH (see attached diff). So running in cloud land now...so for no errors - will update. regards Mark diff -Naur test.orig/config.sh test/config.sh --- test.orig/config.sh 1970-01-01 12:00:00.0 +1200 +++ test/config.sh 2017-05-28 15:21:33.261701918 +1200 @@ -0,0 +1,13 @@ +#!/bin/bash +port1=6972 +project1=logical_replication +port2=6973 +project2=logical_replication2 +pg_stuff_dir=$HOME/pg_stuff +PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH +PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH +server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1 +server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2 +data_dir1=$server_dir1/data +data_dir2=$server_dir2/data +db=bench diff -Naur test.orig/instances_fast.sh test/instances_fast.sh --- test.orig/instances_fast.sh 2017-05-28 15:18:33.315780487 +1200 +++ test/instances_fast.sh 2017-05-28 15:19:02.511439749 +1200 @@ -1,17 +1,10 @@ -#!/bin/sh +#!/bin/bash #assertions on in $pg_stuff_dir/pg_installations/pgsql./bin #assertions off in $pg_stuff_dir/pg_installations/pgsql./bin.fast -port1=6972 project1=logical_replication -port2=6973 project2=logical_replication2 -pg_stuff_dir=$HOME/pg_stuff -PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin.fast:$PATH -PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin.fast:$PATH -server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1 -server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2 -data_dir1=$server_dir1/data -data_dir2=$server_dir2/data +. config.sh + options1=" -c wal_level=logical -c max_replication_slots=10 @@ -36,6 +29,6 @@ -c log_filename=logfile.${project2} -c log_replication_commands=on " -export PATH=$PATH1; PG=$(which postgres); $PG -D $data_dir1 -p $port1 ${options1} & -export PATH=$PATH2; PG=$(which postgres); $PG -D $data_dir2 -p $port2 ${options2} & +export PATH=$PATH1; export PG=$(which postgres); $PG -D $data_dir1 -p $port1 ${options1} & +export PATH=$PATH2; export PG=$(which postgres); $PG -D $data_dir2 -p $port2 ${options2} & diff -Naur test.orig/instances.sh test/instances.sh --- test.orig/instances.sh 2017-05-28 15:18:33.291780768 +1200 +++ test/instances.sh 2017-05-28 15:19:02.511439749 +1200 @@ -1,17 +1,9 @@ -#!/bin/sh +#!/bin/bash #assertions on in $pg_stuff_dir/pg_installations/pgsql./bin #assertions off in $pg_stuff_dir/pg_installations/pgsql./bin.fast -port1=6972 project1=logical_replication -port2=6973 project2=logical_replication2 -pg_stuff_dir=$HOME/pg_stuff -PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH -PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH -server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1 -server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2 -data_dir1=$server_dir1/data -data_dir2=$server_dir2/data +. config.sh options1=" -c wal_level=logical diff -Naur test.orig/pgbench_derail2.sh test/pgbench_derail2.sh --- test.orig/pgbench_derail2.sh 2017-05-28 15:18:33.363779926 +1200 +++ test/pgbench_derail2.sh 2017-05-28 15:19:02.511439749 +1200 @@ -11,11 +11,13 @@ #I
Re: [HACKERS] logical replication - still unstable after all these months
Sorry - I see you have done this already. On 28/05/17 11:15, Mark Kirkwood wrote: Interesting - might be good to see your test script too (so we can better understand how you are deciding if the runs are successful or not). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
Interesting - might be good to see your test script too (so we can better understand how you are deciding if the runs are successful or not). Also, any idea which rows are different? If you want something out of the box that will do that for you see DBIx::Compare. regards Mark On 28/05/17 04:12, Erik Rijkers wrote: ok, ok... ( The thing is, I am trying to pre-digest the output but it takes time ) I can do this now: attached some output that belongs with this group of 100 1-minute runs: -- out_20170525_1426.txt 100 -- pgbench -c 64 -j 8 -T 60 -P 12 -n -- scale 25 82 -- All is well. 18 -- Not good. That is the worst set of runs of what I showed earlier. that is: out_20170525_1426.txt and 2x18 logfiles that the 18 failed runs produced. Those logfiles have names like: logrep.20170525_1426.1436.1.scale_25.clients_64.NOK.log logrep.20170525_1426.1436.2.scale_25.clients_64.NOK.log .1.=primary .2.=replica Please disregard the errors around pg_current_wal_location(). (it was caused by some code to dump some wal into zipfiles which obviously stopped working after the function was removed/renamed) There are also some uninportant errors from the test-harness where I call with the wrong port. Not interesting, I don't think. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 26/05/17 20:09, Erik Rijkers wrote: The idea is simple enough: startup instance1 startup instance2 (on same machine) primary: init pgbench tables primary: add primary key to pgbench_history copy empty tables to replica by dump/restore primary: start publication replica: start subscription primary: run 1-minute pgbench wait till the 4 md5's of primary pgbench tables are the same as the 4 md5's of replica pgbench tables (this will need a time-out). log 'ok' or 'not ok' primary: clean up publication replica: clean up subscription shutdown primary shutdown replica this whole thing 100 I might have a look at scripting this up (especially if it keeps raining here)... Some questions that might help me get it right: - do you think we need to stop and start the instances every time? - do we need to init pgbench each time? - could we just drop the subscription and publication and truncate the replica tables instead? - what scale pgbench are you running? - how many clients for the 1 min pgbench run? - are you starting the pgbench run while the copy_data jobs for the subscription are still running? - how exactly are you calculating those md5's? Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical replication - still unstable after all these months
On 26/05/17 20:09, Erik Rijkers wrote: On 2017-05-26 09:40, Simon Riggs wrote: If we can find out what the bug is with a repeatable test case we can fix it. Could you provide more details? Thanks I will, just need some time to clean things up a bit. But what I would like is for someone else to repeat my 100x1-minute tests, taking as core that snippet I posted in my previous email. I built bash-stuff around that core (to take md5's, shut-down/start-up the two instances between runs, write info to log-files, etc). But it would be good if someone else made that separately because if that then does not fail, it would prove that my test-harness is at fault (and not logical replication). Will do - what I had been doing was running pgbench, waiting until the row counts on the replica pgbench_history were the same as the primary, then summing the %balance and delta fields from the primary and replica dbs and comparing. So far - all match up ok. However I'd been running a longer time frames (5 minutes), so not the same number of repetitions as yet. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication existing data copy
On 25/03/17 07:52, Peter Eisentraut wrote: On 3/24/17 10:09, Petr Jelinek wrote: On 24/03/17 15:05, Peter Eisentraut wrote: On 3/23/17 19:32, Petr Jelinek wrote: Yes, I also forgot to check if the table actually exists on subscriber when fetching them in CREATE SUBSCRIPTION (we have check during replication but not there). I think for this we can probably just change the missing_ok argument of RangeVarGetRelid() to false. Unless we want the custom error message, in which case we need to change AlterSubscription_refresh(), because right now it errors out because missing_ok = false. You are right, stupid me. Committed this version. Minor niggle: bench=# DROP PUBLICATION pgbench; DROP STATISTICS <=== I'm guessing that notification is wrong. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication existing data copy
On 24/03/17 12:32, Petr Jelinek wrote: On 24/03/17 00:14, Mark Kirkwood wrote: On 24/03/17 02:00, Peter Eisentraut wrote: On 3/21/17 21:38, Peter Eisentraut wrote: This patch is looking pretty good to me, modulo the failing pg_dump tests. Attached is a fixup patch. I have mainly updated some comments and variable naming for (my) clarity. No functional changes. Committed all that. Testing now this patch is in, I'm unable to create a subscription: (master) bench=# CREATE PUBLICATION pgbench FOR TABLE pgbench_accounts , pgbench_branches, pgbench_tellers, pgbench_history WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE); (slave) bench=# CREATE SUBSCRIPTION pgbench CONNECTION 'port=5447 user=postgres dbname=bench' PUBLICATION pgbench WITH (COPY DATA); ERROR: duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index" DETAIL: Key (srrelid, srsubid)=(0, 16389) already exists. This is a pair of freshly initdb'ed instances, the master has a size 100 pgbench schema. I'm guessing this is a different bug from the segfault also reported Yes, I also forgot to check if the table actually exists on subscriber when fetching them in CREATE SUBSCRIPTION (we have check during replication but not there). Attached patches should fix both issues. Yep, does seem to. I note that (probably intensional) specifying 'COPY DATA' does not 'CREATE TABLES' for you...ok, I probably didn't read ...something somewhere...but anyway, after creating the tables it all seems to work. Nice. However one minor observation - as Michael Banck noted - the elapsed time for slave to catch up after running: $ pgbench -c8 -T600 bench on the master was (subjectively) much longer than for physical streaming replication. Is this expected? regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication existing data copy
On 24/03/17 02:00, Peter Eisentraut wrote: On 3/21/17 21:38, Peter Eisentraut wrote: This patch is looking pretty good to me, modulo the failing pg_dump tests. Attached is a fixup patch. I have mainly updated some comments and variable naming for (my) clarity. No functional changes. Committed all that. Testing now this patch is in, I'm unable to create a subscription: (master) bench=# CREATE PUBLICATION pgbench FOR TABLE pgbench_accounts , pgbench_branches, pgbench_tellers, pgbench_history WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE); (slave) bench=# CREATE SUBSCRIPTION pgbench CONNECTION 'port=5447 user=postgres dbname=bench' PUBLICATION pgbench WITH (COPY DATA); ERROR: duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index" DETAIL: Key (srrelid, srsubid)=(0, 16389) already exists. This is a pair of freshly initdb'ed instances, the master has a size 100 pgbench schema. I'm guessing this is a different bug from the segfault also reported regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 15/03/17 06:29, Robert Haas wrote: Great! I've committed the latest version of the patch, with some cosmetic changes. It would be astonishing if there weren't a bug or two left, but I think overall this is very solid work, and I think it's time to put this out there and see how things go. Awesome, great work! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Logical replication initial feedback
Hi, Thought I'd take a look at how this works (checking out current master). You guys have manged to get this to the point where it is *ridiculously* easy to set up a basic replication scenario - awesome i must say: - change a few parameters on the master (max_wal_senders,max_replication_slots,wal_level) - create a publication on the master - pg_dump/restore relevant tables - create a subscription of the slave ..and done. Very nice! best wishes Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On 23/11/16 16:31, Tom Lane wrote: Robert Haaswrites: [ Let's invent Oracle-style UNDO logs ] I dunno. I remember being told years ago, by an ex-Oracle engineer, that he thought our approach was better. I don't recall all the details of the conversation but I think his key point was basically this: - Reading a page that has been recently modified gets significantly more expensive; it is necessary to read the associated UNDO entries and do a bunch of calculation that is significantly more complex than what is required today. Also ROLLBACK becomes vastly more expensive than COMMIT (I can recall many years ago when I used to be an Oracle DBA reading whole chapters of novels waiting for failed batch jobs to roll back). However I'd like to add that I agree this is worth looking at, as ideally it would be great to be able to choose whether to have No-UNDO or UNDO on a table by table basis... regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: About CMake v2
I see there are some patches for the putenv issue with Visual studio 2013 in progress on this list - it is probably best to work with the author to see if 2015 has any new issues and keep all changes for that *out* of the cmake patches. regards Mark On 16/11/16 21:22, Yury Zhuravlev wrote: I made this small wrapper special for MSVC 2015 without Service Packs because postgres macross were in conflict with MS internal functions. After some time and some updates for MSVC Michael Paquier could not reproduce my problem but I keep this patch to avoid problems in the future. I can check old behavior again and revert all changes if needed and ofcourse I have plans to make separate patch for this changes. Thanks! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: About CMake v2
Yeah, there seems to be a lot of these. Looking through them almost all concern the addition of piece of code to wrap putenv. e.g: --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1350,7 +1350,7 @@ exec_command(const char *cmd, char *newval; newval = psprintf("%s=%s", envvar, envval); - putenv(newval); + pg_putenv_proxy(newval); success = true; /* Where pg_putenv_proxy either calls putenv or pgwin32_putenv (the latter on windows I'd guess). I wonder if this could have been avoided, since the original code handles this sort of thing. There are also some minor - and not immediately obvious - changes to a number of macros in various includes...If I'm feeling keen I'll experiment to see how far I can get without any source changes at all. regards Mark On 09/11/16 08:37, Peter Eisentraut wrote: There are a number of changes in .[ch] and .pl files that are unclear and not explained. Please explain them. You can also submit separate preliminary patches if you need to do some refactoring. Ultimately, I would expect this patch not to require C code changes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: About CMake v2
I had a bit a play around to see if I could get this building properly again with v10 (i.e master). I've attached a minimal *incremental* patch that needs to be applied after v1. This gets everything under the src tree building (added the new file_utils.c and reordered some link libs and removed some duplicates). I haven't made any changes with respect to it trying to detect and build everything. One added nit I see is that unless I'm missing something there appears to be no way to stop it trying to build all the contribs...so an option to enable/disable their build is needed. To make it display what options there are I use: $ mkdir build; cd build ; cmake .. -LH And to do a build that works: $ cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/pgsql/10 -DWITH_PERL=OFF -DWITH_OPENSSL=OFF -DWITH_PYTHON=OFF With reference to Tom's comments, I'm thinking that these should all default to 'OFF' plus an additional variable WITH_CONTRIB (or similar) should default to OFF too. regards Mark On 09/11/16 08:37, Peter Eisentraut wrote: On 9/17/16 1:21 PM, Yury Zhuravlev wrote: Now, I published the first version of the patch. I tried this out. Because of some file moves in initdb and pg_basebackup, the build fails: [ 74%] Linking C executable initdb Undefined symbols for architecture x86_64: "_fsync_pgdata", referenced from: _main in initdb.c.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status make[2]: *** [src/bin/initdb/CMakeFiles/initdb.dir/build.make:177: src/bin/initdb/initdb] Error 1 make[1]: *** [CMakeFiles/Makefile2:2893: src/bin/initdb/CMakeFiles/initdb.dir/all] Error 2 make: *** [Makefile:128: all] Error 2 Please submit an updated patch. I suggest you use git format-patch to produce patches. This is easier to apply, especially when there are a lot of new files involved. Also use the git facilities to check for whitespace errors. Please supply some documentation, such as - what are the basic commands - how to set include/library paths, choose configure options - how to set CFLAGS - how to see raw build commands - what are the targets for all/world/check/docs etc. - explain directory structure I suggest for now you could put this into a README.cmake file in your patch. We don't need to commit it that way, but it would help in the meantime. When I run cmake without options, it seems to do opportunistic feature checking. For example, it turns on OpenSSL or Python support if it can find it, otherwise it turns it off. We need this to be deterministic. Without options, choose the basic feature set, require all other features to be turned on explicitly, fail if they can't be found. Whatever the Autoconf-based build does now has been fairly deliberately tuned, so there should be very little reason to deviate from that. The Python check appears to be picking up pieces from two different Python installations: -- Found PythonInterp: /usr/local/bin/python (found version "2.7.12") -- Found PythonLibs: /usr/lib/libpython2.7.dylib (found version "2.7.10") The check results otherwise look OK, but I'm a bit confused about the order. It checks for some functions before all the header files are checked for. Is that intentional? There are a number of changes in .[ch] and .pl files that are unclear and not explained. Please explain them. You can also submit separate preliminary patches if you need to do some refactoring. Ultimately, I would expect this patch not to require C code changes. diff --git a/src/bin/pg_archivecleanup/CMakeLists.txt b/src/bin/pg_archivecleanup/CMakeLists.txt index 7c79ac3..06d7be4 100644 --- a/src/bin/pg_archivecleanup/CMakeLists.txt +++ b/src/bin/pg_archivecleanup/CMakeLists.txt @@ -7,9 +7,8 @@ include_directories(BEFORE add_executable(pg_archivecleanup pg_archivecleanup.c) target_link_libraries(pg_archivecleanup - port -# pq pgcommon + port ${M_LIB} ) diff --git a/src/bin/pg_basebackup/CMakeLists.txt b/src/bin/pg_basebackup/CMakeLists.txt index a985e08..a9bdd66 100644 --- a/src/bin/pg_basebackup/CMakeLists.txt +++ b/src/bin/pg_basebackup/CMakeLists.txt @@ -12,9 +12,12 @@ endif() add_library(common_basebackup STATIC receivelog.c streamutil.c + walmethods.c ) target_link_libraries(common_basebackup + pgfeutils + pgcommon port pq ) @@ -29,12 +32,8 @@ foreach(loop_var IN ITEMS ${basebackup_list}) add_executable(${loop_var} ${loop_var}.c) target_link_libraries(${loop_var} - pgfeutils - port - pq - pgcommon - ${M_LIB} common_basebackup + ${M_LIB} ) CMAKE_SET_TARGET_FOLDER(${loop_var} bin) if(ZLIB_FOUND) diff --git a/src/common/CMakeLists.txt b/src/common/CMakeLists.txt index 7a24165..9338482 100644 --- a/src/common/CMakeLists.txt +++ b/src/common/CMakeLists.txt @@ -18,6 +18,7 @@ set(pgcommon_srv_SRCS set(pgcommon_SRCS ${pgcommon_srv_SRCS} fe_memutils.c + file_utils.c restricted_token.c ${PROJECT_SOURCE_DIR}/src/include/parser/gram.h )
Re: [HACKERS] WIP: About CMake v2
On 11/11/16 08:15, Yury Zhuravlev wrote: Craig Ringer wrote: So personally I think it'd be fine if a cmake build defaulted to finding and using what it could, but offered a --minimal mode or whatever that gets us just core postgres + whatever we enable explicitly. But our current behaviour is OK too. To me it's best way. But I'm not sure what Tom Lane will accept this. I just had a play with this patch - nice! (ok so it needs a fix so that the compile completes as mentioned prev). I would recommend making it behave as Tom suggested. *Then* add an --autodetect or similar option that makes behave in the 'finding and using what it could' manner as a 2nd patch. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 25/09/16 18:18, Amit Kapila wrote: On Sat, Sep 24, 2016 at 10:49 PM, Greg Starkwrote: On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote: But to kick the hash AM as such to the curb is to say "sorry, there will never be O(1) index lookups in Postgres". Well there's plenty of halfway solutions for that. We could move hash indexes to contrib or even have them in core as experimental_hash or unlogged_hash until the day they achieve their potential. We definitely shouldn't discourage people from working on hash indexes Okay, but to me it appears that naming it as experimental_hash or moving it to contrib could discourage people or at the very least people will be less motivated. Thinking on those lines a year or so back would have been a wise direction, but now when already there is lot of work done (patches to make it wal-enabled, more concurrent and performant, page inspect module are available) for hash indexes and still more is in progress, that sounds like a step backward then step forward. +1 I think so too - I've seen many email threads over the years on this list that essentially state "we need hash indexes wal logged to make progress with them"...and Amit et al has/have done this (more than this obviously - made 'em better too) and I'm astonished that folk are suggesting anything other than 'commit this great patch now!'... regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 17/09/16 06:38, Andres Freund wrote: On 2016-09-16 09:12:22 -0700, Jeff Janes wrote: On Thu, Sep 15, 2016 at 7:23 AM, Andres Freundwrote: One earlier question about this is whether that is actually a worthwhile goal. Are the speed and space benefits big enough in the general case? Could those benefits not be achieved in a more maintainable manner by adding a layer that uses a btree over hash(columns), and adds appropriate rechecks after heap scans? Note that I'm not saying that hash indexes are not worthwhile, I'm just doubtful that question has been explored sufficiently. I think that exploring it well requires good code. If the code is good, why not commit it? Because getting there requires a lot of effort, debugging it afterwards would take effort, and maintaining it would also takes a fair amount? Adding code isn't free. I'm rather unenthused about having a hash index implementation that's mildly better in some corner cases, but otherwise doesn't have much benefit. That'll mean we'll have to step up our user education a lot, and we'll have to maintain something for little benefit. While I see the point of what you are saying here, I recall all previous discussions about has indexes tended to go a bit like this: - until WAL logging of hash indexes is written it is not worthwhile trying to make improvements to them - WAL logging will be a lot of work, patches 1st please Now someone has done that work, and we seem to be objecting that because they are not improved then the patches are (maybe) not worthwhile. I think that is - essentially - somewhat unfair. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 16/09/16 18:35, Amit Kapila wrote: On Thu, Sep 15, 2016 at 7:53 PM, Andres Freundwrote: Hi, On 2016-05-10 17:39:22 +0530, Amit Kapila wrote: For making hash indexes usable in production systems, we need to improve its concurrency and make them crash-safe by WAL logging them. One earlier question about this is whether that is actually a worthwhile goal. Are the speed and space benefits big enough in the general case? I think there will surely by speed benefits w.r.t reads for larger indexes. All our measurements till now have shown that there is a benefit varying from 30~60% (for reads) with hash index as compare to btree, and I think it could be even more if we further increase the size of index. On space front, I have not done any detailed study, so it is not right to conclude anything, but it appears to me that if the index is on char/varchar column where size of key is 10 or 20 bytes, hash indexes should be beneficial as they store just hash-key. Could those benefits not be achieved in a more maintainable manner by adding a layer that uses a btree over hash(columns), and adds appropriate rechecks after heap scans? I don't think it can be faster for reads than using real hash index, but surely one can have that as a workaround. Note that I'm not saying that hash indexes are not worthwhile, I'm just doubtful that question has been explored sufficiently. I think theoretically hash indexes are faster than btree considering logarithmic complexity (O(1) vs. O(logn)), also the results after recent optimizations indicate that hash indexes are faster than btree for equal to searches. I am not saying after the recent set of patches proposed for hash indexes they will be better in all kind of cases. It could be beneficial for cases where indexed columns are not updated heavily. I think one can definitely argue that we can some optimizations in btree and make them equivalent or better than hash indexes, but I am not sure if it is possible for all-kind of use-cases. I think having the choice for a more equality optimized index design is desirable. Now that they are wal logged they are first class citizens so to speak. I suspect that there are a lot of further speed optimizations that can be considered to tease out the best performance - now that the basics of reliability have been sorted. I think this patch/set of patches is/are important! regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] less expensive pg_buffercache on big shmem
On 02/09/16 15:19, Andres Freund wrote: On 2016-09-02 08:31:42 +0530, Robert Haas wrote: I wonder whether we ought to just switch from the consistent method to the semiconsistent method and call it good. +1. I think, before long, we're going to have to switch away from having locks & partitions in the first place. So I don't see a problem relaxing this. It's not like that consistency really buys you anything... I'd even consider not using any locks. +1 as well. When I wrote the original module I copied the design of the pg_locks view - as it was safe and consistent. Now it is clear that the ability to look at (semi-consistent) contents of the buffer cache is more important than having a theoretically correct point in time snapshot. Go for it :-) regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On 13/09/16 01:20, Jesper Pedersen wrote: On 09/01/2016 11:55 PM, Amit Kapila wrote: I have fixed all other issues you have raised. Updated patch is attached with this mail. The following script hangs on idx_val creation - just with v5, WAL patch not applied. Are you sure it is actually hanging? I see 100% cpu for a few minutes but the index eventually completes ok for me (v5 patch applied to today's master). Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 11/09/16 19:16, Mark Kirkwood wrote: On 11/09/16 17:01, Amit Kapila wrote: ...Do you think we can do some read-only workload benchmarking using this server? If yes, then probably you can use concurrent hash index patch [1] and cache the metapage patch [2] (I think Mithun needs to rebase his patch) to do so. [1] - https://www.postgresql.org/message-id/caa4ek1j6b8o4pcepqrxnyblvbftonmjeem+qn0jzx31-obx...@mail.gmail.com [2] - https://www.postgresql.org/message-id/cad__ouhj29cebif_flge4t9vj_-cfxbwcxhjo+d_16txbem...@mail.gmail.com I can do - are we checking checking for hangs/assertions or comparing patched vs unpatched performance (for the metapage patch)? So, assuming the latter - testing performance with and without the metapage patch: For my 1st runs: - cpus 16, ran 16G - size 100, clients 32 I'm seeing no difference in performance for read only (-S) pgbench workload (with everybody using has indexes). I guess not that surprising as the db fites in ram (1.6G and we have 16G). So I'll retry with a bigger dataset (suspect size 2000 is needed). regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 11/09/16 17:01, Amit Kapila wrote: On Sun, Sep 11, 2016 at 4:10 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote: performed several 10 hour runs on size 100 database using 32 and 64 clients. For the last run I rebuilt with assertions enabled. No hangs or assertion failures. Thanks for verification. Do you think we can do some read-only workload benchmarking using this server? If yes, then probably you can use concurrent hash index patch [1] and cache the metapage patch [2] (I think Mithun needs to rebase his patch) to do so. [1] - https://www.postgresql.org/message-id/caa4ek1j6b8o4pcepqrxnyblvbftonmjeem+qn0jzx31-obx...@mail.gmail.com [2] - https://www.postgresql.org/message-id/cad__ouhj29cebif_flge4t9vj_-cfxbwcxhjo+d_16txbem...@mail.gmail.com I can do - are we checking checking for hangs/assertions or comparing patched vs unpatched performance (for the metapage patch)? regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 09/09/16 14:50, Mark Kirkwood wrote: Yeah, good suggestion about replacing (essentially) all the indexes with hash ones and testing. I did some short runs with this type of schema yesterday (actually to get a feel for if hash performance vs btree was compareable - does seem tp be) - but probably longer ones with higher concurrency (as high as I can manage on a single socket i7 anyway) is a good plan. If Ashutosh has access to seriously large numbers of cores then that is even better :-) I managed to find a slightly bigger server (used our openstack cloud to run a 16 cpu vm). With the schema modified as follows: bench=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" hash (aid) bench=# \d pgbench_branches Table "public.pgbench_branches" Column | Type | Modifiers --+---+--- bid | integer | not null bbalance | integer | filler | character(88) | Indexes: "pgbench_branches_pkey" hash (bid) bench=# \d pgbench_tellers Table "public.pgbench_tellers" Column | Type | Modifiers --+---+--- tid | integer | not null bid | integer | tbalance | integer | filler | character(84) | Indexes: "pgbench_tellers_pkey" hash (tid) bench=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_pkey" hash (bid) performed several 10 hour runs on size 100 database using 32 and 64 clients. For the last run I rebuilt with assertions enabled. No hangs or assertion failures. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 09/09/16 07:09, Jeff Janes wrote: On Wed, Sep 7, 2016 at 3:29 AM, Ashutosh Sharma> wrote: > Thanks to Ashutosh Sharma for doing the testing of the patch and > helping me in analyzing some of the above issues. Hi All, I would like to summarize the test-cases that i have executed for validating WAL logging in hash index feature. 1) I have mainly ran the pgbench test with read-write workload at the scale factor of 1000 and various client counts like 16, 64 and 128 for time duration of 30 mins, 1 hr and 24 hrs. I have executed this test on highly configured power2 machine with 128 cores and 512GB of RAM. I ran the test-case both with and without the replication setup. Please note that i have changed the schema of pgbench tables created during initialisation phase. The new schema of pgbench tables looks as shown below on both master and standby: postgres=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) "pgbench_accounts_bid" hash (bid) postgres=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_bid" hash (bid) Hi Ashutosh, This schema will test the maintenance of hash indexes, but it will never use hash indexes for searching, so it limits the amount of test coverage you will get. While searching shouldn't generate novel types of WAL records (that I know of), it will generate locking and timing issues that might uncover bugs (if there are any left to uncover, of course). I would drop the primary key on pgbench_accounts and replace it with a hash index and test it that way (except I don't have a 128 core machine at my disposal, so really I am suggesting that you do this...) The lack of primary key and the non-uniqueness of the hash index should not be an operational problem, because the built in pgbench runs never attempt to violate the constraints anyway. In fact, I'd replace all of the indexes on the rest of the pgbench tables with hash indexes, too, just for additional testing. I plan to do testing using my own testing harness after changing it to insert a lot of dummy tuples (ones with negative values in the pseudo-pk column, which are never queried by the core part of the harness) and deleting them at random intervals. I think that none of pgbench's built in tests are likely to give the bucket splitting and squeezing code very much exercise. Is there a way to gather statistics on how many of each type of WAL record are actually getting sent over the replication link? The only way I can think of is to turn on wal archving as well as replication, then using pg_xlogdump to gather the stats. I've run my original test for a while now and have not seen any problems. But I realized I forgot to compile with enable-casserts, to I will have to redo it to make sure the assertion failures have been fixed. In my original testing I did very rarely get a deadlock (or some kind of hang), and I haven't seen that again so far. It was probably the same source as the one Mark observed, and so the same fix. Cheers, Jeff Yeah, good suggestion about replacing (essentially) all the indexes with hash ones and testing. I did some short runs with this type of schema yesterday (actually to get a feel for if hash performance vs btree was compareable - does seem tp be) - but probably longer ones with higher concurrency (as high as I can manage on a single socket i7 anyway) is a good plan. If Ashutosh has access to seriously large numbers of cores then that is even better :-) Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 07/09/16 21:58, Amit Kapila wrote: On Wed, Aug 24, 2016 at 10:32 PM, Jeff Janes <jeff.ja...@gmail.com> wrote: On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes <jeff.ja...@gmail.com> wrote: After an intentionally created crash, I get an Assert triggering: TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32", File: "hashovfl.c", Line: 553) freep[0] is zero and bitmapbit is 16. Here what is happening is that when it tries to clear the bitmapbit, it expects it to be set. Now, I think the reason for why it didn't find the bit as set could be that after the new overflow page is added and the bit corresponding to it is set, you might have crashed the system and the replay would not have set the bit. Then while freeing the overflow page it can hit the Assert as mentioned by you. I think the problem here could be that I am using REGBUF_STANDARD to log the bitmap page updates which seems to be causing the issue. As bitmap page doesn't follow the standard page layout, it would have omitted the actual contents while taking full page image and then during replay, it would not have set the bit, because page doesn't need REDO. I think here the fix is to use REGBUF_NO_IMAGE as we use for vm buffers. If you can send me the detailed steps for how you have produced the problem, then I can verify after fixing whether you are seeing the same problem or something else. The test is rather awkward, it might be easier to just have me test it. Okay, I have fixed this issue as explained above. Apart from that, I have fixed another issue reported by Mark Kirkwood upthread and few other issues found during internal testing by Ashutosh Sharma. The locking issue reported by Mark and Ashutosh is that the patch didn't maintain the locking order while adding overflow page as it maintains in other write operations (lock the bucket pages first and then metapage to perform the write operation). I have added the comments in _hash_addovflpage() to explain the locking order used in modified patch. During stress testing with pgbench using master-standby setup, we found an issue which indicates that WAL replay machinery doesn't expect completely zeroed pages (See explanation of RBM_NORMAL mode atop XLogReadBufferExtended). Previously before freeing the overflow page we were zeroing it, now I have changed it to just initialize the page such that the page will be empty. Apart from above, I have added support for old snapshot threshold in the hash index code. Thanks to Ashutosh Sharma for doing the testing of the patch and helping me in analyzing some of the above issues. I forgot to mention in my initial mail that Robert and I had some off-list discussions about the design of this patch, many thanks to him for providing inputs. Repeating my tests with these new patches applied points to the hang issue being solved. I tested several 10 minute runs (any of which was enough to elicit the hang previously). I'll do some longer ones, but looks good! regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 17:01, Mark Kirkwood wrote: ...actually I was wrong there, only 2 of them were the same. So I've attached a new log of bt's of them all. And I can reproduce with only 1 session, figured that might be a helpful piece of the puzzle (trace attached). $ pgbench -c 1 -T600 bench bench=# SELECT pid,state,now()-xact_start AS wait,wait_event_type,wait_event,query FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC; pid | state | wait | wait_event_type | wait_event | query ---++-+-++--- - 17266 | active | 00:45:23.027555 | LWLockTranche | buffer_content | INSERT INTO pgbenc h_history (tid, bid, aid, delta, mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAM P); (1 row) $ gdb -p 17266 (gdb) bt #0 0x7f7b4d533387 in semop () at ../sysdeps/unix/syscall-template.S:84 #1 0x00660331 in PGSemaphoreLock (sema=sema@entry=0x7f7b4cad1690) at pg_sema.c:387 #2 0x006bde9b in LWLockAcquire (lock=0x7f7b44583fe4, mode=LW_EXCLUSIVE) at lwlock.c:1288 #3 0x0049a625 in _hash_getbuf_with_strategy (rel=rel@entry=0x7f7b4e0c1908, blkno=blkno@entry=12, access=access@entry=2, flags=flags@entry=1, bstrategy=bstrategy@entry=0x0) at hashpage.c:252 #4 0x00498040 in _hash_freeovflpage (rel=rel@entry=0x7f7b4e0c1908, bucketbuf=bucketbuf@entry=12796, ovflbuf=ovflbuf@entry=236, wbuf=wbuf@entry=13166, itups=itups@entry=0x7ffc693fc800, itup_offsets=itup_offsets@entry=0x13067d0, tups_size=0x7ffc693fd4c0, nitups=34, bstrategy=0x0) at hashovfl.c:517 #5 0x00499985 in _hash_squeezebucket (rel=rel@entry=0x7f7b4e0c1908, bucket=bucket@entry=7, bucket_blkno=bucket_blkno@entry=10, bucket_buf=bucket_buf@entry=12796, bstrategy=bstrategy@entry=0x0) at hashovfl.c:1010 #6 0x00496caf in hashbucketcleanup (rel=rel@entry=0x7f7b4e0c1908, bucket_buf=bucket_buf@entry=12796, bucket_blkno=bucket_blkno@entry=10, bstrategy=bstrategy@entry=0x0, maxbucket=22, highmask=31, lowmask=15, tuples_removed=0x0, num_index_tuples=0x0, bucket_has_garbage=1 '\001', delay=0 '\000', callback=0x0, callback_state=0x0) at hash.c:937 #7 0x0049b353 in _hash_expandtable (rel=rel@entry=0x7f7b4e0c1908, metabuf=metabuf@entry=6179) at hashpage.c:785 #8 0x00497bae in _hash_doinsert (rel=rel@entry=0x7f7b4e0c1908, itup=itup@entry=0x13060b8) at hashinsert.c:313 #9 0x0049617f in hashinsert (rel=0x7f7b4e0c1908, values=, isnull=, ht_ctid=0x1305f7c, heapRel=, checkUnique=) at hash.c:247 #10 0x005bf060 in ExecInsertIndexTuples (slot=slot@entry=0x1304a70, tupleid=tupleid@entry=0x1305f7c, estate=estate@entry=0x13043b0, noDupErr=noDupErr@entry=0 '\000', specConflict=specConflict@entry=0x0, arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:388 #11 0x005dd011 in ExecInsert (canSetTag=1 '\001', estate=0x13043b0, onconflict=, arbiterIndexes=0x0, planSlot=0x1304a70, slot=0x1304a70, mtstate=0x1304600) at nodeModifyTable.c:481 #12 ExecModifyTable (node=node@entry=0x1304600) at nodeModifyTable.c:1496 #13 0x005c3948 in ExecProcNode (node=node@entry=0x1304600) at execProcnode.c:396 #14 0x005bfdbf in ExecutePlan (dest=0x12f4bd8, direction=, numberTuples=0, sendTuples=, operation=CMD_INSERT, use_parallel_mode=, planstate=0x1304600, estate=0x13043b0) at execMain.c:1567 #15 standard_ExecutorRun (queryDesc=0x1303fa0, direction=, count=0) at execMain.c:338 #16 0x006ce669 in ProcessQuery (plan=, sourceText=0x12ccf50 "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAMP);", params=0x0, dest=0x12f4bd8, completionTag=0x7ffc69400c30 "") at pquery.c:187 #17 0x006ce89c in PortalRunMulti (portal=portal@entry=0x1271fe0, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12f4bd8, ---Type to continue, or q to quit--- altdest=altdest@entry=0x12f4bd8, completionTag=completionTag@entry=0x7ffc69400c30 "") at pquery.c:1303 #18 0x006cf34e in PortalRun (portal=portal@entry=0x1271fe0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x12f4bd8, altdest=altdest@entry=0x12f4bd8, completionTag=completionTag@entry=0x7ffc69400c30 "") at pquery.c:815 #19 0x006cc45b in exec_simple_query ( query_string=0x12ccf50 "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAMP)
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 16:52, Mark Kirkwood wrote: On 24/08/16 16:33, Amit Kapila wrote: On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote: On 24/08/16 15:36, Amit Kapila wrote: On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote: Can you get the call stacks? For every stuck backend? (just double checking)... Yeah. Cool, I managed to reproduce with a reduced workload of 4 backends, then noticed that the traces for 3 of 'em were all the same. So I've attached the 2 unique ones, plus noted what pg_stat_activity thought the wait event was, in case that is useful. ...actually I was wrong there, only 2 of them were the same. So I've attached a new log of bt's of them all. $ pgbench -c 4 -T600 bench postgres=# SELECT pid, state,now()-xact_start AS wait,wait_event_type,wait_event,query FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC; pid | state | wait | wait_event_type | wait_event | query ---++-+-++- --- 16549 | active | 00:06:00.252146 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (190, 24, 9513890, -2566, C URRENT_TIMESTAMP); 16548 | active | 00:06:00.235003 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (445, 3, 5688076, 1021, CUR RENT_TIMESTAMP); 16547 | active | 00:06:00.218032 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (803, 97, 6871273, -35, CUR RENT_TIMESTAMP); 16546 | active | 00:06:00.192543 | Lock| transactionid | UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3; (4 rows) $ gdb 16546 (gdb) bt #0 0x7f51daa3a133 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84 #1 0x006abdb6 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffd197d3a0, cur_timeout=-1, set=0xdc9288) at latch.c:987 #2 WaitEventSetWait (set=set@entry=0xdc9288, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fffd197d3a0, nevents=nevents@entry=1) at latch.c:941 #3 0x006ac1ca in WaitLatchOrSocket (latch=0x7f51d9fd969c, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0) at latch.c:347 #4 0x006ac27d in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302 #5 0x006bb5e3 in ProcSleep (locallock=locallock@entry=0xd3d690, lockMethodTable=lockMethodTable@entry=0x9149a0 ) at proc.c:1219 #6 0x006b639d in WaitOnLock (locallock=locallock@entry=0xd3d690, owner=owner@entry=0xdc4cb0) at lock.c:1703 #7 0x006b7a77 in LockAcquireExtended (locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998 #8 0x006b7d51 in LockAcquire (locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:688 #9 0x006b58fe in XactLockTableWait (xid=xid@entry=7667, rel=rel@entry=0x7f51db5c6398, ctid=ctid@entry=0x7fffd197d844, oper=oper@entry=XLTW_Update) at lmgr.c:587 #10 0x004a5d29 in heap_update (relation=relation@entry=0x7f51db5c6398, otid=otid@entry=0x7fffd197d9e0, newtup=newtup@entry=0xd4b3d0, cid=2, crosscheck=0x0, wait=wait@entry=1 '\001', hufd=0x7fffd197d8f0, lockmode=0x7fffd197d8ec) at heapam.c:3755 #11 0x005dbeaa in ExecUpdate (tupleid=tupleid@entry=0x7fffd197d9e0, oldtuple=oldtuple@entry=0x0, slot=slot@entry=0xd4adb0, planSlot=planSlot@entry=0xdc8ea0, epqstate=epqstate@entry=0xdc76c8, estate=estate@entry=0xdc73d0, canSetTag=1 '\001') at nodeModifyTable.c:922 #12 0x005dc633 in ExecModifyTable (node=node@entry=0xdc7620) at nodeModifyTable.c:1501 #13 0x005c3948 in ExecProcNode (node=node@entry=0xdc7620) at execProcnode.c:396 #14 0x005bfdbf in ExecutePlan (dest=0xdc9e60, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, use_parallel_mode=, planstate=0xdc7620, estate=0xdc73d0) at execMain.c:1567 #15 standard_ExecutorRun (queryDesc=0xdc6fc0, direction=, count=0) at execMain.c:338 #16 0x006ce669 in ProcessQuery (plan=, sourceText=0xd8ff70 "UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3;", params=0x0, dest=0xdc9e60, completionTag=0x7fffd197dfc0 "") at pquery.c:187 #17 0x006ce89c in PortalRunMulti (portal=portal@entry=0xd35000, isT
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 16:33, Amit Kapila wrote: On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote: On 24/08/16 15:36, Amit Kapila wrote: On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote: Can you get the call stacks? For every stuck backend? (just double checking)... Yeah. Cool, I managed to reproduce with a reduced workload of 4 backends, then noticed that the traces for 3 of 'em were all the same. So I've attached the 2 unique ones, plus noted what pg_stat_activity thought the wait event was, in case that is useful. Cheers Mark $ pgbench -c 4 -T600 bench postgres=# SELECT pid, state,now()-xact_start AS wait,wait_event_type,wait_event,query FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC; pid | state | wait | wait_event_type | wait_event | query ---++-+-++- --- 16549 | active | 00:06:00.252146 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (190, 24, 9513890, -2566, C URRENT_TIMESTAMP); 16548 | active | 00:06:00.235003 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (445, 3, 5688076, 1021, CUR RENT_TIMESTAMP); 16547 | active | 00:06:00.218032 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (803, 97, 6871273, -35, CUR RENT_TIMESTAMP); 16546 | active | 00:06:00.192543 | Lock| transactionid | UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3; (4 rows) postgres@zmori:~$ gdb -p 16546 0x7f51daa3a133 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84 84 ../sysdeps/unix/syscall-template.S: No such file or directory. (gdb) bt #0 0x7f51daa3a133 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84 #1 0x006abdb6 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffd197d3a0, cur_timeout=-1, set=0xdc9288) at latch.c:987 #2 WaitEventSetWait (set=set@entry=0xdc9288, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fffd197d3a0, nevents=nevents@entry=1) at latch.c:941 #3 0x006ac1ca in WaitLatchOrSocket (latch=0x7f51d9fd969c, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0) at latch.c:347 #4 0x006ac27d in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302 #5 0x006bb5e3 in ProcSleep (locallock=locallock@entry=0xd3d690, lockMethodTable=lockMethodTable@entry=0x9149a0 ) at proc.c:1219 #6 0x006b639d in WaitOnLock (locallock=locallock@entry=0xd3d690, owner=owner@entry=0xdc4cb0) at lock.c:1703 #7 0x006b7a77 in LockAcquireExtended ( locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998 #8 0x006b7d51 in LockAcquire (locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:688 #9 0x006b58fe in XactLockTableWait (xid=xid@entry=7667, rel=rel@entry=0x7f51db5c6398, ctid=ctid@entry=0x7fffd197d844, oper=oper@entry=XLTW_Update) at lmgr.c:587 #10 0x004a5d29 in heap_update (relation=relation@entry=0x7f51db5c6398, otid=otid@entry=0x7fffd197d9e0, newtup=newtup@entry=0xd4b3d0, cid=2, crosscheck=0x0, wait=wait@entry=1 '\001', hufd=0x7fffd197d8f0, lockmode=0x7fffd197d8ec) at heapam.c:3755 #11 0x005dbeaa in ExecUpdate (tupleid=tupleid@entry=0x7fffd197d9e0, oldtuple=oldtuple@entry=0x0, slot=slot@entry=0xd4adb0, planSlot=planSlot@entry=0xdc8ea0, epqstate=epqstate@entry=0xdc76c8, estate=estate@entry=0xdc73d0, canSetTag=1 '\001') at nodeModifyTable.c:922 #12 0x005dc633 in ExecModifyTable (node=node@entry=0xdc7620) at nodeModifyTable.c:1501 #13 0x005c3948 in ExecProcNode (node=node@entry=0xdc7620) at execProcnode.c:396 #14 0x005bfdbf in ExecutePlan (dest=0xdc9e60, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, use_parallel_mode=, planstate=0xdc7620, estate=0xdc73d0) at execMain.c:1567 #15 standard_ExecutorRun (queryDesc=0xdc6fc0, direction=, count=0) at execMain.c:338 #16 0x006ce669 in ProcessQuery (plan=, sourceText=0xd8ff70 "UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3;", params=0x0, dest=0xdc9e60, completionTag=0x7fffd197dfc0 "") ---
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 15:36, Amit Kapila wrote: On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote: On 24/08/16 12:09, Mark Kirkwood wrote: On 23/08/16 15:24, Amit Kapila wrote: Thoughts? Note - To use this patch, first apply latest version of concurrent hash index patch [2]. [1] - https://commitfest.postgresql.org/10/647/ [2] - https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com Firstly - really nice! Patches applied easily etc to latest version 10 checkout. I thought I'd test by initializing pgbench schema, adding a standby, then adding some hash indexes and running pgbench: Looking on the standby: bench=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) "pgbench_accounts_bid" hash (bid)< bench=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_bid" hash (bid) <= they have been replicated there ok. However I'm seeing a hang on the master after a while: bench=# SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; datname | application_name | state | wait | query -+--++-+ | walreceiver | idle | | bench | pgbench | active | 00:31:38.367467 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, -3868, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:38.215378 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, 2091, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.991056 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, 3438, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.619798 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, -2443, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.544196 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.334504 | UPDATE pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33; bench | pgbench | active | 00:31:35.234112 | UPDATE pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38; bench | pgbench | active | 00:31:34.434676 | UPDATE pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33; bench | psql | active | 00:00:00| SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; (10 rows) but no errors in the logs, any thoughts? Can you get the call stacks? For every stuck backend? (just double checking)... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 12:09, Mark Kirkwood wrote: On 23/08/16 15:24, Amit Kapila wrote: Thoughts? Note - To use this patch, first apply latest version of concurrent hash index patch [2]. [1] - https://commitfest.postgresql.org/10/647/ [2] - https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com Firstly - really nice! Patches applied easily etc to latest version 10 checkout. I thought I'd test by initializing pgbench schema, adding a standby, then adding some hash indexes and running pgbench: Looking on the standby: bench=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) "pgbench_accounts_bid" hash (bid)< bench=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_bid" hash (bid) <= they have been replicated there ok. However I'm seeing a hang on the master after a while: bench=# SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; datname | application_name | state | wait | query -+--++-+ | walreceiver | idle | | bench | pgbench | active | 00:31:38.367467 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, -3868, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:38.215378 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, 2091, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.991056 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, 3438, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.619798 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, -2443, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.544196 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.334504 | UPDATE pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33; bench | pgbench | active | 00:31:35.234112 | UPDATE pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38; bench | pgbench | active | 00:31:34.434676 | UPDATE pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33; bench | psql | active | 00:00:00| SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; (10 rows) but no errors in the logs, any thoughts? FWIW, retesting with the wal logging patch removed (i.e leaving the concurrent hast one) works fine. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 23/08/16 15:24, Amit Kapila wrote: Thoughts? Note - To use this patch, first apply latest version of concurrent hash index patch [2]. [1] - https://commitfest.postgresql.org/10/647/ [2] - https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com Firstly - really nice! Patches applied easily etc to latest version 10 checkout. I thought I'd test by initializing pgbench schema, adding a standby, then adding some hash indexes and running pgbench: Looking on the standby: bench=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) "pgbench_accounts_bid" hash (bid)< bench=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_bid" hash (bid) <= they have been replicated there ok. However I'm seeing a hang on the master after a while: bench=# SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; datname | application_name | state | wait | query -+--++-+ | walreceiver | idle | | bench | pgbench | active | 00:31:38.367467 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, -3868, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:38.215378 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, 2091, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.991056 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, 3438, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.619798 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, -2443, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.544196 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.334504 | UPDATE pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33; bench | pgbench | active | 00:31:35.234112 | UPDATE pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38; bench | pgbench | active | 00:31:34.434676 | UPDATE pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33; bench | psql | active | 00:00:00| SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; (10 rows) but no errors in the logs, any thoughts? Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
On 13/08/16 05:44, Jeff Janes wrote: On Fri, Aug 12, 2016 at 1:40 AM, Mark Kirkwood However your index rebuild gets you from 5 to 3 GB - does that really help performance significantly? It can make a big difference, depending on how much RAM you have. Yeah - I suspect this is the issue - loading up a similar type of schema with records with a primary key of form 'userx' for (uniformly) randomly distributed x... (I was gonna use the Yahoo benchmark but it is s slow...). Also I'm using 1000 rows instead of 1 to avoid waiting a long time (1000 should be enough to show the point): prefix=# \d prefix Table "public.prefix" Column | Type | Modifiers +---+--- uid| character varying(30) | not null filler | character(255)| Indexes: "prefix_pkey" PRIMARY KEY, btree (uid) Doing an uncached indexed read by forcing a buffer cache clear: # echo 3 > /proc/sys/vm/drop_caches prefix=# SELECT relfilenode,relname,reltuples,pg_relation_size(oid)/1024/1024 AS mb FROM pg_class WHERE relname LIKE 'prefix%'; relfilenode | relname | reltuples | mb -+-+---+- 6017817 | prefix | 1e+07 | 422 6017819 | prefix_pkey | 1e+07 | 391 (2 rows) prefix=# EXPLAIN ANALYZE SELECT count(*) FROM prefix WHERE uid='user1'; QUERY PLAN --- Aggregate (cost=8.46..8.46 rows=1 width=0) (actual time=3.408..3.408 rows=1 lo ops=1) -> Index Only Scan using prefix_pkey on prefix (cost=0.43..8.45 rows=1 widt h=0) (actual time=3.406..3.406 rows=0 loops=1) Index Cond: (uid = 'user1'::text) Heap Fetches: 0 Planning time: 19.362 ms Execution time: 3.429 ms (6 rows) Repeating this after REINDEX: # echo 3 > /proc/sys/vm/drop_caches prefix=# SELECT relfilenode,relname,reltuples,pg_relation_size(oid)/1024/1024 AS mb FROM pg_class WHERE relname LIKE 'prefix%'; relfilenode | relname | reltuples | mb -+-+---+- 6017817 | prefix | 1e+07 | 422 6017819 | prefix_pkey | 1e+07 | 300 (2 rows) prefix=# EXPLAIN ANALYZE SELECT count(*) FROM prefix WHERE uid='user1'; QUERY PLAN --- Aggregate (cost=8.46..8.46 rows=1 width=0) (actual time=3.868..3.868 rows=1 lo ops=1) -> Index Only Scan using prefix_pkey on prefix (cost=0.43..8.45 rows=1 widt h=0) (actual time=3.866..3.866 rows=0 loops=1) Index Cond: (uid = 'user1'::text) Heap Fetches: 0 Planning time: 19.366 ms Execution time: 3.889 ms (6 rows) So certainly not significantly *slower* with the physically bigger index. This suggests that Jeff's analysis was spot on - likely that the larger index didn't fix in RAM. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
After examining the benchmark design - I see we are probably not being helped by the repeated insertion of keys all of form 'userxxx' leading to some page splitting. However your index rebuild gets you from 5 to 3 GB - does that really help performance significantly? regards Mark On 11/08/16 16:08, Kisung Kim wrote: Thank you for your information. Here is the result: After insertions: ycsb=# select * from pgstatindex('usertable_pkey'); version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation -+++---+++-+---+--+ 2 | 3 | 5488721920 | 44337 | 4464 | 665545 | 0 | 0 | 52 | 11 (1 row) After rebuild: ycsb=# select * from pgstatindex('usertable_pkey'); version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation -+++---+++-+---+--+ 2 | 3 | 3154296832 | 41827 | 1899 | 383146 | 0 | 0 |90.08 | 0 It seems like that rebuild has an effect to reduce the number of internal and leaf_pages and make more dense leaf pages. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we lost Uber as a user
On 03/08/16 02:27, Robert Haas wrote: Personally, I think that incremental surgery on our current heap format to try to fix this is not going to get very far. If you look at the history of this, 8.3 was a huge release for timely cleanup of dead tuple. There was also significant progress in 8.4 as a result of 5da9da71c44f27ba48fdad08ef263bf70e43e689. As far as I can recall, we then made no progress at all in 9.0 - 9.4. We made a very small improvement in 9.5 with 94028691609f8e148bd4ce72c46163f018832a5b, but that's pretty niche. In 9.6, we have "snapshot too old", which I'd argue is potentially a large improvement, but it was big and invasive and will no doubt pose code maintenance hazards in the years to come; also, many people won't be able to use it or won't realize that they should use it. I think it is likely that further incremental improvements here will be quite hard to find, and the amount of effort will be large relative to the amount of benefit. I think we need a new storage format where the bloat is cleanly separated from the data rather than intermingled with it; every other major RDMS works that way. Perhaps this is a case of "the grass is greener on the other side of the fence", but I don't think so. Yeah, I think this is a good summary of the state of play. The only other new db development to use a non-overwriting design like ours that I know of was Jim Starky's Falcon engine for (ironically) Mysql 6.0. Not sure if anyone is still progressing that at all now. I do wonder if Uber could have successfully tamed dead tuple bloat with aggressive per-table autovacuum settings (and if in fact they tried), but as I think Robert said earlier, it is pretty easy to come up with a highly update (or insert + delete) workload that makes for a pretty ugly bloat component even with real aggressive autovacuuming. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How can we expand PostgreSQL ecosystem?
On 06/03/16 18:29, MauMau wrote: > As I said in the previous greeting mail, I'd like to discuss how to > expand PostgreSQL ecosystem. Here, ecosystem means "interoperability" > -- the software products and cloud services which use/support > PostgreSQL. If pgsql-advocacy or somewhere else is better for this > topic, just tell me so. For cloud - in particular Openstack (which I am working with ATM), the biggest thing would be: - multi-master replication or failing that: - self managing single master failover (voting/quorum etc) so that operators can essentially 'set and forget'. We currently use Mysql+ Galera (multi master) and Mongodb (self managing single master) and the convenience and simplicity is just so important (Openstack is a huge complex collection of services - hand holding of any one service is pretty much a non starter). regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] planstats.sgml
On 16/02/16 20:01, Tatsuo Ishii wrote: Tatsuo Ishiiwrites: While reading planstats.sgml, I encounted a sentence which I don't understand. These numbers are current as of the last VACUUM or ANALYZE on the table. The planner then fetches the actual current number of pages in the table (this is a cheap operation, not requiring a table scan). If that is different from relpages then reltuples is scaled accordingly to arrive at a current number-of-rows estimate. In this case the value of relpages is up-to-date so the rows estimate is the same as reltuples. I don't understand the last sentence (In this case...). For me it seems it is talking about the case when replages is not different from what the planner fetches from the table. If so, why "In this case"? I think what it meant is "In the example above, ...". Feel free to change it if you think that is clearer. Oh, I see. I'm going to change "In this case" to "In the example above". Done. Yeah, that is clearer. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] planstats.sgml
On 16/02/16 12:46, David G. Johnston wrote: On Mon, Feb 15, 2016 at 4:23 PM, Tatsuo Ishii>wrote: While reading planstats.sgml, I encounted a sentence which I don't understand. These numbers are current as of the last VACUUM or ANALYZE on the table. The planner then fetches the actual current number of pages in the table (this is a cheap operation, not requiring a table scan). If that is different from relpages then reltuples is scaled accordingly to arrive at a current number-of-rows estimate. In this case the value of relpages is up-to-date so the rows estimate is the same as reltuples. I don't understand the last sentence (In this case...). For me it seems it is talking about the case when replages is not different from what the planner fetches from the table. If so, why "In this case"? Isn't "In this case" referrers to the previous sentence (If that is different from...)? Maybe "In this case" should be "Otherwise" or some such? The whole sentence is awkward but you are correct in your reading - and "otherwise" would be a solid choice. A long while ago when I wrote that, I was expressing the fact that *in the example* the numbers matched perfectly, but *in general* the planner would scale 'em. It still reads that way to me...but change away if you like :-) Though iIt seems the whole thing could be simplified to: These numbers are current as of the last VACUUM or ANALYZE on the table. To account for subsequent changes the planner obtains the actual page count for the table and scales pg_class.reltuples by a factor of "actual page count" over pg_class.relpages. The "cheap operation" comment seems gratuitous though could still be injected if desired. Well folk interested in performance, like to keep an eye on whether these sort of probes cost a lot... regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Horizontal scalability/sharding
On 01/09/15 21:41, Bruce Momjian wrote: Well, reworking our partitioning system is one of the things required for sharding, so at least we will clean up one mess while we create another. ;-) Seem my post to Josh Berkus just now --- I think if we don't use FDWs, that sharding is such a limited use-case that we will not implement it inside of Postgres. I'm thinking that partitioning and sharding are two different things: Partitioning is about reducing the amount of table data accessed and also perhaps easing admin activities (archiving/removing old stuff etc). Sharding is a about parallelism and redundancy...copies of stuff in different places and concurrent access by virtue of it being on different nodes! Now *maybe* FDW is a good way to approach this, but really would be nice to see a more rigorous analysis (I note that like XC and XL, Greenplum looked at the existing mechanisms around at the time and ended up writing their own). Now I'm aware that things have moved on - but I think there needs to be a proper discussion about design and what we think distributed data/sharding etc should provide *before* grabbing hold of FDW as *the answer*! Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_xlog - pg_xjournal?
On 01/06/15 05:29, Joel Jacobson wrote: While anyone who is familiar with postgres would never do something as stupid as to delete pg_xlog, according to Google, there appears to be a fair amount of end-users out there having made the irrevocable mistake of deleting the precious directory, a decision made on the assumption that since it has *log* in the name so it must be unimportant (http://stackoverflow.com/questions/12897429/what-does-pg-resetxlog-do-and-how-does-it-work). If we could turn back time, would we have picked pg_xlog as the most optimal name for this important directory, or would we have come up with a more user-friendly name? Personally, I have never had any problems with pg_xlog, but I realize there are quite a few unlucky new users who end up in trouble. My suggestion is to use pg_xjournal instead of pg_xlog when new users create a new data directory using initdb, and allow for both directories to exist (exclusive or, i.e. either one or the other, but not both). That way we don't complicate the life for any existing users, all their tools will continue to work who rely on pg_xlog to be named pg_xlog, but only force new users to do a bit of googling when they can't use whatever tool that can't find pg_xlog. When they find out it's an important directory, they can simply create a symlink and their old not yet updated tool will work again. Thoughts? +1 Strongly agree - I have had people on my dba course ask about deleting these pesky 'log' directories (obvious confusion/conflation with pg_log ...)! A change of name would help reduce the temptation! regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove fsync ON/OFF as a visible option?
On 22/03/15 08:14, Jaime Casanova wrote: El mar 21, 2015 2:00 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz escribió: On 21/03/15 19:28, Jaime Casanova wrote: what about not removing it but not showing it in postgresql.conf? as a side note, i wonder why trace_sort is not in postgresql.conf... other option is to make it a compile setting, that why if you want to have it you need to compile and postgres' developers do that routinely anyway -1 Personally I'm against hiding *any* settings. Choosing sensible defaults - yes! Hiding them - that reeks of secret squirrel nonsense and overpaid Oracle dbas that knew the undocumented settings for various capabilities. I think/hope that no open source project will try to emulate that meme! That ship has already sailed. http://www.postgresql.org/docs/9.4/static/runtime-config-developer.html Not really - they are documented in the official doc repo (that was the point I was making I think), and +1 for adding or improving the documentation for some of the more dangerous ones! While I'm against removing or hiding settings, I have no problem with shipping/generating a postgresql.conf that has *only* the non default settings therein, as that requires people to look at the docs where (of course) we have some sensible discussion about how to set the rest of 'em. I note that Mysql ship a pretty minimal confile files there days (5.5, 5.6) on Ubuntu, and that seems to cause no particular problem. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove fsync ON/OFF as a visible option?
On 21/03/15 19:28, Jaime Casanova wrote: On Fri, Mar 20, 2015 at 11:29 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Mar 21, 2015 at 2:47 AM, Peter Geoghegan p...@heroku.com wrote: On Fri, Mar 20, 2015 at 9:52 AM, Joshua D. Drake j...@commandprompt.com wrote: There are just as many people that are running with scissors that are now running (or attempting to run) our elephant in production. Does it make sense to remove fsync (and possibly full_page_writes) from such a visible place as postgresql.conf? -1 Anyone turning off fsync without even for a moment considering the consequences has only themselves to blame. I can't imagine why you'd want to remove full_page_writes or make it less visible either, since in principle it ought to be perfectly fine to turn it off in production once its verified as safe. -1 for its removal as well. It is still useful for developers to emulate CPU-bounded loads... I fought to remove fsync before so i understand JD concerns. and yes, i have seen fsync=off in the field too... what about not removing it but not showing it in postgresql.conf? as a side note, i wonder why trace_sort is not in postgresql.conf... other option is to make it a compile setting, that why if you want to have it you need to compile and postgres' developers do that routinely anyway -1 Personally I'm against hiding *any* settings. Choosing sensible defaults - yes! Hiding them - that reeks of secret squirrel nonsense and overpaid Oracle dbas that knew the undocumented settings for various capabilities. I think/hope that no open source project will try to emulate that meme! Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 26/02/15 13:32, Simon Riggs wrote: On 25 February 2015 at 23:30, Jim Nasby jim.na...@bluetreble.com wrote: That said, I don't want to block this; I think it's useful. Though, perhaps it would be better as an extension instead of in contrib? I don't think it should be very version dependent? The whole point of this is to get it into contrib. It could have published it as an extension months ago. Is this intended to replace pg_freespacemap? Not trying to be overprotective or anything :-) but is this new extension/module better/faster/stronger/more accurate etc? If so then excellent! Ohth was wondering if people either a) didn't know about pg_freespacemap or b) consider that it is crap and won't use it. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unable to build pg_rewind
On 25/02/15 12:47, Michael Paquier wrote: On Wed, Feb 25, 2015 at 8:03 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: On 25/02/15 11:06, Ratay, Steve wrote: I have checked out the pg_rewind code from https://github.com/vmware/pg_rewind.git on the master branch and am using PostgreSQL 9.4.1 source code to build against. When I try to compile pg_rewind, I am getting the following errors. How can I resolve these problems? It looks like master of pg_rewind expects to be built against Postgres 9.5. Try checking out the REL9_4_STABLE branch. This branching model makes management of the code easier because WAL-related APIs and WAL format can change a lot between major releases (and also because in this case it would have made the code less readable using blocks based on PG_VERSION_NUM for not that much result). Absolutely - and you've used the same branch names as the postgresql git repo does - so really nice and easy to match things up! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unable to build pg_rewind
On 25/02/15 11:06, Ratay, Steve wrote: I have checked out the pg_rewind code from https://github.com/vmware/pg_rewind.git on the master branch and am using PostgreSQL 9.4.1 source code to build against. When I try to compile pg_rewind, I am getting the following errors. How can I resolve these problems? gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -I/var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql -I. -I./ -I/var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server -I/var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/internal -DFRONTEND -c -o parsexlog.o parsexlog.c In file included from parsexlog.c:39: /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:28:61: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:29:68: error: macro PG_RMGR requires 7 arguments, but only 6 given In file included from parsexlog.c:39: /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:28: error: ‘PG_RMGR’ undeclared here (not in a function) /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:29: error: expected ‘}’ before ‘PG_RMGR’ /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:30:64: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:31:61: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:32:68: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:33:73: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:34:81: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:35:69: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:36:73: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:37:65: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:38:61: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:39:65: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:40:61: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:41:81: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:42:87: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:43:62: error: macro PG_RMGR requires 7 arguments, but only 6 given /var/tmp/vpagent/postgresql94-libs-9.4.1-build/usr/include/pgsql/server/access/rmgrlist.h:44:87: error: macro PG_RMGR requires 7 arguments, but only 6 given parsexlog.c: In function ‘findLastCheckpoint’: parsexlog.c:198: warning: implicit declaration of function ‘XLogRecGetInfo’ parsexlog.c:200: warning: implicit declaration of function ‘XLogRecGetRmid’ parsexlog.c: In function ‘extractPageInfo’: parsexlog.c:344: error: ‘XLR_SPECIAL_REL_UPDATE’ undeclared (first use in this function) parsexlog.c:344: error: (Each undeclared identifier is reported only once parsexlog.c:344: error: for each function it appears in.) parsexlog.c:358: error: ‘XLogReaderState’ has no member named ‘max_block_id’ parsexlog.c:364: warning: implicit declaration of function ‘XLogRecGetBlockTag’ make: *** [parsexlog.o] Error 1 It looks like master of pg_rewind expects to be built against Postgres 9.5. Try checking out the REL9_4_STABLE branch. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] List of table names of a DB
Actually, code has moved to: https://github.com/snaga/pqc On 09/01/15 19:53, Mark Kirkwood wrote: Also see: https://code.google.com/p/pqc/ A project to implement a query cache using pgpool code, probably lots of good ideas there. Cheers Mark On 09/01/15 19:38, Tatsuo Ishii wrote: Hi, pgpool-II (pgpool.net) does exactly the same thing. It receive SELECT query from clients, 1) parse it to find table names, and 2) gets the oids (unique identifier in the PostgreSQL system catalog) to recognize them. when the SELECT succeeds , it store the query result (cache) on either shared memory or memcached according to user's choice. For query cache invalidation, pgpool-II remembers all oids related to the SELECTs which are source of query cache. If one of tables get updated, pgpoool-II invalidates all of cache using the oid. For #1, pgpool-II has a query parser copied from PostgreSQL. pgpool-II is an open source project, so you could get some idea to implement your own tool. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp Sorry, it's not about querying. I am implementing an invalidation mechanism for Postgres Query Cache as part of my masters project. In order to this, I need to store details(like name) of each table the query uses. In essence, I need to store the table names of the cached queries. Initially, I thought of writing a code that could extract the table names but later discovered that it is a gargantuan task as I shall have to include around 600 production rules as was hinted in a Stackoverflow Exchange post. Hence, I thought of getting hold of the data structure used for storing table names of a DB but I couldn't get it. Sorry for the long post but do you know where these tables information of a DB gets stored? Or can you suggest me a mechanism(needn't be fully perfect) to extract the table names? I went through the parser of postgres but it was confusing. Thanks Date: Fri, 9 Jan 2015 00:46:30 +1300 Subject: Re: [HACKERS] List of table names of a DB From: dgrowle...@gmail.com To: in.live...@live.in CC: pgsql-hackers@postgresql.org On 8 January 2015 at 22:41, Deepak S in.live...@live.in wrote: Hello, I am unable to find the function which retrieves the 'list of names of the tables' used in a DB. You may find what you want in: select table_name from information_schema.tables; http://www.postgresql.org/docs/9.4/static/infoschema-tables.html Reason: I need a mechanism by which I can extract the names of the tables used in a query which has been parsed. My plan is to check for a match of each word in the query with a list of names of the tables used in the current DB so that each hit confirms a 'table name' in the query in most cases. This sounds rather flaky. Maybe it would be better to just EXPLAIN the query and see if you get error code 42P01postgres=# \set VERBOSITY verbosepostgres=# explain select * from doesnotexist;ERROR: 42P01: relation doesnotexist does not existLINE 1: explain select * from doesnotexist; Unfortunately this won't help you much if you want to know all of the tables that don't exist. Also, just for the future, a question like this might be more suited for the pgsql-gene...@postgresql.org list. Regards David Rowley -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] List of table names of a DB
Also see: https://code.google.com/p/pqc/ A project to implement a query cache using pgpool code, probably lots of good ideas there. Cheers Mark On 09/01/15 19:38, Tatsuo Ishii wrote: Hi, pgpool-II (pgpool.net) does exactly the same thing. It receive SELECT query from clients, 1) parse it to find table names, and 2) gets the oids (unique identifier in the PostgreSQL system catalog) to recognize them. when the SELECT succeeds , it store the query result (cache) on either shared memory or memcached according to user's choice. For query cache invalidation, pgpool-II remembers all oids related to the SELECTs which are source of query cache. If one of tables get updated, pgpoool-II invalidates all of cache using the oid. For #1, pgpool-II has a query parser copied from PostgreSQL. pgpool-II is an open source project, so you could get some idea to implement your own tool. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp Sorry, it's not about querying. I am implementing an invalidation mechanism for Postgres Query Cache as part of my masters project. In order to this, I need to store details(like name) of each table the query uses. In essence, I need to store the table names of the cached queries. Initially, I thought of writing a code that could extract the table names but later discovered that it is a gargantuan task as I shall have to include around 600 production rules as was hinted in a Stackoverflow Exchange post. Hence, I thought of getting hold of the data structure used for storing table names of a DB but I couldn't get it. Sorry for the long post but do you know where these tables information of a DB gets stored? Or can you suggest me a mechanism(needn't be fully perfect) to extract the table names? I went through the parser of postgres but it was confusing. Thanks Date: Fri, 9 Jan 2015 00:46:30 +1300 Subject: Re: [HACKERS] List of table names of a DB From: dgrowle...@gmail.com To: in.live...@live.in CC: pgsql-hackers@postgresql.org On 8 January 2015 at 22:41, Deepak S in.live...@live.in wrote: Hello, I am unable to find the function which retrieves the 'list of names of the tables' used in a DB. You may find what you want in: select table_name from information_schema.tables; http://www.postgresql.org/docs/9.4/static/infoschema-tables.html Reason: I need a mechanism by which I can extract the names of the tables used in a query which has been parsed. My plan is to check for a match of each word in the query with a list of names of the tables used in the current DB so that each hit confirms a 'table name' in the query in most cases. This sounds rather flaky. Maybe it would be better to just EXPLAIN the query and see if you get error code 42P01postgres=# \set VERBOSITY verbosepostgres=# explain select * from doesnotexist;ERROR: 42P01: relation doesnotexist does not existLINE 1: explain select * from doesnotexist; Unfortunately this won't help you much if you want to know all of the tables that don't exist. Also, just for the future, a question like this might be more suited for the pgsql-gene...@postgresql.org list. Regards David Rowley -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mysql with postgres
On 23/12/14 22:36, Ravi Kiran wrote: hi all, Is postgres source code compatible with mysql database?? If it is, could someone could give me some links so that I can do that. I want to hack into the postgres source code, but as I am comfortable with mysql, I want to use the mysql database not postgres. any references would be fine. I'm wondering if you are thinking that you can use Postgres as a Mysql storage engine? While Mysql does has pluggable storage engines...Postgres is not designed to be able to be used in this way (that would be an interesting - but big and probably controversial project to undertake). So if you are more familiar with Mysql, why not hack Innodb? Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 19/12/14 20:48, Andres Freund wrote: On 2014-12-18 10:02:25 -0800, Joshua D. Drake wrote: I think a lot of hackers forget exactly how tender their egos are. Now I say this knowing that a lot of them will go, Oh give me a break but as someone who employs hackers, deals with open source AND normal people :P every single day, I can tell you without a single inch of sarcasm that petting egos is one of the ways you get things done in the open source (and really any male dominated) community. To me that's a bit over the top stereotyping. +1 Having been mentioned one or two times myself - it was an unexpected wow - cool rather than a grumpy/fragile I must be noticed thing. I think some folk have forgotten the underlying principle of the open source community - it is about freely giving - time or code etc. The there must be something in it for me me me meme is - well - the *other* world view. However, doing crappy work and let's not be shy about it, there is NOTHING fun about reviewing someone else's code needs to have incentive. FWIW, I don't agree with this at all. Reviewing code can be quite interesting - with the one constraint that the problem the patch solves needs to be somewhat interesting. The latter is what I think gets many of the more experienced reviewers - lots of the patches just solve stuff they don't care about. Yeah, and also it helps if the patch addresses an area that you at least know *something* about - otherwise it is really hard to review in any useful way. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 20/12/14 11:22, Joshua D. Drake wrote: On 12/19/2014 12:28 AM, Mark Kirkwood wrote: To me that's a bit over the top stereotyping. +1 Having been mentioned one or two times myself - it was an unexpected wow - cool rather than a grumpy/fragile I must be noticed thing. I think some folk have forgotten the underlying principle of the open source community - it is about freely giving - time or code etc. The there must be something in it for me me me meme is - well - the *other* world view. It was supposed to be over the top. That doesn't make it any less true. Sure there are plenty of us that don't have any of the ego petting issues. However,t there are more of us in those of us that think we don't, that really, really do. Heh - that fact that even you are stating it is over the top clearly makes it less *generally* true. Sure, there are some/few(?) folk who are seeing open source contributions as purely a CV enhancer, and perhaps a few in denial about their egos, but I don't see that as the common trend in this community (which is great). regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vitesse DB call for testing
On 18/10/14 07:13, Josh Berkus wrote: CK, Before we go any further on this, how is Vitesse currently licensed? last time we talked it was still proprietary. If it's not being open-sourced, we likely need to take discussion off this list. +1 Guys, you need to 'fess up on the licensing! Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 04/10/14 11:21, Andres Freund wrote: On 2014-10-03 18:16:28 -0400, Bruce Momjian wrote: On Sat, Oct 4, 2014 at 12:13:00AM +0200, Andres Freund wrote: Do we really want to expose a setting a few of us _might_ ask customers to change? They also will try that themselves. Our customers aren't a horde of dumb people. Some of them are willing to try things if they hit scalability problesm. And *lots* of people hit scalability problems with postgres. In fact I've seen big users migrate away from postgres because of them. And it's not like this only affects absurd cases. Even a parallel restore will benefit. I disagree. I just don't see the value in having such undefined variables. undefined variables? I'm not arguing that we don't need documentation for it. Obviously we'd need that. I'm arguing against taking away significant scalability possibilities from our users. My bet is that it's more than 50% on a bigger machine. I don't think we can offer absolutely accurate tuning advice, but I'm sure we can give some guidance. Let me try. +1 I think it is ok to document our reason for providing the new GUC - along with that fact that it is a new one and we need more field testing and benchmarks to provide comprehensive advice about how to set - and recommend leaving it alone unless consult with experts/this list etc. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 04/10/14 12:10, Bruce Momjian wrote: On Sat, Oct 4, 2014 at 12:00:36PM +1300, Mark Kirkwood wrote: I don't think we can offer absolutely accurate tuning advice, but I'm sure we can give some guidance. Let me try. +1 I think it is ok to document our reason for providing the new GUC - along with that fact that it is a new one and we need more field testing and benchmarks to provide comprehensive advice about how to set - and recommend leaving it alone unless consult with experts/this list etc. I predict that such a setting will remain in postgresql.conf for years with almost zero activity, as have other similar efforts. Sure that *may* happen. In fact in my experience the vast majority of our current GUCs are never altered in the field - however when you run into a situation where a certain GUC solves your performance issue, then that seldom used GUC really gets some love. So altho I get your point about endless proliferation of 'em not being cost free, I'd like to plug the other side of the argument too - having the flexibility to adjust your Postgres installation to work well with random platform with annoying quirks is the corresponding benefit. In addition with the increasing use of cloud platforms - the situation above is likely to become *more* common (Postgres in Openstack using Ceph for volume storage is a case in point). Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] “Core” function in Postgres
On 24/09/14 11:29, Mingzhe Li wrote: Hi experts, I want to know what's the core function used in Postgres server? I am looking for something corresponding to main() in a simple C program. I want to know the file path and the function name. I am using Postgres 9.3.5, however I assume the core function will be unchanged between different revisions. I suspect you want to start looking at PostgresMain in src/backend/tcop/postgres.c and ServerLoop in src/backend/postmaster/postmaster.c Regards Mark P.s: FWIW I think this *is* the right list to ask this type of question... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres code for a query intermediate dataset
On 14/09/14 20:43, Mark Kirkwood wrote: On 14/09/14 20:24, Atri Sharma wrote: How do you plan to do all that VACUUM does for this table then? It seems to me that you are saying to VACUUM that it need not be concerned with table 'A' and you are assuming ownership of all the tasks performed by VACUUM for this table. Seems pretty broken to me, not to mention the performance degradations. I think the whole point of such a modification is that nothing is done to such tables, as you want to see all the previous versions. Clearly this is less performant for standard workloads...but we are talking about non standard workloads surely... To be fair with respect to what Atri is saying, I should have said something like: Clearly this is *horribly* less performant for standard workloads...etc :-) Also there is the good point he raised about transaction xid wrap, so some messing about with that part of VACUUM would be required too (it's the little complications that all add up)! The TRIGGER based approach is clearly a lot simpler! However for an interest project to understand Postgres internals the other approach is worthwhile. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres code for a query intermediate dataset
On 14/09/14 05:36, Rohit Goyal wrote: Hi All, I want to work on the code of intermediate dataset of select and update query. For example. Rohit's salary has been updated 4 times, so it has 4 different version of salary. I want to select salary of person named Rohit. Now suppose , in intermediate result, I found 4 different versions of the data. I want to know the code portion which i need to look for working on all 4 versions in dataset. :) Hi Rohit, Currently in Postgres, these intermediate versions all exist - however a given session can only see one of them. Also VACUUM is allowed to destroy versions that no other transactions can see. So if I'm understanding you correctly, you would like to have some way for a session to see *all* these versions (and I guess preventing VACUUM from destroying them). It is certainly possible (or used to be via snapshot manipulation, I haven't looked at that code in a while sorry) to enable a session to see all the old versions, and is quite a cool idea (Postgres used to have this ability in older versions - called Time Travel). For pure practicality, this can be achieved without any code modifications using TRIGGERs and an extra table (as Gavin alludes to). Do tell us a bit more about what you are wanting to do! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 14/09/14 19:00, Amit Kapila wrote: On Fri, Sep 12, 2014 at 11:09 PM, Gregory Smith gregsmithpg...@gmail.com mailto:gregsmithpg...@gmail.com wrote: This looks like it's squashed one of the very fundamental buffer scaling issues though; well done Amit. Thanks. I'll go back to my notes and try to recreate the pathological cases that plagued both the 8.3 BGW rewrite and the aborted 9.2 fsync spreading effort I did; get those running again and see how they do on this new approach. I have a decent sized 24 core server that should be good enough for this job. I'll see what I can do. It will be really helpful if you can try out those cases. And if you want 'em run on the 60 core beast, just let me know the details and I'll do some runs for you. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres code for a query intermediate dataset
On 14/09/14 19:25, Atri Sharma wrote: On Sunday, September 14, 2014, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: On 14/09/14 05:36, Rohit Goyal wrote: Hi All, I want to work on the code of intermediate dataset of select and update query. For example. Rohit's salary has been updated 4 times, so it has 4 different version of salary. I want to select salary of person named Rohit. Now suppose , in intermediate result, I found 4 different versions of the data. I want to know the code portion which i need to look for working on all 4 versions in dataset. :) Hi Rohit, Currently in Postgres, these intermediate versions all exist - however a given session can only see one of them. Also VACUUM is allowed to destroy versions that no other transactions can see. So if I'm understanding you correctly, you would like to have some way for a session to see *all* these versions (and I guess preventing VACUUM from destroying them). Any modifications of that sort are bound to introduce lots of pain, not to mention performance degradation and the added responsibility of ensuring that dead tuples don't bloat up the system (prevent vacuum from running at regular intervals and you can have a xid wraparound). I just mentioned that in case you are planning to go in that direction. If you only want the data, use the triggers as Gavin mentioned. Obviously in the general case sure - but (as yet) we don't have much idea about Rohit's use case and workload. If retrieving past versions is the *primary* workload bias and high update concurrency is not required then this could well work better than a trigger based solution. And it does not seem too onerous to have the ability to switch this on as required, viz: ALTER TABLE table1 VERSIONING; (or similar syntax) which makes VACUUM leave this table alone. It might make more sense to make such a concept apply to a TABLESPACE instead mind you (i.e things in here are for archive/versioning purposes)... Clearly we'd need to see the code for any of this and evaluate if it is good or terrible, but I'm not seeing the idea as bad as stated. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres code for a query intermediate dataset
On 14/09/14 20:24, Atri Sharma wrote: On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: On 14/09/14 19:25, Atri Sharma wrote: On Sunday, September 14, 2014, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz mailto:mark.kirkwood@__catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: On 14/09/14 05:36, Rohit Goyal wrote: Hi All, I want to work on the code of intermediate dataset of select and update query. For example. Rohit's salary has been updated 4 times, so it has 4 different version of salary. I want to select salary of person named Rohit. Now suppose , in intermediate result, I found 4 different versions of the data. I want to know the code portion which i need to look for working on all 4 versions in dataset. :) Hi Rohit, Currently in Postgres, these intermediate versions all exist - however a given session can only see one of them. Also VACUUM is allowed to destroy versions that no other transactions can see. So if I'm understanding you correctly, you would like to have some way for a session to see *all* these versions (and I guess preventing VACUUM from destroying them). Any modifications of that sort are bound to introduce lots of pain, not to mention performance degradation and the added responsibility of ensuring that dead tuples don't bloat up the system (prevent vacuum from running at regular intervals and you can have a xid wraparound). I just mentioned that in case you are planning to go in that direction. If you only want the data, use the triggers as Gavin mentioned. Obviously in the general case sure - but (as yet) we don't have much idea about Rohit's use case and workload. If retrieving past versions is the *primary* workload bias and high update concurrency is not required then this could well work better than a trigger based solution. And it does not seem too onerous to have the ability to switch this on as required, viz: ALTER TABLE table1 VERSIONING; (or similar syntax) which makes VACUUM leave this table alone. How do you plan to do all that VACUUM does for this table then? It seems to me that you are saying to VACUUM that it need not be concerned with table 'A' and you are assuming ownership of all the tasks performed by VACUUM for this table. Seems pretty broken to me, not to mention the performance degradations. I think the whole point of such a modification is that nothing is done to such tables, as you want to see all the previous versions. Clearly this is less performant for standard workloads...but we are talking about non standard workloads surely... Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres code for a query intermediate dataset
On 14/09/14 20:11, Rohit Goyal wrote: Hi Mark, On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: Currently in Postgres, these intermediate versions all exist - however a given session can only see one of them. Also VACUUM is allowed to destroy versions that no other transactions can see. Exactly, one visible version is there per session. But, I want to test my algorithm in which i myself want to work on all the intermediate version and find the correct one for the session. So if I'm understanding you correctly, you would like to have some way for a session to see *all* these versions (and I guess preventing VACUUM from destroying them). yes and I also want to know the portion of code where i can find all the historical or intermediate versions Well that's going to be a bit of a learning curve for you :-), the concept to get familiar with is snapshots (see src/backand/access/heap/heapam.c to get started). It is certainly possible (or used to be via snapshot manipulation, I haven't looked at that code in a while sorry) to enable a session to see all the old versions, and is quite a cool idea (Postgres used to have this ability in older versions - called Time Travel). For pure practicality, this can be achieved without any code modifications using TRIGGERs and an extra table (as Gavin alludes to). Can you explain me more about how to starting working using trigger? I'm not sure we have specif examples in the docs for what you want to do, but generally see http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres code for a query intermediate dataset
On 14/09/14 21:18, Rohit Goyal wrote: Hi Mark Atri, :) Thanks for reply. But, I think i confused you. I am talking about access using indexes. So, I assume that B+ tree store key-value pair where rohit is the key and all the versions are its value. Another way to think is I have a secondary index on emp. name and there are 4 rohit exist in DB. So, now B+ tree gives me 4 different tuple pointer for each Rohit. I want to know the code portion for this where i can see all 4 tuple pointer before each one have I/O access to fetch its tuple. Are the suggestions still valid? Visibility rules mentioned earlier apply equally to tables and indexes (strictly speaking what happens is index tuples are checked against the relevant tables to see if your session can see them), so discussion of whether tuples are retrieved via index or table scans is not really relevant (i.e query planning/optimization is separate from tuple visibility). Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 10/09/14 18:54, Amit Kapila wrote: On Wed, Sep 10, 2014 at 5:46 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: In terms of the effect of the patch - looks pretty similar to the scale 2000 results for read-write, but read-only is a different and more interesting story - unpatched 9.4 is noticeably impacted in the higher client counts, whereas the patched version scales as well (or even better perhaps) than in the scale 2000 case. Yeah, that's what I was expecting, the benefit of this patch will be more at higher client count when there is large data and all the data can fit in RAM . Many thanks for doing the performance test for patch. No worries, this is looking like a patch we're going to apply to 9.4 to make the 60 core beast scale a bit better, so thanks very much for your work in this area. If you would like more tests run at higher scales let me know (we have two of these machines at pre-production state currently so I can run benchmarks as reqd)! Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 05/09/14 23:50, Amit Kapila wrote: On Fri, Sep 5, 2014 at 8:42 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: On 04/09/14 14:42, Amit Kapila wrote: On Thu, Sep 4, 2014 at 8:00 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz wrote: Hi Amit, Results look pretty good. Does it help in the read-write case too? Last time I ran the tpc-b test of pgbench (results of which are posted earlier in this thread), there doesn't seem to be any major gain for that, however for cases where read is predominant, you might see better gains. I am again planing to take that data in next few days. FWIW below are some test results on the 60 core beast with this patch applied to 9.4. I'll need to do more runs to iron out the variation, but it looks like the patch helps the standard (write heavy) pgbench workload a little, and clearly helps the read only case. Thanks for doing the test. I think if possible you can take the performance data with higher scale factor (4000) as it seems your m/c has 1TB of RAM. You might want to use latest patch I have posted today. Here's some fairly typical data from read-write and read-only runs at scale 4000 for 9.4 beta2 with and without the v7 patch (below). I'm not seeing much variation between repeated read-write runs with the same config (which is nice - sleep 30 and explicit checkpoint call between each one seem to help there). Interestingly, I note anecdotally that (unpatched) 9.4 beta2 seems to be better at higher client counts than beta1 was... In terms of the effect of the patch - looks pretty similar to the scale 2000 results for read-write, but read-only is a different and more interesting story - unpatched 9.4 is noticeably impacted in the higher client counts, whereas the patched version scales as well (or even better perhaps) than in the scale 2000 case. read write (600s) Clients | tps| tps (unpatched) -++ 6 | 9395 | 9334 12 | 16605 | 16525 24 | 24634 | 24910 48 | 32170 | 31275 96 | 35675 | 36533 192 | 35579 | 31137 384 | 30528 | 28308 read only (300s) Clients | tps| tps (unpatched) -++ 6 | 35743 | 35362 12 | 111019 | 106579 24 | 199746 | 160305 48 | 327026 | 198407 96 | 379184 | 171863 192 | 356623 | 152224 384 | 340878 | 128308 regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 04/09/14 14:42, Amit Kapila wrote: On Thu, Sep 4, 2014 at 8:00 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: Hi Amit, Results look pretty good. Does it help in the read-write case too? Last time I ran the tpc-b test of pgbench (results of which are posted earlier in this thread), there doesn't seem to be any major gain for that, however for cases where read is predominant, you might see better gains. I am again planing to take that data in next few days. FWIW below are some test results on the 60 core beast with this patch applied to 9.4. I'll need to do more runs to iron out the variation, but it looks like the patch helps the standard (write heavy) pgbench workload a little, and clearly helps the read only case. 4x E7-4890 15 cores each. 1 TB ram 16x Toshiba PX02SS SATA SSD 4x Samsung NVMe XS1715 PCIe SSD Ubuntu 14.04 (Linux 3.13) Postgres 9.4 beta2 + buffer eviction patch v5 Pgbench scale 2000 Non default params: max_connections = 400; shared_buffers = 10GB; maintenance_work_mem = 1GB; effective_io_concurrency = 10; wal_buffers = 256MB; checkpoint_segments = 1920; checkpoint_completion_target = 0.8; ssl = 'off'; wal_sync_method = 'open_datasync'; read write elapsed 600s Clients | tps | tps (unpatched) -+---+ 6 | 8279 | 8328 12 | 16260 | 16381 24 | 23639 | 23451 48 | 31430 | 31004 96 | 38516 | 34777 192 | 33535 | 32443 384 | 27978 | 25068 384 | 30589 | 28798 read only elapsed 300s Clients | tps| tps (unpatched) -++ 6 | 57654 | 57255 12 | 111361 | 112360 24 | 220304 | 187967 48 | 384567 | 230961 96 | 380309 | 241947 192 | 330865 | 214570 384 | 315516 | 207548 Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 03/09/14 16:22, Amit Kapila wrote: On Wed, Sep 3, 2014 at 9:45 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Aug 28, 2014 at 4:41 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have yet to collect data under varying loads, however I have collected performance data for 8GB shared buffers which shows reasonably good performance and scalability. I think the main part left for this patch is more data for various loads which I will share in next few days, however I think patch is ready for next round of review, so I will mark it as Needs Review. I have collected more data with the patch. I understand that you have given more review comments due to which patch require changes, however I think it will not effect the performance data to a great extent and I have anyway taken the data, so sharing the same. Performance Data: --- Configuration and Db Details IBM POWER-7 16 cores, 64 hardware threads RAM = 64GB Database Locale =C checkpoint_segments=256 checkpoint_timeout=15min scale factor = 3000 Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8) Duration of each individual run = 5mins All the data is in tps and taken using pgbench read-only load Common configuration remains same as above. Forgot to mention that data is a median of 3 runs and attached sheet contains data for individual runs. Hi Amit, Results look pretty good. Does it help in the read-write case too? Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 21:25, Álvaro Hernández Tortosa wrote: On 02/09/14 05:24, Craig Ringer wrote: I couldn't disagree more. If we were to implement anything, it'd be PL/PSM (http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and quirky as anything else the SQL committee has brought forth, but it's at least a standard(ish) language. So we'd choose a bizarre and quirky language instead of anything better just because it's standard. I'm sure current and prospective users will surely prefer a bizarre and quirky language that is standard approved, rather than a modern, comfortable, easy-to-use, that is not embodied by the ISO. No doubt ^_^ Well there is the risk that by randomly adding new syntax to PL/pgSQL we turn it in a bizarre and quirky *non standard* language. Part of the attraction of PL/pgsql is that it is Ada like - if we break that too much then...well...that would be bad. So I think a careful balance is needed, to add new features that keep the spirit of the original language. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 15:46, Craig Ringer wrote: was is exactly why we need a new language and that All the clumsy stuff we cannot fix in plpgsql, can easily be fixed in plpgsql2, with the most beautiful syntax we can come up with. But you haven't said HOW you propose to fix this one case. Unfortunately, there is likely to be a (large) variance of opinion concerning the details. In particular 'beautiful/elegant...'. Err - these things are mostly in the eye of the beholder. E.g: I might want this new shiny syntax to be lisp like, as that is beautiful (heh, kidding - but you should get the idea). Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 29/08/14 08:56, Alvaro Herrera wrote: Robert Haas wrote: I agree that you might not like that. But you might not like having the table vacuumed slower than the configured rate, either. My impression is that the time between vacuums isn't really all that negotiable for some people. I had one customer who had horrible bloat issues on a table that was vacuumed every minute; when we changed the configuration so that it was vacuumed every 15 seconds, those problems went away. Wow, that's extreme. For that case you can set autovacuum_vacuum_cost_limit to 0, which disables the whole thing and lets vacuum run at full speed -- no throttling at all. Would that satisfy the concern? Well no - you might have a whole lot of big tables that you want vacuum to not get too aggressive on, but a few small tables that are highly volatile. So you want *them* vacuumed really fast to prevent them becoming huge tables with only a few rows therein, but your system might not be able to handle *all* your tables being vacuum full speed. This is a fairly common scenario for (several) web CMS systems that tend to want to have session and/cache tables that are small and extremely volatile, plus the rest of the (real) data that is bigger and vastly less volatile. While there is a valid objection along the lines of don't use a database instead of memcache, it does seem reasonable that Postgres should be able to cope with this type of workload. Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 27/08/14 10:27, Alvaro Herrera wrote: Alvaro Herrera wrote: So my proposal is a bit more complicated. First we introduce the notion of a single number, to enable sorting and computations: the delay equivalent, which is the cost_limit divided by cost_delay. Here's a patch that implements this idea. As you see this is quite a bit more complicated that Haribabu's proposal. There are two holes in this: 1. if you ALTER DATABASE to change vacuum delay for a database, those values are not considered in the global equiv delay. I don't think this is very important and anyway we haven't considered this very much, so it's okay if we don't handle it. 2. If you have a fast worker that's only slightly faster than regular workers, it will become slower in some cases. This is explained in a FIXME comment in the patch. I don't really have any more time to invest in this, but I would like to see it in 9.4. Mark, would you test this? Haribabu, how open are you to fixing point (2) above? Thanks Alvaro - I will take a look. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 27/08/14 10:27, Alvaro Herrera wrote: Alvaro Herrera wrote: So my proposal is a bit more complicated. First we introduce the notion of a single number, to enable sorting and computations: the delay equivalent, which is the cost_limit divided by cost_delay. Here's a patch that implements this idea. As you see this is quite a bit more complicated that Haribabu's proposal. There are two holes in this: 1. if you ALTER DATABASE to change vacuum delay for a database, those values are not considered in the global equiv delay. I don't think this is very important and anyway we haven't considered this very much, so it's okay if we don't handle it. 2. If you have a fast worker that's only slightly faster than regular workers, it will become slower in some cases. This is explained in a FIXME comment in the patch. I don't really have any more time to invest in this, but I would like to see it in 9.4. Mark, would you test this? Haribabu, how open are you to fixing point (2) above? I did some testing with this patch applied. Minimally tweaking autovacuum (naptime of 5s) with a single table 'cache0' created with a cost limit setting of 1, running: $ pgbench -n -c8 -T300 -f volatile0.sql cache and monitoring the size of 'cache0' table showed a steady state of: cache=# SELECT pg_relation_size('cache0')/(1024*1024) AS mb; mb -- 85 So far so good. Adding another table 'cache1' similar to the previous but lacking any per table autovacuum settings, and running 2 pgbench sessions: $ pgbench -n -c8 -T300 -f volatile0.sql cache $ pgbench -n -c8 -T300 -f volatile1.sql cache (volatile1.sql just uses table 'cache1' instead of 'cache0') shows after a few minutes: cache=# SELECT relname,pg_relation_size(oid)/(1024*1024) AS mb FROM pg_class WHERE relname like 'cache_'; relname | mb -+ cache0 |664 cache1 | 1900 So we are definitely seeing the 'fast' worker being slowed down. Also, the growth of 'cache1' was only a bit faster than 'cache0' - so the 'slow' worker was getting a speed boost was well. So looks like good progress, but yeah - point (2) is obviously rearing its head in this test. Cheers Mark schema.sql Description: application/sql volatile0.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trove with PostgreSQL-XC
On 19/08/14 23:14, Vivek Singh Raghuwanshi wrote: Hi All, Please let me know is that possible to use Openstack Trove with Postgres-XC. With instances and Baremetal (after Juno Release). I Know it is possible to use other medium like MySQL or PostgreSQL, but i am not sure about XC. AFAIK [1], vanilla Postgres support for Trove is still at the review stage, and there is quite a bit to finish, and some stuff to fix. I'd think that getting Trove to provision a complete XC VM (or more sensibly a set of VMs that run XC) is going to require fair bit of extra work. It looks like replication will be deployable via Trove for the supported flavours (Mysql, Mongo, Cassandra, Counch, Redis) see https://wiki.openstack.org/wiki/Trove/Replication-And-Clustering . I'm not seeing any specific discussion of sharding notice. Cheers Mark [1] I've been helping debug the Postgres Trove patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
On 05/08/14 17:56, Mark Kirkwood wrote: Adding in the 'if' in the float8 case increases run time to 4s. So looks like plpgsql might have a slightly higher cost for handling added conditionals. Be interesting to dig a bit more and see what is taking the time. Thinking about this a bit more, I wonder if the 'big O' has added some optimizations in PL/SQL for trivial conditionals - i.e you are adding: IF (0 = 0) THEN END IF; ...it may be going...'Ah yes, always true...so remove'! So it might be interesting to try some (hopefully not so easily removable) non trivial ones like: DO LANGUAGE plpgsql $$ DECLARE DECLARE i integer; BEGIN FOR i IN 1..1000 LOOP IF (i%100 = 0) THEN NULL; END IF; END LOOP; END $$; Now I guess there is the chance that PL/SQL might understand that NULL inside a loop means it can remove it...so you may need to experiment further. The point to take away here is that for interesting loops and conditions - there may be not such a significant difference! Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
On 05/08/14 08:48, testman1316 wrote: We am trying to get an idea of the raw performance of Oracle vs PostgreSQL. We have extensive oracle experience but are new to PostgreSQL. We are going to run lots of queries with our data, etc. But first we wanted to see just how they perform on basic kernel tasks, i.e. math and branching since SQL is built on that. In AWS RDS we created two db.m3.2xlarge instances one with oracle 11.2.0.4.v1 license included, the other with PostgreSQL (9.3.3) In both we ran code that did 1 million square roots (from 1 to 1 mill). Then did the same but within an If..Then statement. The results were a bit troubling: Oracle 4.8 seconds PostgreSQL 21.803 seconds adding an if statement: Oracle 4.78 seconds PostgreSQL 24.4 seconds code Oracle square root SET SERVEROUTPUT ON SET TIMING ON DECLARE n NUMBER := 0; BEGIN FOR f IN 1..1000 LOOP n := SQRT (f); END LOOP; END; PostgreSQL DO LANGUAGE plpgsql $$ DECLARE n real; DECLARE f integer; BEGIN FOR f IN 1..1000 LOOP n = SQRT (f); END LOOP; RAISE NOTICE 'Result = %',n; END $$; oracle adding if SET SERVEROUTPUT ON SET TIMING ON DECLARE n NUMBER := 0; BEGIN FOR f IN 1..1000 LOOP if 0 =0 then n := SQRT (f); end if; END LOOP; postgres adding if DO LANGUAGE plpgsql $$ DECLARE n real; DECLARE f integer; BEGIN FOR f IN 1..1000 LOOP if 0=0 then n = SQRT (f); end if; END LOOP; RAISE NOTICE 'Result = %',n; END $$; I used an anonymous block for PostgreSQL. I also did it as a function and got identical results CREATE OR REPLACE FUNCTION testpostgrescpu() RETURNS real AS $BODY$ declare n real; f integer; BEGIN FOR f IN 1..1000 LOOP n = SQRT (f); END LOOP; RETURN n; END; $BODY$ LANGUAGE plpgsql VOLATILE COST 100; ALTER FUNCTION testpostgrescpu() OWNER TO xxx Based on what we had heard of PostgreSQL and how it is comparable to Oracle in many ways, we were taken aback by the results. Did we code PostgreSQL incorrectly? What are we missing or is this the way it is. Note: once we started running queries on the exact same data in Oracle and PostgreSQL we saw a similar pattern. On basic queries little difference, but as they started to get more and more complex Oracle was around 3-5 faster. Again, this was run on identical AWS RDS instances, we ran them many times during the day on different days and results were always the same Looking at this guy: DO LANGUAGE plpgsql $$ DECLARE n real; DECLARE f integer; BEGIN FOR f IN 1..1000 LOOP n = SQRT (f); END LOOP; RAISE NOTICE 'Result = %',n; END $$; Takes about 12s with with Postgres 9.4 running on Ubuntu 14.04 hosted on real HW (Intel i7). Changing n to be float8 rather than real, i.e: DO LANGUAGE plpgsql $$ DECLARE n float8; DECLARE f integer; BEGIN FOR f IN 1..1000 LOOP n = SQRT (f); END LOOP; RAISE NOTICE 'Result = %',n; END $$; ...time drops to about 2s (which I'm guessing would get it to about Oracle speed on your EC2 setup). The moral of the story for this case is that mapping Oracle to Postgres datatypes can require some careful thought. Using 'native' types (like integer, float8 etc) will generally give vastly quicker performance. Adding in the 'if' in the float8 case increases run time to 4s. So looks like plpgsql might have a slightly higher cost for handling added conditionals. Be interesting to dig a bit more and see what is taking the time. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parametric block size?
On 26/07/14 21:05, Andres Freund wrote: More advanced features, but with much more impact on the code, would be to be able to change the size at database/table level. That'd be pretty horrible because the size of pages in shared_buffers wouldn't be uniform anymore. Possibly stopping at the tablespace level might be more straightforward. To avoid messing up the pages in shared buffers we'd perhaps need something like several shared buffer pools - each with either its own blocksize or associated with a (set of) tablespace(s). Obviously this sort of thing has a pretty big architecture/code impact, probably better to consider a 1st iteration with it being initdb specifiable only (as that would still be very convenient)! Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgresql.auto.conf and reload
On 09/07/14 05:13, Josh Berkus wrote: On 07/06/2014 01:27 AM, Christoph Berg wrote: Another could be that during initdb all the uncommented settings be written to postgresql.auto.conf file rather than to postgresql.conf. I think we can do this by changing code in initdb.c-setup_config(). This will ensure that unless user is changing settings in a mixed way (some by Alter System and some manually by editing postgresql.conf), there will no such problem. There is no reasonable way for us to prevent issues for users who are manually changing both pg.conf and pg.auto.conf. Users should stick to one or the other method of management (or, thirdly, using conf.d); if they mix methods, we can't prevent confusion at restart time and we shouldn't try. Yes, but even well behaved users will see this type of error, because initdb uncomments certain values (ones that are dead certs for being changed via ALTER SYSTEM subsequently like shared_buffers), and then - bang! your next reload gets that your postgresql.conf contains errors message. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On 01/07/14 23:25, Heikki Linnakangas wrote: On 07/01/2014 01:08 PM, Andres Freund wrote: Hi, Over at -performance Mark Kirkwood tested a recent version of this (http://archives.postgresql.org/message-id/53B283F3.7020005%40catalyst.net.nz) . I thought it's interesting to add the numbers to this thread: Test: pgbench Options: scale 500 read only Os: Ubuntu 14.04 Pg: 9.3.4 Pg Options: max_connections = 200 shared_buffers = 10GB maintenance_work_mem = 1GB effective_io_concurrency = 10 wal_buffers = 32MB checkpoint_segments = 192 checkpoint_completion_target = 0.8 Results Clients | 9.3 tps 32 cores | 9.3 tps 60 cores +--+- 6 | 70400 | 71028 12 | 98918 | 129140 24 | 230345 | 240631 48 | 324042 | 409510 96 | 346929 | 120464 192 | 312621 | 92663 So we have anti scaling with 60 cores as we increase the client connections. Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1] - cherry picking the last 5 commits into 9.4 branch and building a package from that and retesting: Clients | 9.4 tps 60 cores (rwlock) +-- 6 | 70189 12 | 128894 24 | 233542 48 | 422754 96 | 590796 192 | 630672 Now, this is a bit of a skewed comparison due to 9.4 vs. 9.3 but still interesting. It looks like the issue I reported here: http://www.postgresql.org/message-id/5190e17b.9060...@vmware.com fixed by this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b03d196be055450c7260749f17347c2d066b4254. So, definitely need to compare plain 9.4 vs patched 9.4, not 9.3. Here's plain 9.4 vs patched 9.4: Clients | 9.4 tps 60 cores | 9.4 tps 60 cores (rwlock) +--+-- 6 | 69490 | 70189 12 | 128200 | 128894 24 | 232243 | 233542 48 | 417689 | 422754 96 | 464037 | 590796 192 | 418252 | 630672 It appears that plain 9.4 does not exhibit the dramatic anti scaling that 9.3 showed, but there is still evidence of some contention in the higher client numbers, and we peak at the 96 client mark. The patched variant looks pretty much free from this, still scaling at 192 connections (might have been interesting to try more, but had max_connections set to 200)! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 07/05/14 17:35, Peter Geoghegan wrote: On Tue, May 6, 2014 at 10:20 PM, Simon Riggs si...@2ndquadrant.com wrote: On 6 May 2014 23:47, Josh Berkus j...@agliodbs.com wrote: If you're going to make an argument in favor of different tuning advice, then do it based on something in which you actually believe, based on hard evidence. The proposed default setting of 4x shared_buffers is unprincipled *and* lacks hard evidence from you and everybody else. +1. In my view, we probably should have set it to a much higher absolute default value. The main problem with setting it to any multiple of shared_buffers that I can see is that shared_buffers is a very poor proxy for what effective_cache_size is supposed to represent. In general, the folk wisdom around sizing shared_buffers has past its sell-by date. +1. ISTM the only sensible approach to auto tune this requires us to have a plugin to detect how much RAM the system has (and then setting it to 1/2 that say). I wonder if it might be worthwhile writing plugins for the handful of popular platforms. For the remainder maybe we could leave it defaulting to the current (small) value, and encourage volunteers to code the missing ones if they want something better. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 05/05/14 15:22, Amit Kapila wrote: Here what I could understand is that sum of cost_limit for all autovacuum workers should never exceed the value of autovacuum_vacuum_cost_limit which seems to be always the case in current code but same is not true for proposed patch. Right, but have a look at the 1st message in this thread - the current behavior (and to a large extent the above condition) means that setting cost limits per table does not work in any remotely intuitive way. ITSM that the whole purpose of a per table setting in this context is to override the behavior of auto vacuum throttling - and currently this does not happen unless you get real brutal (i.e setting the cost delay to zero in addition to setting cost limit...making the whole cost limit a bit pointless). regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 06/05/14 16:28, Amit Kapila wrote: On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 05/05/14 15:22, Amit Kapila wrote: Right, but have a look at the 1st message in this thread - the current behavior (and to a large extent the above condition) means that setting cost limits per table does not work in any remotely intuitive way. ITSM that the whole purpose of a per table setting in this context is to override the behavior of auto vacuum throttling - and currently this does not happen unless you get real brutal (i.e setting the cost delay to zero in addition to setting cost limit...making the whole cost limit a bit pointless). I think meaning of per table setting is just that it overrides the default value of autovacuum_vacuum_cost_limit for that table and the rest of calculation or concept remains same. This is what currently code does and the same is mentioned in docs as far as I can understand. As per current behaviour the per-table cost_limit is also adjusted based on the setting of GUC autovacuum_vacuum_cost_limit and right now it follows a simple principle that the total cost limit for all workers should be equal to autovacuum_vacuum_cost_limit. Even code has below comment: /* * Adjust cost limit of each active worker to balance the total of cost * limit to autovacuum_vacuum_cost_limit. */ Now If you want to change for the case where user specifies value per table which is more than autovacuum_vacuum_cost_limit or otherwise, then I think the new definition should be bit more clear and it is better not to impact current calculation for default values. I could think of 2 ways to change this: a. if user has specified cost_limit value for table, then it just uses it rather than rebalancing based on value of system-wide guc variable autovacuum_vacuum_cost_limit b. another could be to restrict setting per-table value to be lesser than system-wide value? The former is used for auto vacuum parameters like scale_factor and later is used for parameters like freeze_max_age. Thoughts? Alvaro, do you think above options makes sense to solve this problem? Yes indeed - the code currently working differently from what one would expect. However the usual reason for handing knobs to the user for individual object is so that special configurations can be applied to them. The current method of operation of the per table knobs does not do this (not without clubbing 'em on the head) The (ahem) sensible way that one would expect (perhaps even need) autovacuum throttling to work is: - set sensible defaults for all the usual (well behaved) tables - set a few really aggressive overrides for a handful of the naughty ones Runaway free space bloat is one of the things that can really mangle a postgres system (I've been called in to rescue a few in my time)... there needs to be a way to control those few badly behaved tables ... without removing the usefulness of throttling the others. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 Proposal: Initdb creates a single table
This seems like a much better idea - whereas a single table, related to nothing - on the other hand, is at best not very helpful (and it could be argued, might contribute to teaching poor data data design). Regards Mark On 23/04/14 19:13, Pavel Stehule wrote: Hello if you are thinking about this direction, then store there some demo project. I am don't think so isolated table has significant price. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers