Re: [HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Hello,

> Agree that windowing function will return all the rows compared to max and
> group by returing only max rows per group. But even while arriving at the
> aggregate/sorting windowing function seems to spend more effort than group
> by/order by.

(I'll apologise in advance for possible misreading..)

The main cause of the difference in time is the sorting. Over 90% of the total execution time was spent sorting (49ms -> 2733ms) for the query using the windowing function. If this sort were unnecessary, the execution time would be less than 300 ms - comparable to the group-by query.

| Subquery Scan on __unnamed_subquery_0
|     (actual time=2606.075..2953.937 rows=558 loops=1)
|   Filter: (__unnamed_subquery_0.rn = 1)
|   ->  WindowAgg  (actual time=2606.063..2928.061 rows=122880 loops=1)
|         ->  Sort  (actual time=2606.020..2733.677 rows=122880 loops=1)
|               Sort Key: student_score.course, student_score.score
|               ->  Seq Scan on student_score
|                     (actual time=0.009..49.026 rows=122880 loops=1)

As you can see in the above plan, the sort key is (course, score). If your point is the overall performance and not reusing a kind of 'hash', there's a big chance to eliminate this sorting if you are able to add an index, say,

=# create index idx_co_sc on student_score using btree (course, score);

With this index, you will get a different plan like this,

> uniontest=# explain analyze select student_name from (select student_name,
> dense_rank() over(partition by course order by score) rn, score from
> student_score) rnn where rn=2;
> QUERY PLAN
> ---
> Subquery Scan on rnn  (actual time=0.088..319.403 rows=135 loops=1)
>   Filter: (rnn.rn = 2)
>   Rows Removed by Filter: 122746
>   ->  WindowAgg  (actual time=0.037..296.851 rows=122881 loops=1)
>         ->  Index Scan using idx_co_sc on student_score
>               (actual time=0.027..111.333 rows=122881 loops=1)
> Total runtime: 319.483 ms

Does this satisfy your needs?
===

> Another thing, (I may be stupid and naive here) does PostgreSQL
> re-uses the hash which has been already created for sort. In
> this case the inner query must have created a hash for windoing
> aggregate. Can't we use that same one while applying the the
> filter "rn=1" ?

Generally speaking, hashes cannot yield ordered output by their nature, I believe.

A windowing function (execnode) always receives tuples sequentially in the window-defined order (as you see in the explained plan above), then processes the tuples in a semi tuple-by-tuple manner to perform per-frame aggregation, and finally outputs the same number of tuples as its input. Furthermore, dense_rank() doesn't even need per-frame aggregation. So none of the planners so far seems to have a chance to use any kind of hash table to calculate/execute windowing functions.

On the other point, automatically preserving some internal data within a query beyond the end of the query brings in a 'when to discard it?' problem.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
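[Editor's note: Horiguchi's point — that the WindowAgg node consumes its input strictly in window-defined order and emits one output row per input row, with no hash table anywhere — can be illustrated with a small simulation. This is a hedged sketch in Python, not PostgreSQL's executor code; the function and data names are invented for the example.]

```python
def dense_rank(rows, partition_key, order_key):
    """Simulate dense_rank(): rows must already be sorted by
    (partition_key, order_key), exactly as the WindowAgg node requires
    its input to be. Processes tuples sequentially; no hashing."""
    result = []
    prev_part = prev_order = object()  # sentinels that match nothing
    rank = 0
    for row in rows:
        part, order = partition_key(row), order_key(row)
        if part != prev_part:          # new partition: restart at rank 1
            rank, prev_part, prev_order = 1, part, order
        elif order != prev_order:      # new order value: next dense rank
            rank, prev_order = rank + 1, order
        result.append((row, rank))     # one output row per input row
    return result

# Toy student_score-like data, pre-sorted as the Sort/Index Scan would do:
scores = sorted([("math", 90), ("math", 85), ("math", 85), ("art", 70)])
ranked = dense_rank(scores, lambda r: r[0], lambda r: r[1])
```

Note that every input row is emitted with its rank attached; the `rn = 1` filter in the outer query then discards most of them, which is why the plan shows 122880 rows flowing out of the WindowAgg.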
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander wrote:
> In fact I've been considering suggesting we might want to retire the
> difference between archive and hot_standby as wal_level, because the
> difference is usually so small. And the advantage of hot_standby is in
> almost every case worth it. Even in the archive recovery mode, being
> able to do pause_at_recovery_target is extremely useful. And as you
> say in (c) above, many users don't realize that until it's too late.

+1 on removing archive from wal_level. Having both archive and hot_standby for wal_level is confusing, and if I recall correctly both have been kept as possible settings only to protect people from bugs that the newly-introduced hot_standby level could have introduced due to the few WAL records it adds. But there have been no such bugs for a couple of releases now, no?
-- 
Michael
Re: [HACKERS] CLUSTER FREEZE
On 2013-10-24 17:17:22 -0700, Josh Berkus wrote:
> On 10/24/2013 04:55 PM, Robert Haas wrote:
> > On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus wrote:
> >> On 10/23/2013 09:58 PM, Amit Kapila wrote:
> >>> I wonder why anyone would like to freeze during CLUSTER command when
> >>> they already have separate way (VACUUM FREEZE) to achieve it, do you
> >>> know or can think of any case where user wants to do it along with
> >>> Cluster command?
> >>
> >> "If I'm rewriting the table anyway, let's freeze it".
> >>
> >> Otherwise, you have to write the same pages twice, if both CLUSTER and
> >> FREEZE are required.
> >
> > I wonder if we should go so far as to make this the default behavior,
> > instead of just making it an option.
>
> +1 from me. Can you think of a reason you *wouldn't* want to freeze?

It makes content from the future appear when you start using the relation in a query/session with an older snapshot. Currently CLUSTER is safe against that.

Greetings,

Andres Freund

-- 
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Regress tests to improve the function coverage of schemacmds and user and tablespace files
Hi,

On Thu, Oct 24, 2013 at 5:59 PM, Haribabu kommi wrote:
> Here I have added some more regression tests to improve the missing function
> coverage of schemacmds.c, user.c and tablespace.c.
> The added tests are mainly RENAME TO and OWNER TO support.

Could you add those patches to the next commitfest so that they don't get lost in the flow? Here is a URL to it:
https://commitfest.postgresql.org/action/commitfest_view?id=20

Note that you will need a community account to register your patches.

Thanks,
-- 
Michael
Re: [HACKERS] CLUSTER FREEZE
On Fri, Oct 25, 2013 at 4:29 AM, Thomas Munro wrote:
> On 24 October 2013 05:58, Amit Kapila wrote:
>> On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro wrote:
>> > Hi
>> > I noticed that CLUSTER doesn't have a FREEZE option. Here is a patch to add
>> > that, for consistency with VACUUM. Is it useful?
>>
>> I wonder why anyone would like to freeze during CLUSTER command when
>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>> know or can think of any case where user wants to do it along with
>> Cluster command?
>
> As others have said, the goal is to freeze and cluster in a single step.
> You can already do that if you know how things work under the covers with:
>
> SET vacuum_freeze_min_age = 0;
> CLUSTER my_table;

True, but in that case why don't we just mention the above in the documentation of the CLUSTER command, so that if a user wants to freeze along with CLUSTER, he can use the above way. Some of the reasons I could think of for adding FREEZE as an option are:
a. it's more explicit and easier for the user.
b. if by chance the underlying mechanism changes (which is less likely), then the above way of doing CLUSTER might not result in a freeze.

> This patch lets you say CLUSTER FREEZE instead. It mirrors VACUUM, which
> can freeze tuples based on the GUC and tuple age or the FREEZE keyword.
>
>> Anyway code side, I think you need to set both freeze_min_age as well
>> as freeze_table_age, see VACUUM command in gram.y
>
> Ok, I attach a new version that is more like VACUUM in gram.y. (Although
> I'm not sure why it isn't a single boolean flag).

The reason for separate flags is that the two are used to decide different things:
freeze_min_age - this is used to decide the cutoff_xid, based on which FrozenTransactionId will be placed on a tuple.
freeze_table_age - used to decide whether to scan all pages of a relation in VACUUM; in the CLUSTER command it is ignored, as CLUSTER needs to scan all pages anyway.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
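[Editor's note: Amit's distinction between the two settings can be sketched in a few lines. This is a simplified, hypothetical model — the real logic lives in vacuum_set_xid_limits() and handles XID wraparound, clamping, and more — but it shows why CLUSTER only cares about freeze_min_age.]

```python
def freeze_decisions(current_xid, relfrozenxid,
                     freeze_min_age, freeze_table_age, is_cluster=False):
    """Toy model: freeze_min_age picks the cutoff XID below which tuples
    get frozen; freeze_table_age decides whether VACUUM must scan every
    page. CLUSTER rewrites every page anyway, so the latter is moot."""
    cutoff_xid = current_xid - freeze_min_age
    scan_all = is_cluster or (current_xid - relfrozenxid) > freeze_table_age
    return cutoff_xid, scan_all

# Plain VACUUM on a relation whose oldest unfrozen XID is still young:
vac = freeze_decisions(current_xid=1000, relfrozenxid=900,
                       freeze_min_age=50, freeze_table_age=200)
# A CLUSTER FREEZE-style call: cutoff is the current XID, every page scanned:
clu = freeze_decisions(current_xid=1000, relfrozenxid=900,
                       freeze_min_age=0, freeze_table_age=200,
                       is_cluster=True)
```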
Re: [HACKERS] CLUSTER FREEZE
On Thu, Oct 24, 2013 at 10:39 PM, Josh Berkus wrote:
> On 10/23/2013 09:58 PM, Amit Kapila wrote:
>> I wonder why anyone would like to freeze during CLUSTER command when
>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>> know or can think of any case where user wants to do it along with
>> Cluster command?
>
> "If I'm rewriting the table anyway, let's freeze it".

So do you think that for other places where we are rewriting the table with an exclusive lock, we should have such a mechanism or option, and even if it is not there, it is a kind of Todo item and tomorrow someone can write a patch to improve that situation?

> Otherwise, you have to write the same pages twice, if both CLUSTER and
> FREEZE are required.

Yes, this is completely right and I understand this point, but the question I had in my mind before writing my last mail was whether any or all places having this concept deserve an option like FREEZE.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Compression of full-page-writes
On Thu, Oct 24, 2013 at 8:37 PM, Robert Haas wrote:
> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao wrote:
>> So, our consensus is to introduce the hooks for FPW compression so that
>> users can freely select their own best compression algorithm?
>> Also, probably we need to implement at least one compression contrib module
>> using that hook, maybe it's based on pglz or snappy.
>
> I don't favor making this pluggable. I think we should pick snappy or
> lz4 (or something else), put it in the tree, and use it.

The reasons why the discussion went towards making it pluggable (or at least what made me think like that) were:
a. what somebody needs to do to get snappy or lz4 into the tree -- is it only performance/compression data for some scenarios, or some legal work as well? If it is only performance/compression, then what would the scenarios be (is pgbench sufficient?).
b. there can be cases where one or the other algorithm is better, or where not doing compression at all is better. For example, in one of the other patches where we were trying to achieve WAL reduction for the Update operation (http://www.postgresql.org/message-id/8977cb36860c5843884e0a18d8747b036b9a4...@szxeml558-mbs.china.huawei.com), Heikki came up with a test (where the data is not very compressible); in that case, the observation was that LZ was better than the native compression method used in that patch, Snappy was better than LZ, and not doing compression at all could be considered preferable, because all the algorithms reduced TPS for that case.

Now I think it is certainly better if we could choose one of the algorithms (snappy or lz4), test it for the most common compression and performance scenarios, and call it done, but I think giving the user at least an option to turn compression off altogether should still be considered.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
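[Editor's note: Amit's point (b) — that for poorly compressible data, skipping compression can beat any algorithm — is easy to demonstrate. Below is a hedged sketch of a hook-style full-page-write compressor in Python, using zlib as a stand-in for pglz/snappy/lz4; the one-byte flag and function names are invented for illustration and are not PostgreSQL's format.]

```python
import os
import zlib

def compress_fpw(page, algo):
    """Hypothetical FPW-compression hook: store the compressed image only
    when it is actually smaller, otherwise store the page uncompressed."""
    compressed = algo(page)
    if len(compressed) < len(page):
        return b"C" + compressed   # 1-byte flag: compressed
    return b"U" + page             # incompressible: store as-is

compressible = b"A" * 8192         # a page of repeated bytes
random_page = os.urandom(8192)     # random data, effectively incompressible

out1 = compress_fpw(compressible, zlib.compress)
out2 = compress_fpw(random_page, zlib.compress)
```

On the random page, zlib's output is slightly larger than the input (deflate falls back to stored blocks plus header overhead), so the hook keeps the raw page — mirroring the observation that compression can be a pure cost on incompressible WAL.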
Re: [HACKERS] [COMMITTERS] pgsql: Move rmtree() from libpgport to libpgcommon
Peter Eisentraut writes:
> Move rmtree() from libpgport to libpgcommon

This patch leaves dirmod.c entirely empty on non-Windows platforms. At least on my OS X Lion laptop, that results in some bleating:

/usr/bin/ranlib: file: libpgport.a(dirmod.o) has no symbols
/usr/bin/ranlib: file: libpgport_srv.a(dirmod_srv.o) has no symbols

Can we do something about that? Perhaps not build this file on non-Windows?

regards, tom lane
Re: [HACKERS] CLUSTER FREEZE
Robert Haas writes:
> I wonder if we should go so far as to make this the default behavior,
> instead of just making it an option.

In that case you'd have to invent a NOFREEZE keyword, no? Ick.

In any case, it's very far from obvious to me that CLUSTER ought to throw away information by default, which is what you're proposing.

regards, tom lane
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander wrote:
> On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus wrote:
>> On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
>>
>> I think it would be worth estimating what this actually looks like in
>> terms of log write quantity. My inclination is to say that if it
>> increases log writes less than 10%, we don't need to provide an option
>> to turn it off.
>>
>> The reasons I don't want to provide a disabling GUC are:
>> a) more GUCs
>> b) confusing users
>> c) causing users to disable rewind *until they need it*, at which point
>> it's too late to enable it.
>>
>> So if there's any way we can avoid having a GUC for this, I'm for it.
>> And if we do have a GUC, failback should be enabled by default.
>
> +1 on the principle.
>
> In fact I've been considering suggesting we might want to retire the
> difference between archive and hot_standby as wal_level, because the
> difference is usually so small. And the advantage of hot_standby is in
> almost every case worth it. Even in the archive recovery mode, being
> able to do pause_at_recovery_target is extremely useful. And as you
> say in (c) above, many users don't realize that until it's too late.

+1. Many users would not realize it until it is too late if we provide this as an additional GUC.

And I agree with writing a WAL record that includes the block number of the changed block. I don't think writing such a record would lead to a huge overhead increase.

Are those WAL records replicated to the standby server synchronously (of course, only when sync replication is configured)? I am concerned that this could add a performance overhead for things such as executing SELECT or autovacuum, especially when the two servers are far apart.

Regards,

---
Sawada Masahiko
Re: [HACKERS] Reasons not to like asprintf
Robert Haas writes:
> On Thu, Oct 24, 2013 at 2:18 PM, Peter Eisentraut wrote:
>> While this is attractive, the same logic would suggest that we rename
>> pg_malloc() to palloc(), and that sounds wrong. The frontend and
>> backend functions do have different freeing semantics.

> I'd almost be inclined to go the other way and suggest that we try to
> use the pg_ prefix more, at least for things to be shared between
> front and back end code.

Meh. I think that mainly promotes carpal tunnel syndrome. The place for a pg_ prefix is in functions we intend to expose to the "outside world", such as functions exposed by libpq. But these are not that.

regards, tom lane
Re: [HACKERS] CLUSTER FREEZE
On 10/24/2013 04:55 PM, Robert Haas wrote:
> On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus wrote:
>> On 10/23/2013 09:58 PM, Amit Kapila wrote:
>>> I wonder why anyone would like to freeze during CLUSTER command when
>>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>>> know or can think of any case where user wants to do it along with
>>> Cluster command?
>>
>> "If I'm rewriting the table anyway, let's freeze it".
>>
>> Otherwise, you have to write the same pages twice, if both CLUSTER and
>> FREEZE are required.
>
> I wonder if we should go so far as to make this the default behavior,
> instead of just making it an option.

+1 from me. Can you think of a reason you *wouldn't* want to freeze?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] CLUSTER FREEZE
On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus wrote:
> On 10/23/2013 09:58 PM, Amit Kapila wrote:
>> I wonder why anyone would like to freeze during CLUSTER command when
>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>> know or can think of any case where user wants to do it along with
>> Cluster command?
>
> "If I'm rewriting the table anyway, let's freeze it".
>
> Otherwise, you have to write the same pages twice, if both CLUSTER and
> FREEZE are required.

I wonder if we should go so far as to make this the default behavior, instead of just making it an option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER FREEZE
On 24 October 2013 05:58, Amit Kapila wrote:
> On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro wrote:
> > Hi
> > I noticed that CLUSTER doesn't have a FREEZE option. Here is a patch to add
> > that, for consistency with VACUUM. Is it useful?
>
> I wonder why anyone would like to freeze during CLUSTER command when
> they already have separate way (VACUUM FREEZE) to achieve it, do you
> know or can think of any case where user wants to do it along with
> Cluster command?

As others have said, the goal is to freeze and cluster in a single step. You can already do that if you know how things work under the covers with:

SET vacuum_freeze_min_age = 0;
CLUSTER my_table;

This patch lets you say CLUSTER FREEZE instead. It mirrors VACUUM, which can freeze tuples based on the GUC and tuple age or the FREEZE keyword.

> Anyway code side, I think you need to set both freeze_min_age as well
> as freeze_table_age, see VACUUM command in gram.y

Ok, I attach a new version that is more like VACUUM in gram.y. (Although I'm not sure why it isn't a single boolean flag).

Thanks,

Thomas Munro

cluster-freeze-v2.patch
Description: Binary data
Re: [HACKERS] Location for external scripts for Extensions?
On 10/24/2013 02:36 PM, Peter Eisentraut wrote:
> On 10/22/13, 2:27 PM, Josh Berkus wrote:
>> pg_partman has several external (python) scripts which help the
>> extension, located in /extras/ in its source. The problem currently is
>> that if you install pg_partman via pgxn or package, you don't get those
>> scripts, because there's no "install" location for them.
>
> Use the SCRIPTS variable in pgxs, and they will get installed.

Oh yeah? Cool. Is there an existing extension with an example of this?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Location for external scripts for Extensions?
On 10/22/13, 2:27 PM, Josh Berkus wrote:
> pg_partman has several external (python) scripts which help the
> extension, located in /extras/ in its source. The problem currently is
> that if you install pg_partman via pgxn or package, you don't get those
> scripts, because there's no "install" location for them.

Use the SCRIPTS variable in pgxs, and they will get installed.
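[Editor's note: for the archives, a minimal PGXS makefile using SCRIPTS might look like the sketch below. The extension version and script file names are hypothetical; SCRIPTS entries are installed into the directory reported by `pg_config --bindir`.]

```makefile
EXTENSION = pg_partman
DATA = pg_partman--1.0.sql

# Helper scripts from extras/ to install alongside the server binaries
SCRIPTS = extras/reapply_indexes.py extras/undo_partition.py

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```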
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus wrote:
> On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
>> One extra WAL record whenever a hint bit is set on a page, for the first
>> time after a checkpoint. In other words, a WAL record needs to be
>> written in the same circumstances as with page checksums, but the WAL
>> records are much smaller as they don't need to contain a full page
>> image, just the block number of the changed block.
>>
>> Or maybe we'll write the full page image after all, like with page
>> checksums, just without calculating the checksums. It might be tricky to
>> skip the full-page image, because then a subsequent change of the page
>> (which isn't just a hint-bit update) needs to somehow know it needs to
>> take a full page image even though a WAL record for it was already written.
>
> I think it would be worth estimating what this actually looks like in
> terms of log write quantity. My inclination is to say that if it
> increases log writes less than 10%, we don't need to provide an option
> to turn it off.
>
> The reasons I don't want to provide a disabling GUC are:
> a) more GUCs
> b) confusing users
> c) causing users to disable rewind *until they need it*, at which point
> it's too late to enable it.
>
> So if there's any way we can avoid having a GUC for this, I'm for it.
> And if we do have a GUC, failback should be enabled by default.

+1 on the principle.

In fact I've been considering suggesting we might want to retire the difference between archive and hot_standby as wal_level, because the difference is usually so small. And the advantage of hot_standby is in almost every case worth it. Even in archive recovery mode, being able to do pause_at_recovery_target is extremely useful. And as you say in (c) above, many users don't realize that until it's too late.
-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Reasons not to like asprintf
On Thu, Oct 24, 2013 at 2:18 PM, Peter Eisentraut wrote:
> On 10/22/13, 3:40 PM, Tom Lane wrote:
>> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
>> I'm now thinking we should use exactly the same names for the frontend and
>> backend versions, ie psprintf() and pvsprintf(). The main reason for
>> considering a pg_ prefix for the frontend versions was to avoid cluttering
>> application namespace; but it's already the case that we don't expect
>> libpgcommon to be namespace clean.
>
> While this is attractive, the same logic would suggest that we rename
> pg_malloc() to palloc(), and that sounds wrong. The frontend and
> backend functions do have different freeing semantics.

I'd almost be inclined to go the other way and suggest that we try to use the pg_ prefix more, at least for things to be shared between front and back end code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch for fail-back without fresh backup
On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
> One extra WAL record whenever a hint bit is set on a page, for the first
> time after a checkpoint. In other words, a WAL record needs to be
> written in the same circumstances as with page checksums, but the WAL
> records are much smaller as they don't need to contain a full page
> image, just the block number of the changed block.
>
> Or maybe we'll write the full page image after all, like with page
> checksums, just without calculating the checksums. It might be tricky to
> skip the full-page image, because then a subsequent change of the page
> (which isn't just a hint-bit update) needs to somehow know it needs to
> take a full page image even though a WAL record for it was already written.

I think it would be worth estimating what this actually looks like in terms of log write quantity. My inclination is to say that if it increases log writes by less than 10%, we don't need to provide an option to turn it off.

The reasons I don't want to provide a disabling GUC are:
a) more GUCs
b) confusing users
c) causing users to disable rewind *until they need it*, at which point it's too late to enable it.

So if there's any way we can avoid having a GUC for this, I'm for it. And if we do have a GUC, failback should be enabled by default.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 23:07, Josh Berkus wrote:
> On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:
>> On 24.10.2013 20:39, Josh Berkus wrote:
>>> On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
>>>> If we do what you are suggesting, it seems like a single line patch to me.
>>>> In XLogSaveBufferForHint(), we probably need to look at this additional
>>>> GUC to decide whether or not to backup the block.
>>>
>>> Wait, what? Why are we having an additional GUC?
>>>
>>> I'm opposed to the idea of having a GUC to enable failback. When would
>>> anyone using replication ever want to disable that?
>>
>> For example, if you're not replicating for high availability purposes,
>> but to keep a reporting standby up-to-date.
>
> What kind of overhead are we talking about here? You probably said, but
> I've had a mail client meltdown and lost a lot of my -hackers emails.

One extra WAL record whenever a hint bit is set on a page, for the first time after a checkpoint. In other words, a WAL record needs to be written in the same circumstances as with page checksums, but the WAL records are much smaller as they don't need to contain a full page image, just the block number of the changed block.

Or maybe we'll write the full page image after all, like with page checksums, just without calculating the checksums. It might be tricky to skip the full-page image, because then a subsequent change of the page (which isn't just a hint-bit update) needs to somehow know it needs to take a full page image even though a WAL record for it was already written.

- Heikki
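[Editor's note: the bookkeeping Heikki describes — log a page's first hint-bit-only change after each checkpoint and nothing for subsequent ones — can be modeled in a few lines. A toy Python sketch; the class and record names are invented, and the real mechanism would track this via the page LSN rather than an explicit set.]

```python
class HintBitWAL:
    """Toy model: a page's first hint-bit update after each checkpoint
    emits one small WAL record (block number only, no full-page image);
    later hint-bit updates to the same page before the next checkpoint
    emit nothing."""
    def __init__(self):
        self.logged_since_checkpoint = set()
        self.wal = []

    def set_hint_bit(self, block_number):
        if block_number not in self.logged_since_checkpoint:
            self.wal.append(("HINT", block_number))  # no full-page image
            self.logged_since_checkpoint.add(block_number)

    def checkpoint(self):
        self.logged_since_checkpoint.clear()

w = HintBitWAL()
w.set_hint_bit(7)   # first touch after checkpoint: logged
w.set_hint_bit(7)   # repeat before next checkpoint: not logged
w.checkpoint()
w.set_hint_bit(7)   # logged again after the checkpoint
```

The tricky case Heikki raises is visible here too: a later non-hint change to block 7 would need to know whether it must still take a full-page image, since only a small record was written for it.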
Re: [HACKERS] Patch for fail-back without fresh backup
On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:
> On 24.10.2013 20:39, Josh Berkus wrote:
>> On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
>>> If we do what you are suggesting, it seems like a single line patch
>>> to me.
>>> In XLogSaveBufferForHint(), we probably need to look at this
>>> additional GUC
>>> to decide whether or not to backup the block.
>>
>> Wait, what? Why are we having an additional GUC?
>>
>> I'm opposed to the idea of having a GUC to enable failback. When would
>> anyone using replication ever want to disable that?
>
> For example, if you're not replicating for high availability purposes,
> but to keep a reporting standby up-to-date.

What kind of overhead are we talking about here? You probably said, but I've had a mail client meltdown and lost a lot of my -hackers emails.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK
On 24 October 2013 18:28, Andres Freund wrote:
> On 2013-10-23 21:20:58 +0100, Dean Rasheed wrote:
>> On 23 October 2013 21:08, Andres Freund wrote:
>> > On 2013-10-23 20:51:27 +0100, Dean Rasheed wrote:
>> >> Hmm, my first thought is that rewriteTargetView() should be calling
>> >> AcquireRewriteLocks() on viewquery, before doing too much with it.
>> >> There may be sub-queries in viewquery's quals (and also now in its
>> >> targetlist) and I don't think the relations referred to by those
>> >> sub-queries are getting locked.
>> >
>> > Well, that wouldn't follow the currently documented rule ontop
>> > of QueryRewrite:
>> >  * NOTE: the parsetree must either have come straight from the parser,
>> >  * or have been scanned by AcquireRewriteLocks to acquire suitable locks.
>> >
>> > It might still be the right thing to do, but it seems suspicious that
>> > the rules need to be tweaked like that.
>>
>> Well it matches what already happens in other places in the rewriter
>> --- see rewriteRuleAction() and ApplyRetrieveRule(). It's precisely
>> because the rule action's query hasn't come from the parser that it
>> needs to be processed in this way.
>
> I really don't know that area of code that well, fortunately I never had
> to wade around it much so far...
>
> Reading your explanation and a bit of the code, your plan sounds sane. Are
> you going to propose a patch?

I thought about making rewriteTargetView() call AcquireRewriteLocks(), but on closer inspection I think that is probably over the top. 90% of the code in AcquireRewriteLocks() is there to process the query's RTEs, but rewriteTargetView() is already locking the single RTE in viewquery anyway. Also, AcquireRewriteLocks() would acquire the wrong kind of lock on that RTE, since viewquery is a SELECT, but we want an exclusive lock because the RTE is about to become the outer query's result relation. AcquireRewriteLocks() also processes CTEs, but we know that we won't have any of those in rewriteTargetView().
So I think the only thing missing from rewriteTargetView() is to lock any relations from any sublink subqueries in viewquery using acquireLocksOnSubLinks() -- patch attached.

Regards,
Dean

diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
new file mode 100644
index c52a374..e6a9e7b
*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*** rewriteTargetView(Query *parsetree, Rela
*** 2589,2594 ****
--- 2589,2604 ----
  	heap_close(base_rel, NoLock);
  
  	/*
+ 	 * If the view query contains any sublink subqueries, we should also
+ 	 * acquire locks on any relations they refer to. We know that there won't
+ 	 * be any subqueries in the range table or CTEs, so we can skip those, as
+ 	 * in AcquireRewriteLocks.
+ 	 */
+ 	if (viewquery->hasSubLinks)
+ 		query_tree_walker(viewquery, acquireLocksOnSubLinks, NULL,
+ 						  QTW_IGNORE_RC_SUBQUERIES);
+ 
+ 	/*
  	 * Create a new target RTE describing the base relation, and add it to the
  	 * outer query's rangetable. (What's happening in the next few steps is
  	 * very much like what the planner would do to "pull up" the view into the
Re: [HACKERS] 9.4 regression
On Thu, Oct 24, 2013 at 5:41 AM, Thom Brown wrote:
> On 5 September 2013 22:24, Bruce Momjian wrote:
>> On Mon, Aug 19, 2013 at 09:27:57PM -0400, Stephen Frost wrote:
>>> * Andres Freund (and...@2ndquadrant.com) wrote:
>>> > I vote for adapting the patch to additionally zero out the file via
>>> > write(). In your tests that seemed to perform at least as good as the
>>> > old method... It also has the advantage that we can use it a littlebit
>>> > more as a testbed for possibly using it for heap extensions one day.
>>> > We're pretty early in the cycle, so I am not worried about this too
>>> > much...
>>>
>>> I dunno, I'm pretty disappointed that this doesn't actually improve
>>> things. Just following this casually, it looks like it might be some
>>> kind of locking issue in the kernel that's causing it to be slower; or
>>> at least some code path that isn't exercised terribly much and therefore
>>> hasn't been given the love that it should.
>>>
>>> Definitely interested in what Ts'o says, but if we can't figure out why
>>> it's slower *without* writing out the zeros, I'd say we punt on this
>>> until Linux and the other OS folks improve the situation.
>>
>> FYI, the patch has been reverted.
>
> Is there an updated patch available for this? And did anyone hear from Ts'o?

After the patch was reverted, it was not re-submitted. I have tried 3 or 4 times to get more info out of Ts'o, without luck.

-- 
Jon
Re: [HACKERS] Deprecations in authentication
On 10/24/13, 2:37 PM, Magnus Hagander wrote: > They're not causing compiler warnings when you just build with gssapi, > correct? Only if you enable the native krb5? Well, actually I was just about to reply that gssapi is also deprecated. They want you to use some framework instead. That's something we'll have to look into at some point, if we want to support gssapi on this platform in the future. The issue about removing krb5 is valid independent of this, I think. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deprecations in authentication
On Thu, Oct 24, 2013 at 8:35 PM, Peter Eisentraut wrote: > On 10/18/12, 7:20 AM, Magnus Hagander wrote: >> 1. krb5 authentication. We've had gssapi since 8.3 (which means in all >> supported versions). krb5 has been deprecated, also since 8.3. Time to >> remove it? > > OS X Mavericks has now marked just about everything in krb5.h as > deprecated, leading to compiler warnings. Which reminded me of this > thread. Maybe it's time. Yeah, it's still sitting on my TODO to get done for 9.4. I guess that's another reason... They're not causing compiler warnings when you just build with gssapi, correct? Only if you enable the native krb5? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deprecations in authentication
On 10/18/12, 7:20 AM, Magnus Hagander wrote: > 1. krb5 authentication. We've had gssapi since 8.3 (which means in all > supported versions). krb5 has been deprecated, also since 8.3. Time to > remove it? OS X Mavericks has now marked just about everything in krb5.h as deprecated, leading to compiler warnings. Which reminded me of this thread. Maybe it's time. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reasons not to like asprintf
Peter Eisentraut writes: > On 10/22/13, 3:40 PM, Tom Lane wrote: >> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs, >> I'm now thinking we should use exactly the same names for the frontend and >> backend versions, ie psprintf() and pvsprintf(). The main reason for >> considering a pg_ prefix for the frontend versions was to avoid cluttering >> application namespace; but it's already the case that we don't expect >> libpgcommon to be namespace clean. > While this is attractive, the same logic would suggest that we rename > pg_malloc() to palloc(), and that sounds wrong. The frontend and > backend functions do have different freeing semantics. We already crossed that bridge, though, by defining "palloc" in frontend environments to mean pg_malloc. I'm doubtful that insisting on different names is going to result in anything except #ifdef clutter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reasons not to like asprintf
On 10/22/13, 3:40 PM, Tom Lane wrote: > In order to avoid having to clutter stuff like that with #ifdef FRONTENDs, > I'm now thinking we should use exactly the same names for the frontend and > backend versions, ie psprintf() and pvsprintf(). The main reason for > considering a pg_ prefix for the frontend versions was to avoid cluttering > application namespace; but it's already the case that we don't expect > libpgcommon to be namespace clean. While this is attractive, the same logic would suggest that we rename pg_malloc() to palloc(), and that sounds wrong. The frontend and backend functions do have different freeing semantics. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 20:39, Josh Berkus wrote:
> On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
>> If we do what you are suggesting, it seems like a single line patch to me.
>> In XLogSaveBufferForHint(), we probably need to look at this additional GUC
>> to decide whether or not to backup the block.
>
> Wait, what? Why are we having an additional GUC?
>
> I'm opposed to the idea of having a GUC to enable failback. When would
> anyone using replication ever want to disable that?

For example, if you're not replicating for high availability purposes, but to keep a reporting standby up-to-date.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for fail-back without fresh backup
On 10/24/2013 04:15 AM, Pavan Deolasee wrote: > If we do what you are suggesting, it seems like a single line patch to me. > In XLogSaveBufferForHint(), we probably need to look at this additional GUC > to decide whether or not to backup the block. Wait, what? Why are we having an additional GUC? I'm opposed to the idea of having a GUC to enable failback. When would anyone using replication ever want to disable that? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK
On 2013-10-23 21:20:58 +0100, Dean Rasheed wrote:
> On 23 October 2013 21:08, Andres Freund wrote:
>> On 2013-10-23 20:51:27 +0100, Dean Rasheed wrote:
>>> Hmm, my first thought is that rewriteTargetView() should be calling
>>> AcquireRewriteLocks() on viewquery, before doing too much with it.
>>> There may be sub-queries in viewquery's quals (and also now in its
>>> targetlist) and I don't think the relations referred to by those
>>> sub-queries are getting locked.
>>
>> Well, that wouldn't follow the currently documented rule on top of
>> QueryRewrite:
>> * NOTE: the parsetree must either have come straight from the parser,
>> * or have been scanned by AcquireRewriteLocks to acquire suitable locks.
>>
>> It might still be the right thing to do, but it seems suspicious that
>> the rules need to be tweaked like that.
>
> Well it matches what already happens in other places in the rewriter
> --- see rewriteRuleAction() and ApplyRetrieveRule(). It's precisely
> because the rule action's query hasn't come from the parser that it
> needs to be processed in this way.

I really don't know that area of code that well; fortunately I never had to wade around in it much so far... Reading your explanation and a bit of the code, your plan sounds sane. Are you going to propose a patch?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sigh, my old HPUX box is totally broken by DSM patch
On Thu, Oct 24, 2013 at 1:13 PM, Tom Lane wrote:
> Stephen Frost writes:
>> * Robert Haas (robertmh...@gmail.com) wrote:
>>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost wrote:
>>>> Would this make sense as a configure-time check, rather than initdb, to
>>>> try blocking SIGSYS and checking for an ENOSYS from shm_open()? Seems
>>>> preferable to do that in a configure check rather than initdb.
>
>>> I don't see why. It's a run-time behavior; the build system may not
>>> be where the binaries will ultimately run.
>
>> I suppose, just need to be more cautious when blocking signals in initdb
>> than in a configure-time check, of course.
>
> Indeed, telling initdb to ignore SIGSYS makes it do what we want on
> this box:
>
> $ git diff
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index 3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc 100644
> *** a/src/bin/initdb/initdb.c
> --- b/src/bin/initdb/initdb.c
> *** setup_signals(void)
> *** 3197,3202 ****
> --- 3197,3207 ----
>   #ifdef SIGPIPE
>   	pqsignal(SIGPIPE, SIG_IGN);
>   #endif
> +
> + 	/* Prevent SIGSYS so we can probe for kernel calls that might not work */
> + #ifdef SIGSYS
> + 	pqsignal(SIGSYS, SIG_IGN);
> + #endif
>   }
>
> $ initdb
> ...
> selecting dynamic shared memory implementation ... sysv
> ...
>
> The above patch ignores SIGSYS throughout initdb. We could narrow the
> possible side-effects by only disabling SIGSYS around the shm_open call,
> but I'm not sure there's any value in that. It seems likely to me that
> the same kind of problem might pop up elsewhere in future, as we try
> to make use of other modern kernel facilities. In fact, I can foresee
> wanting to run the whole backend this way --- though I'm not proposing
> doing so today.
> > A bit of googling turned up the following paragraph of rationale in the > POSIX spec (Open Group Base Specifications 2013 edition): > > There is very little that a Conforming POSIX.1 Application can do by > catching, ignoring, or masking any of the signals SIGILL, SIGTRAP, > SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally > be generated by the system only in cases of programming errors. While it > may be desirable for some robust code (for example, a library routine) > to be able to detect and recover from programming errors in other code, > these signals are not nearly sufficient for that purpose. One portable > use that does exist for these signals is that a command interpreter can > recognize them as the cause of termination of a process (with wait()) > and print an appropriate message. > > So in other words, the reason for delivering SIGSYS rather than returning > ENOSYS by default is to make it apparent from the process exit code that > you made an invalid kernel call, should your code be sloppy enough that > it fails to notice and report kernel call failures. This argument doesn't > seem to me to hold a lot of water for Postgres' purposes. > > Comments? Your proposed change to initdb seems fine to me. If we change initdb but not the backend, then somebody could later manually change postgresql.conf to set dynamic_shared_memory_type=posix. When they try to restart the postmaster, it'll die with SIGSYS rather than exiting with a relatively clean error message. However, at the moment, it seems like the only people who are likely to encounter that situation are those who install PostgreSQL 9.4 on very old HP-UX boxen and then change the configuration settings chosen by initdb, and there shouldn't be many such people. 
Therefore I tend to think that changing initdb is sufficient for now; we can take the risk of changing the backend's handling of SIGSYS if and when it becomes clear that there's enough benefit to doing so to justify the risk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sigh, my old HPUX box is totally broken by DSM patch
On 2013-10-24 13:13:23 -0400, Tom Lane wrote:
> Stephen Frost writes:
> The above patch ignores SIGSYS throughout initdb. We could narrow the
> possible side-effects by only disabling SIGSYS around the shm_open call,
> but I'm not sure there's any value in that. It seems likely to me that
> the same kind of problem might pop up elsewhere in future, as we try
> to make use of other modern kernel facilities. In fact, I can foresee
> wanting to run the whole backend this way --- though I'm not proposing
> doing so today.

Why not? I don't see the advantage of looking for effects/problems of such a change twice. I'd also much rather see a wrongly configured postgres fail to start with a legible error message instead of it being killed by a signal.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Thu, Oct 24, 2013 at 12:22:59PM -0400, Robert Haas wrote:
> On Thu, Oct 24, 2013 at 11:40 AM, k...@rice.edu wrote:
>> On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote:
>>> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao wrote:
>>>> So, our consensus is to introduce the hooks for FPW compression so that
>>>> users can freely select their own best compression algorithm?
>>>> Also, probably we need to implement at least one compression contrib
>>>> module using that hook, maybe it's based on pglz or snappy.
>>>
>>> I don't favor making this pluggable. I think we should pick snappy or
>>> lz4 (or something else), put it in the tree, and use it.
>>
>> Hi,
>>
>> My vote would be for lz4 since it has faster single-thread compression
>> and decompression speeds, with the decompression speed being almost 2X
>> snappy's. Both are BSD licensed, so that is not an issue. The base code
>> for lz4 is C and it is C++ for snappy. There is also an HC
>> (high-compression) variant of lz4 that pushes its compression rate to
>> about the same as zlib (-1) and uses the same decompressor, which can
>> provide data even faster due to better compression. Some more real-world
>> tests would be useful, which is really where being pluggable would help.
>
> Well, it's probably a good idea for us to test, during the development
> cycle, which algorithm works better for WAL compression, and then use
> that one. Once we make that decision, I don't see that there are many
> circumstances in which a user would care to override it. Now if we
> find that there ARE reasons for users to prefer different algorithms
> in different situations, that would be a good reason to make it
> configurable (or even pluggable). But if we find that no such reasons
> exist, then we're better off avoiding burdening users with the need to
> configure a setting that has only one sensible value.
>
> It seems fairly clear from previous discussions on this mailing list
> that snappy and lz4 are the top contenders for the position of
> "compression algorithm favored by PostgreSQL". I am wondering,
> though, whether it wouldn't be better to add support for both - say we
> added both to libpgcommon, and perhaps we could consider moving pglz
> there as well. That would allow easy access to all of those
> algorithms from both front-end and backend code. If we can make the
> APIs parallel, it should be very simple to modify any code we add now
> to use a different algorithm than the one initially chosen if in the
> future we add algorithms to or remove algorithms from the list, or if
> one algorithm is shown to outperform another in some particular
> context. I think we'll do well to isolate the question of adding
> support for these algorithms from the current patch or any other
> particular patch that may be on the table, and FWIW, I think having
> two leading contenders and adding support for both may have a variety
> of advantages over crowning a single victor.

+++1

Ken

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Thu, Oct 24, 2013 at 1:00 PM, Andres Freund wrote:
> On 2013-10-24 16:06:19 +0300, Heikki Linnakangas wrote:
>> On 24.10.2013 09:03, Abhijit Menon-Sen wrote:
>>> One open question is what to do about rounding up the size. It should
>>> not be necessary, but for the fairly recent bug described at the link
>>> in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
>>> tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
>>> kernel (mmap returns EINVAL).
>>
>> Let's get rid of the rounding. It's clearly a kernel bug, and it shouldn't
>> be our business to add workarounds for any kernel bug out there. And the
>> worst that will happen if you're running a buggy kernel version is that you
>> fall back to not using huge pages (assuming huge_tlb_pages=try).
>
> But it's a range of relatively popular kernels, that will stay around
> for a good while. So I am hesitant to just not do anything about it. The
> directory scanning code isn't that bad imo.
>
> Either way:
> I think we should log when we tried to use hugepages but fell back to
> plain mmap, currently it's hard to see whether they are used.

Logging it might be a good idea, but suppose the system's been running for 6 months and you don't have the startup logs. It would be good to have an easy way to discover later what happened back then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sigh, my old HPUX box is totally broken by DSM patch
Stephen Frost writes:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost wrote:
>>> Would this make sense as a configure-time check, rather than initdb, to
>>> try blocking SIGSYS and checking for an ENOSYS from shm_open()? Seems
>>> preferable to do that in a configure check rather than initdb.

>> I don't see why. It's a run-time behavior; the build system may not
>> be where the binaries will ultimately run.

> I suppose, just need to be more cautious when blocking signals in initdb
> than in a configure-time check, of course.

Indeed, telling initdb to ignore SIGSYS makes it do what we want on this box:

$ git diff
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*** setup_signals(void)
*** 3197,3202 ****
--- 3197,3207 ----
  #ifdef SIGPIPE
  	pqsignal(SIGPIPE, SIG_IGN);
  #endif
+
+ 	/* Prevent SIGSYS so we can probe for kernel calls that might not work */
+ #ifdef SIGSYS
+ 	pqsignal(SIGSYS, SIG_IGN);
+ #endif
  }

$ initdb
...
selecting dynamic shared memory implementation ... sysv
...

The above patch ignores SIGSYS throughout initdb. We could narrow the possible side-effects by only disabling SIGSYS around the shm_open call, but I'm not sure there's any value in that. It seems likely to me that the same kind of problem might pop up elsewhere in future, as we try to make use of other modern kernel facilities. In fact, I can foresee wanting to run the whole backend this way --- though I'm not proposing doing so today.

A bit of googling turned up the following paragraph of rationale in the POSIX spec (Open Group Base Specifications 2013 edition):

    There is very little that a Conforming POSIX.1 Application can do by
    catching, ignoring, or masking any of the signals SIGILL, SIGTRAP,
    SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally
    be generated by the system only in cases of programming errors. While it
    may be desirable for some robust code (for example, a library routine)
    to be able to detect and recover from programming errors in other code,
    these signals are not nearly sufficient for that purpose. One portable
    use that does exist for these signals is that a command interpreter can
    recognize them as the cause of termination of a process (with wait())
    and print an appropriate message.

So in other words, the reason for delivering SIGSYS rather than returning ENOSYS by default is to make it apparent from the process exit code that you made an invalid kernel call, should your code be sloppy enough that it fails to notice and report kernel call failures. This argument doesn't seem to me to hold a lot of water for Postgres' purposes.

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 10/23/2013 08:52 PM, Alvaro Herrera wrote: > Peter Geoghegan escribió: > >> I am interested in making it store richer statistics, >> provided we're very careful about the costs. Every time those counters >> are incremented, a spinlock is held. > > Hmm, now if we had portable atomic addition, so that we could spare the > spinlock ... Oh, sure, just take the *easy* way out. ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On 10/23/2013 09:58 PM, Amit Kapila wrote: > I wonder why anyone would like to freeze during CLUSTER command when > they already have separate way (VACUUM FREEZE) to achieve it, do you > know or can think of any case where user wants to do it along with > Cluster command? "If I'm rewriting the table anyway, let's freeze it". Otherwise, you have to write the same pages twice, if both CLUSTER and FREEZE are required. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 2013-10-24 16:06:19 +0300, Heikki Linnakangas wrote: > On 24.10.2013 09:03, Abhijit Menon-Sen wrote: > >One open question is what to do about rounding up the size. It should > >not be necessary, but for the fairly recent bug described at the link > >in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I > >tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28 > >kernel (mmap returns EINVAL). > > Let's get rid of the rounding. It's clearly a kernel bug, and it shouldn't > be our business to add workarounds for any kernel bug out there. And the > worst that will happen if you're running a buggy kernel version is that you > fall back to not using huge pages (assuming huge_tlb_pages=try). But it's a range of relatively popular kernels, that will stay around for a good while. So I am hesitant to just not do anything about it. The directory scanning code isn't that bad imo. Either way: I think we should log when we tried to use hugepages but fell back to plain mmap, currently it's hard to see whether they are used. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Thu, Oct 24, 2013 at 11:40 AM, k...@rice.edu wrote:
> On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote:
>> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao wrote:
>>> So, our consensus is to introduce the hooks for FPW compression so that
>>> users can freely select their own best compression algorithm?
>>> Also, probably we need to implement at least one compression contrib
>>> module using that hook, maybe it's based on pglz or snappy.
>>
>> I don't favor making this pluggable. I think we should pick snappy or
>> lz4 (or something else), put it in the tree, and use it.
>
> Hi,
>
> My vote would be for lz4 since it has faster single-thread compression
> and decompression speeds, with the decompression speed being almost 2X
> snappy's. Both are BSD licensed, so that is not an issue. The base code
> for lz4 is C and it is C++ for snappy. There is also an HC
> (high-compression) variant of lz4 that pushes its compression rate to
> about the same as zlib (-1) and uses the same decompressor, which can
> provide data even faster due to better compression. Some more real-world
> tests would be useful, which is really where being pluggable would help.

Well, it's probably a good idea for us to test, during the development cycle, which algorithm works better for WAL compression, and then use that one. Once we make that decision, I don't see that there are many circumstances in which a user would care to override it. Now if we find that there ARE reasons for users to prefer different algorithms in different situations, that would be a good reason to make it configurable (or even pluggable). But if we find that no such reasons exist, then we're better off avoiding burdening users with the need to configure a setting that has only one sensible value.

It seems fairly clear from previous discussions on this mailing list that snappy and lz4 are the top contenders for the position of "compression algorithm favored by PostgreSQL". I am wondering, though, whether it wouldn't be better to add support for both - say we added both to libpgcommon, and perhaps we could consider moving pglz there as well. That would allow easy access to all of those algorithms from both front-end and backend code. If we can make the APIs parallel, it should be very simple to modify any code we add now to use a different algorithm than the one initially chosen if in the future we add algorithms to or remove algorithms from the list, or if one algorithm is shown to outperform another in some particular context. I think we'll do well to isolate the question of adding support for these algorithms from the current patch or any other particular patch that may be on the table, and FWIW, I think having two leading contenders and adding support for both may have a variety of advantages over crowning a single victor.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote:
> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao wrote:
>> So, our consensus is to introduce the hooks for FPW compression so that
>> users can freely select their own best compression algorithm?
>> Also, probably we need to implement at least one compression contrib module
>> using that hook, maybe it's based on pglz or snappy.
>
> I don't favor making this pluggable. I think we should pick snappy or
> lz4 (or something else), put it in the tree, and use it.

Hi,

My vote would be for lz4 since it has faster single-thread compression and decompression speeds, with the decompression speed being almost 2X snappy's. Both are BSD licensed, so that is not an issue. The base code for lz4 is C and it is C++ for snappy. There is also an HC (high-compression) variant of lz4 that pushes its compression rate to about the same as zlib (-1) and uses the same decompressor, which can provide data even faster due to better compression. Some more real-world tests would be useful, which is really where being pluggable would help.

Regards,
Ken

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
Robert Haas writes: > On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao wrote: >> So, our consensus is to introduce the hooks for FPW compression so that >> users can freely select their own best compression algorithm? >> Also, probably we need to implement at least one compression contrib module >> using that hook, maybe it's based on pglz or snappy. > I don't favor making this pluggable. I think we should pick snappy or > lz4 (or something else), put it in the tree, and use it. I agree. Hooks in this area are going to be a constant source of headaches, vastly outweighing any possible benefit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
2013/10/24 Robert Haas > On Tue, Oct 22, 2013 at 9:37 PM, Josh Kupershmidt > wrote: > > On Mon, Oct 14, 2013 at 5:38 PM, Josh Kupershmidt > wrote: > >> Also, Pavel, this patch is still listed as 'Needs Review' in the CF > >> app, but I haven't seen a response to the concerns in my last message. > > > > It looks like this patch has been imported into the 2013-11 CF [1] and > > marked "Needs Review". I looked at the version of the patch pointed to > > in that CF entry back in July, and noted [2] several problems that > > still seemed to be present in the patch, for which I never saw a > > followup from Pavel. IMO this patch should have gotten marked > > "Returned with Feedback" pending a response from Pavel. > > That sounds reasonable to me. Perhaps you should so mark it. > +1 Regards Pavel > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
On Tue, Oct 22, 2013 at 9:37 PM, Josh Kupershmidt wrote: > On Mon, Oct 14, 2013 at 5:38 PM, Josh Kupershmidt wrote: >> Also, Pavel, this patch is still listed as 'Needs Review' in the CF >> app, but I haven't seen a response to the concerns in my last message. > > It looks like this patch has been imported into the 2013-11 CF [1] and > marked "Needs Review". I looked at the version of the patch pointed to > in that CF entry back in July, and noted [2] several problems that > still seemed to be present in the patch, for which I never saw a > followup from Pavel. IMO this patch should have gotten marked > "Returned with Feedback" pending a response from Pavel. That sounds reasonable to me. Perhaps you should so mark it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Thu, Oct 24, 2013 at 9:06 AM, Heikki Linnakangas wrote:
> * the documentation should perhaps mention that the setting only has an
> effect if POSIX shared memory is used. That's the default on Linux, but we
> will try to fall back to SystemV shared memory if it fails.

This is true for dynamic shared memory, but not for the main shared memory segment. The main shared memory segment is always the combination of a small, fixed-size System V shared memory chunk and an anonymous shared memory region created by mmap(NULL, ..., MAP_SHARED). POSIX shared memory is not used. (Exceptions: Anonymous shared memory isn't used on Windows, which has its own mechanism, or when compiling with EXEC_BACKEND, when the whole chunk is allocated as System V shared memory.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao wrote: > So, our consensus is to introduce the hooks for FPW compression so that > users can freely select their own best compression algorithm? > Also, probably we need to implement at least one compression contrib module > using that hook, maybe it's based on pglz or snappy. I don't favor making this pluggable. I think we should pick snappy or lz4 (or something else), put it in the tree, and use it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 22, 2013 at 2:13 PM, Andres Freund wrote:
> On 2013-10-22 13:57:53 -0400, Robert Haas wrote:
>> On Tue, Oct 22, 2013 at 1:08 PM, Andres Freund wrote:
>> >> That strikes me as a flaw in the implementation rather than the idea.
>> >> You're presupposing a patch where the necessary information is
>> >> available in WAL yet you don't make use of it at the proper time.
>> >
>> > The problem is that the mapping would be somewhere *ahead* from the
>> > transaction/WAL we're currently decoding. We'd need to read ahead till
>> > we find the correct one.
>>
>> Yes, I think that's what you need to do.
>
> My problem with that is that the rewrite can be gigabytes into the future.
>
> When reading forward we could either just continue reading data into the
> reorderbuffer, but delay replaying all future commits till we found the
> currently needed remap. That might have quite the additional
> storage/memory cost, but runtime complexity should be the same as normal
> decoding.
> Or we could individually read ahead for every transaction. But doing so
> for every transaction will get rather expensive (roughly O(amount_of_wal^2)).

[ Sorry it's taken me a bit of time to get back to this; other tasks intervened, and I also just needed some time to let it settle in my brain. ]

If you read ahead looking for a set of ctid translations from relfilenode A to relfilenode B, and along the way you happen to encounter a set of translations from relfilenode C to relfilenode D, you could stash that set of translations away somewhere, so that if the next transaction you process needs that set of mappings, it's already computed. With that approach, you'd never have to pre-read the same set of WAL files more than once. But, as I think about it more, that's not very different from your idea of stashing the translations someplace other than WAL in the first place. 
I mean, if the read-ahead thread generates a series of files in pg_somethingorother that contain those maps, you could have just written the maps to that directory in the first place. So on further review I think we could adopt that approach. However, I'm leery about the idea of using a relation fork for this. I'm not sure whether that's what you had in mind, but it gives me the willies. First, it adds distributed overhead to the system, as previously discussed; and second, I think the accounting may be kind of tricky, especially in the face of multiple rewrites. I'd be more inclined to find a separate place to store the mappings. Note that, AFAICS, there's no real need for the mapping file to be block-structured, and I believe they'll be written first (with no readers) and subsequently only read (with no further writes) and eventually deleted.

One possible objection to this is that it would preclude decoding on a standby, which seems like a likely enough thing to want to do. So maybe it's best to WAL-log the changes to the mapping file so that the standby can reconstruct it if needed.

> I think that'd be pretty similar to just disallowing VACUUM
> FREEZE/CLUSTER on catalog relations since effectively it'd be too
> expensive to use.

This seems unduly pessimistic to me; unless the catalogs are really darn big, this is a mostly theoretical problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
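[Editorial note] The stash-as-you-scan idea from the previous message can be sketched as a cache keyed by the rewrite's (old, new) relfilenode pair; the record layout below is entirely hypothetical, invented for illustration:

```python
# Illustrative sketch (not PostgreSQL code): while reading WAL ahead for the
# ctid map of one rewrite, remember every other map we pass, so later
# transactions never force a second read of the same WAL range.
translation_cache = {}  # (old_relfilenode, new_relfilenode) -> {old_ctid: new_ctid}

def read_ahead_for(wanted, wal_records):
    """Scan forward until the wanted rewrite's map is found, caching all maps seen."""
    if wanted in translation_cache:
        return translation_cache[wanted]      # already stashed: no WAL re-read
    for rec in wal_records:
        if rec["type"] == "rewrite_map":
            key = (rec["old"], rec["new"])
            translation_cache[key] = rec["map"]
            if key == wanted:
                return rec["map"]
    raise LookupError("mapping %r not found" % (wanted,))

wal = [
    {"type": "rewrite_map", "old": "C", "new": "D", "map": {(0, 1): (0, 9)}},
    {"type": "rewrite_map", "old": "A", "new": "B", "map": {(0, 1): (2, 3)}},
]
assert read_ahead_for(("A", "B"), wal) == {(0, 1): (2, 3)}
# The C->D map was stashed on the way, so this lookup needs no second scan:
assert read_ahead_for(("C", "D"), iter(())) == {(0, 1): (0, 9)}
```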
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Oct 24, 2013 at 6:54 AM, Andrew Dunstan wrote: > I'll be quite happy if we can get around the query text length limit. I have > greatly increased the buffer size at quite a few clients, in one case where > they run some pretty large auto-generated queries and have memory to burn, > up to 40k. I've already hacked up a prototype patch. I've heard enough complaints about that to want to fix it once and for all. -- Peter Geoghegan
Re: [HACKERS] Patch for fail-back without fresh backup
Pavan Deolasee escribió:
> Yeah, I had brought up similar idea up thread. Right now wal_level is
> nicely ordered. But with this additional logic, I am not sure if we would
> need multiple new levels and also break that ordering (I don't know if its
> important). For example, one may want to set up streaming replication
> with/without this feature or hot standby with/without the feature. I don't
> have a good idea about how to capture them in wal_level. May be something
> like: minimal, archive, archive_with_this_new_feature, hot_standby and
> hot_standby_with_this_new_feature.

That's confusing. A separate GUC sounds better. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: lob conversion functionality
On 22.10.2013 13:55, Pavel Stehule wrote:
> 2013/10/21 Noah Misch
>> If you're prepared to change the function names and add the subset-oriented functions, I would appreciate that.
> here is patch

lobj.sgml still refers to the old names. - Heikki
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 10/23/2013 07:51 PM, Peter Geoghegan wrote: On Wed, Oct 23, 2013 at 4:46 PM, Josh Berkus wrote: So you're suggesting that instead of storing the aggregates as we currently do, we store a buffer of the last N queries (in normal form) and their stats? And then aggregate when the user asks for it? No, I'm not. I'm suggesting storing the query texts externally, in a file. They usually use 1024 bytes of shared memory per entry, regardless of how long the query text is. This would allow pg_stat_statements to store arbitrarily large query texts, while also giving us breathing room if we have ambitions around expanding what pg_stat_statements can (optionally) track. Having said that, I am still pretty sensitive to bloating pg_stat_statements. Me too. I think min, max and stddev will have a fairly small impact, and give considerable bang for the buck. Not so sure about the other suggestions. And of course, memory impact is only half the story - CPU cycles spent is the other part. I'll be quite happy if we can get around the query text length limit. I have greatly increased the buffer size at quite a few clients, in one case where they run some pretty large auto-generated queries and have memory to burn, up to 40k. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
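[Editorial note] Peter's externalization idea quoted above boils down to keeping only an (offset, length) pair per entry in shared memory and appending the texts to a file; a hypothetical sketch (class and file names invented):

```python
import os
import tempfile

# Sketch of storing query texts outside shared memory: the shared hash entry
# keeps only an (offset, length) pair, a fixed handful of bytes per statement,
# instead of a 1024-byte in-memory text buffer that truncates long queries.
class QueryTextFile:
    def __init__(self, path):
        self.path = path
        open(path, "wb").close()  # start with an empty text file

    def append(self, text: str):
        """Append a query text; return the (offset, length) to keep in shared memory."""
        data = text.encode()
        with open(self.path, "ab") as f:
            offset = f.tell()
            f.write(data)
        return offset, len(data)

    def fetch(self, offset: int, length: int) -> str:
        """Read a text back when the view is queried (the slow, rare path)."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length).decode()

path = os.path.join(tempfile.mkdtemp(), "pgss_query_texts.stat")
store = QueryTextFile(path)
ref = store.append("SELECT * FROM t WHERE id = ?")
assert store.fetch(*ref) == "SELECT * FROM t WHERE id = ?"
```

A real implementation also has to garbage-collect the file when entries are evicted, which is where much of the actual complexity lives.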
Re: [HACKERS] RULE regression test fragility?
Andres Freund writes: > FWIW, I've repeatedly now thought that it'd make maintaining/updating > patches easier if we switched that query into unaligned tuple only (\a > \t) mode. That would remove the frequent conflicts on the row count and > widespread changes due to changed alignment. > Alternatively we could just wrap the query in \copy ... CSV. Hm ... yeah, it would be a good thing if changes in one view didn't so frequently have ripple effects to the whole output. Not sure which format is best for that though. regards, tom lane
Re: [HACKERS] CLUSTER FREEZE
On Thu, Oct 24, 2013 at 10:28:43AM +0530, Amit Kapila wrote: > On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro wrote: > > Hi > > I noticed that CLUSTER doesn't have a FREEZE option. Here is a patch to add > > that, for consistency with VACUUM. Is it useful? > > I wonder why anyone would like to freeze during CLUSTER command when > they already have separate way (VACUUM FREEZE) to achieve it, do you > know or can think of any case where user wants to do it along with > Cluster command? > > Anyway code side, I think you need to set both feeze_min_age as well > as freeze_table_age, see VACUUM command in gram.y > > CLUSTER opt_freeze opt_verbose qualified_name cluster_index_specification > > { > ClusterStmt *n = makeNode(ClusterStmt); > - n->relation = $3; > - n->indexname = $4; > - n->verbose = $2; > + n->relation = $4; > + n->freeze_min_age = $2 ? 0 : -1; > + n->indexname = $5; > + n->verbose = $3; > .. > > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com > Hi Amit, If the FREEZE is part of the CLUSTER, you would only read/write the table once. With a follow-up VACUUM FREEZE, you would re-read/write a second time. I, for one, would appreciate being able to perform both in the same run. (+1) Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
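[Editorial note] The gram.y hunk quoted above maps the FREEZE keyword onto freeze_min_age = 0 (and -1 otherwise, meaning "use the default"); the resulting per-tuple decision during the rewrite can be sketched as follows, with the constant and function names invented for illustration:

```python
def should_freeze(tuple_xmin_age: int, freeze_min_age: int) -> bool:
    """During a rewriting command, freeze any committed tuple whose xmin age
    meets freeze_min_age; -1 stands for 'use the configured default'."""
    DEFAULT_FREEZE_MIN_AGE = 50_000_000  # vacuum_freeze_min_age default
    if freeze_min_age < 0:
        freeze_min_age = DEFAULT_FREEZE_MIN_AGE
    return tuple_xmin_age >= freeze_min_age

# CLUSTER FREEZE (freeze_min_age = 0): every committed tuple is frozen in the
# same pass that rewrites the table, avoiding Ken's second full read/write.
assert should_freeze(1, 0)
# Plain CLUSTER (freeze_min_age = -1): only genuinely old tuples are frozen.
assert not should_freeze(1, -1)
assert should_freeze(60_000_000, -1)
```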
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 24.10.2013 09:03, Abhijit Menon-Sen wrote: This is a slightly reworked version of the patch submitted by Richard Poole last month, which was based on Christian Kruse's earlier patch. Thanks. With huge_tlb_pages=off, this is the best result I got: tps = 8680.771068 (including connections establishing) tps = 8721.504838 (excluding connections establishing) With huge_tlb_pages=on, this is the best result I got: tps = 9932.245203 (including connections establishing) tps = 9983.190304 (excluding connections establishing) (Even the worst result I got in the latter case was a smidgen faster than the best with huge_tlb_pages=off: 8796.344078 vs. 8721.504838.) That's really impressive. One open question is what to do about rounding up the size. It should not be necessary, but for the fairly recent bug described at the link in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28 kernel (mmap returns EINVAL). Let's get rid of the rounding. It's clearly a kernel bug, and it shouldn't be our business to add workarounds for any kernel bug out there. And the worst that will happen if you're running a buggy kernel version is that you fall back to not using huge pages (assuming huge_tlb_pages=try). Other comments: * guc.c doesn't actually need sys/mman.h for anything. Getting rid of the #include also lets you remove the configure test. * the documentation should perhaps mention that the setting only has an effect if POSIX shared memory is used. That's the default on Linux, but we will try to fall back to SystemV shared memory if it fails. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
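[Editorial note] For reference, the rounding Heikki proposes to drop simply rounds the shared memory request up to the next multiple of the huge page size (2 MB is the common x86-64 size, but it varies by platform):

```python
def round_up_to_huge_page(size: int, huge_page_size: int = 2 * 1024 * 1024) -> int:
    """Round a shared memory request up to a multiple of the huge page size.
    This was only a workaround for the kernel bug tracked as #56881; mmap
    with MAP_HUGETLB should not require it on a fixed kernel."""
    return ((size + huge_page_size - 1) // huge_page_size) * huge_page_size

assert round_up_to_huge_page(1) == 2 * 1024 * 1024
assert round_up_to_huge_page(2 * 1024 * 1024) == 2 * 1024 * 1024
assert round_up_to_huge_page(2 * 1024 * 1024 + 1) == 4 * 1024 * 1024
```

On a buggy kernel, an unrounded request makes mmap fail with EINVAL, and with huge_tlb_pages=try the server just falls back to ordinary pages.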
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 5:45 PM, Heikki Linnakangas wrote: > >> > Will just have to figure out what we want the user interface to be like; > should it be a separate guc, or somehow cram it into wal_level? > > Yeah, I had brought up similar idea up thread. Right now wal_level is nicely ordered. But with this additional logic, I am not sure if we would need multiple new levels and also break that ordering (I don't know if its important). For example, one may want to set up streaming replication with/without this feature or hot standby with/without the feature. I don't have a good idea about how to capture them in wal_level. May be something like: minimal, archive, archive_with_this_new_feature, hot_standby and hot_standby_with_this_new_feature. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 14:15, Pavan Deolasee wrote: On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas wrote: To fix that, pg_rewind could always start the rewinding process from the last checkpoint before the point that the histories diverge, instead of the exact point of divergence. Is that something required even if someone plans to use pg_rewind for a cluster with checksums enabled ? I mean since only first update after checkpoint is WAL logged, pg_rewind will break if another update happens after standby forks. Yes. It's broken as it is, even when checksums are enabled - good catch. I'll go change it to read all the WAL in the target starting from the last checkpoint before the point of divergence. Or would the recovery logic apply first WAL without looking at the page lsn ? (Sorry, may be I should read the code instead of asking you) WAL recovery does apply all the full-page images without looking at the page LSN, but that doesn't help in this case. pg_rewind copies over the blocks from the source server (= promoted standby) that were changed in the target server (= old master), after the standby's history diverged from it. In other words, it reverts the blocks that were changed in the old master, by copying them over from the promoted standby. After that, WAL recovery is performed, using the WAL from the promoted standby, to apply all the changes from the promoted standby that were not present in the old master. But it never replays any WAL from the old master. It reads it through, to construct the list of blocks that were modified, but it doesn't apply them. If we do what you are suggesting, it seems like a single line patch to me. In XLogSaveBufferForHint(), we probably need to look at this additional GUC to decide whether or not to backup the block. Yeah, it's trivial to add such a guc. Will just have to figure out what we want the user interface to be like; should it be a separate guc, or somehow cram it into wal_level? 
- Heikki
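[Editorial note] The sequence Heikki describes — build the changed-block list from the old master's WAL (read, never replayed), copy those blocks from the promoted standby, then replay the standby's WAL — can be sketched with a toy model; the data structures here are invented for illustration:

```python
# Toy model of pg_rewind: a relation is a dict of block_no -> contents.
def rewind(old_master, standby, master_wal_since_divergence, standby_wal):
    # 1. From the old master's WAL, collect the blocks it changed after the
    #    histories diverged. That WAL is only scanned, never applied.
    changed = {(rec["rel"], rec["block"]) for rec in master_wal_since_divergence}
    # 2. Revert those blocks by copying them from the promoted standby.
    for rel, block in changed:
        old_master[rel][block] = standby[rel].get(block)
    # 3. Replay the standby's WAL to pick up its post-divergence changes.
    for rec in standby_wal:
        old_master[rec["rel"]][rec["block"]] = rec["contents"]
    return old_master

master = {"t": {0: "master-only", 1: "common"}}
stand = {"t": {0: "common0", 1: "common"}}
result = rewind(
    master, stand,
    master_wal_since_divergence=[{"rel": "t", "block": 0}],
    standby_wal=[{"rel": "t", "block": 1, "contents": "standby-new"}],
)
assert result == {"t": {0: "common0", 1: "standby-new"}}
```

Step 1 is exactly why hint-bit-only changes are a problem: a block modified only by hint-bit updates never appears in the old master's WAL, so it is missed unless those updates are WAL-logged (or checksums are on).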
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 4:45 PM, Pavan Deolasee wrote: > . Or would the recovery logic apply first WAL without looking at the page > lsn ? (Sorry, may be I should read the code instead of asking you) > > Never mind. I realized it has to. That's the whole purpose of backing it up in the first place. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas wrote: > . >> > > To fix that, pg_rewind could always start the rewinding process from the > last checkpoint before the point that the histories diverge, instead of the > exact point of divergence. Is that something required even if someone plans to use pg_rewind for a cluster with checksums enabled ? I mean since only first update after checkpoint is WAL logged, pg_rewind will break if another update happens after standby forks. Or would the recovery logic apply first WAL without looking at the page lsn ? (Sorry, may be I should read the code instead of asking you) If we do what you are suggesting, it seems like a single line patch to me. In XLogSaveBufferForHint(), we probably need to look at this additional GUC to decide whether or not to backup the block. That would make the rewinding more expensive as it needs to read through a > lot more WAL, but I think it would still be OK. Yeah, probably you are right. Though the amount of additional work could be significantly higher and some testing might be warranted. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Ident context leak during reloading of conf files when no ident information is present in the file
On 24.10.2013 13:20, Haribabu kommi wrote:
> There is an ident context leak which occurs during reload of the configuration files when no ident configuration items are present in the configuration file. In load_ident(), a new context is allocated to store the new_parsed_lines, and the old context is deleted when parsed_ident_lines are present. If no parsed_ident_lines are present, the context is not deleted. A patch with the fix is attached; please review and let me know your suggestions.

Thanks, committed. - Heikki
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 13:02, Pavan Deolasee wrote: Another difference AFAICS is that checksum feature needs the block to be backed up only after the first time a hint bit is updated after checkpoint. But for something like pg_rewind to work, we will need to WAL log every hint bit update on a page. So we would want to keep it as short as possible. To fix that, pg_rewind could always start the rewinding process from the last checkpoint before the point that the histories diverge, instead of the exact point of divergence. That would make the rewinding more expensive as it needs to read through a lot more WAL, but I think it would still be OK. - Heikki
Re: [HACKERS] 9.4 regression
On 5 September 2013 22:24, Bruce Momjian wrote: > On Mon, Aug 19, 2013 at 09:27:57PM -0400, Stephen Frost wrote: >> * Andres Freund (and...@2ndquadrant.com) wrote: >> > I vote for adapting the patch to additionally zero out the file via >> > write(). In your tests that seemed to perform at least as good as the >> > old method... It also has the advantage that we can use it a littlebit >> > more as a testbed for possibly using it for heap extensions one day. >> > We're pretty early in the cycle, so I am not worried about this too much... >> >> I dunno, I'm pretty disappointed that this doesn't actually improve >> things. Just following this casually, it looks like it might be some >> kind of locking issue in the kernel that's causing it to be slower; or >> at least some code path that isn't exercise terribly much and therefore >> hasn't been given the love that it should. >> >> Definitely interested in what Ts'o says, but if we can't figure out why >> it's slower *without* writing out the zeros, I'd say we punt on this >> until Linux and the other OS folks improve the situation. > > FYI, the patch has been reverted. Is there an updated patch available for this? And did anyone hear from Ts'o? -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] UNION ALL on partitioned tables won't use indices.
Hello, Sorry that it's been a while..

1. Observed symptom

As you know, UNION ALL accompanied by ORDER BY uses indexes if available.

> uniontest=# EXPLAIN SELECT * FROM c11 UNION ALL SELECT * FROM c12 ORDER BY a;
>                               QUERY PLAN
> ---
>  Merge Append (cost=0.59..10214.10 rows=20 width=16)
>    Sort Key: c11.a
>    -> Index Scan using ... on c11 (cost=0.29..3857.04 rows=10 width=16)
>    -> Index Scan using ... on c12 (cost=0.29..3857.04 rows=10 width=16)

So do simple queries on partitioned (inheritance) tables.

> uniontest=# EXPLAIN SELECT * FROM p1 ORDER BY a;
>                               QUERY PLAN
> --
>  Merge Append (cost=0.73..11392.19 rows=21 width=16)
>    Sort Key: p1.a
>    -> Index Scan using ... on p1 (cost=0.12..8.14 rows=1 width=44)
>    -> Index Scan using ... on c11 (cost=0.29..3857.04 rows=10 width=16)
>    -> Index Scan using ... on c12 (cost=0.29..3857.04 rows=10 width=16)

Nevertheless, UNION ALL on partitioned tables doesn't. This is quite unfavourable behavior, especially with LIMIT.

> uniontest=# EXPLAIN ANALYZE SELECT * FROM p1
> UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;
>                               QUERY PLAN
>
>  Limit (actual time=182.732..182.735 rows=10 loops=1)
>    -> Sort (actual time=182.729..182.730 rows=10 loops=1)
>       Sort Key: p1.a
>       Sort Method: top-N heapsort Memory: 25kB
>       -> Append (actual time=0.029..108.109 rows=40 loops=1)
>          -> Seq Scan on p1 (actual time=0.001..0.001 rows=0 loops=1)
>          -> Seq Scan on c11 (actual time=0.027..19.074 rows=10 loops=1)
>          -> Seq Scan on c12 (actual time=0.014..16.653 rows=10 loops=1)
>          -> Seq Scan on p2 (actual time=0.000..0.000 rows=0 loops=1)
>          -> Seq Scan on c21 (actual time=0.012..16.677 rows=10 loops=1)
>          -> Seq Scan on c22 (actual time=0.012..16.794 rows=10 loops=1)
>  Total runtime: 182.857 ms

2. The cause

In grouping_planner, flatten_simple_union_all creates appendrelinfos for each subquery, then expand_inherited_tables further expands the parent tables in each subquery. This sequence finally leaves the following structure. 
Where rte[2] and [3] are subqueries abandoned on the way while pulling up rte[4] and [5].

rte[1] Subquery "SELECT*1", inh = 1
 +- appinfo[0] - rte[4] Relation "p1", inh = 1
 |   +- appinfo[2] - rte[6]  Relation "p1",  inh = 0
 |   +- appinfo[3] - rte[7]  Relation "c11", inh = 0
 |   +- appinfo[4] - rte[8]  Relation "c12", inh = 0
 +- appinfo[1] - rte[5] Relation "p2", inh = 1
     +- appinfo[5] - rte[9]  Relation "p1",  inh = 0
     +- appinfo[6] - rte[10] Relation "c11", inh = 0
     +- appinfo[7] - rte[11] Relation "c12", inh = 0

On the other hand, the equivalence class finally has members only for varno = 1, 4 and 5; in other words, it lacks the members for the most-descendant RTEs. This is because add_child_rel_equivalences() always inhibits add_eq_member from registering the child_rel's relid in the EC. Consequently these grandchild relations do not find index pathkeys for themselves.

3. Measures

I could think of three approaches to the problem. The first is to simply modify this spot so that the child flag passed to add_eq_member reflects whether the child_rel is inh or not. This seems to work fine, although it leaves two levels of MergeAppend (PATCH-1). So an additional patch is attached to collapse these MergeAppends (PATCH-2). This gives the same plan as PATCH-3.

> uniontest=# explain analyze select * from p1 union all
> select * from p2 order by a limit 10;
>                               QUERY PLAN
>
>  Limit (actual time=0.208..0.224 rows=10 loops=1)
>    -> Merge Append (actual time=0.205..0.218 rows=10 loops=1)
>       Sort Key: p1.a
>       -> Merge Append (actual time=0.110..0.120 rows=10 loops=1)
>          Sort Key: p1.a
>          -> Index Scan .. on p1 (actual time=0.006..0.006 rows=0 loops=1)
>          -> Index Scan .. on c11 (actual time=0.054..0.060 rows=10 loops=1)
>          -> Index Scan .. on c12 (actual time=0.046..0.046 rows=1 loops=1)
>       -> Merge Append (actual time=0.093..0.093 rows=1 loops=1)
>          Sort Key: p2.a
>          -> Index Scan .. on p2 (actual time=0.002..0.002 rows=0 loops=1)
>          -> Index Scan .. on c21 (actual time=0.047..0.047 rows=1 loops=1)
>          -> Index Scan .. on c22 (actual time=0.043..0.043 rows=1 loops=1)
>  Total runtime: 0.448 ms

The second is to collaps
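[Editorial note] The Merge Append nodes in the plan above behave like a lazy k-way merge over already-sorted child scans, which is why LIMIT can stop after pulling only a handful of rows from each child; in Python terms (toy data, illustrative only):

```python
import heapq
from itertools import islice

# Each child (partition) index scan yields rows already sorted on the key.
c11 = [(1, "c11"), (4, "c11"), (7, "c11")]
c12 = [(2, "c12"), (5, "c12")]
c21 = [(3, "c21"), (6, "c21")]

# Merge Append == lazy k-way merge. With LIMIT 4 we consume only 4 rows,
# which is why index scans + Merge Append (0.448 ms above) beat sorting
# all the rows up front (182 ms above).
top4 = list(islice(heapq.merge(c11, c12, c21, key=lambda r: r[0]), 4))
assert [k for k, _ in top4] == [1, 2, 3, 4]
```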
[HACKERS] Ident context leak during reloading of conf files when no ident information is present in the file
There is an ident context leak which occurs during reload of the configuration files when no ident configuration items are present in the configuration file. In load_ident(), a new context is allocated to store the new_parsed_lines, and the old context is deleted when parsed_ident_lines are present. If no parsed_ident_lines are present, the context is not deleted. A patch with the fix is attached; please review and let me know your suggestions. Regards, Hari babu. Attachment: ident_leak_v1.patch
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Oct 21, 2013 at 7:10 PM, Sawada Masahiko wrote:
> I agree with you.
> If writing FPW is not a large performance degradation, one idea is to
> write FPW at the same timing as when checksums are enabled, e.g., if we
> support a new wal_level, the system writes FPW when a simple SELECT
> updates hint bits, but the checksum function stays disabled.
> Thought?

I wonder if it's too much for this purpose. In fact, we just need a way to know that a block could have been written on the master which the standby never saw. So even WAL logging just the block id should be good enough for pg_rewind to be able to detect and later copy that block from the new master. Having said that, I don't know if there is a general advantage to WAL logging the exact hint bit update operation for other reasons.

Another difference AFAICS is that the checksum feature needs the block to be backed up only the first time a hint bit is updated after a checkpoint. But for something like pg_rewind to work, we will need to WAL log every hint bit update on a page. So we would want to keep it as short as possible. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
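[Editorial note] The "only the first time after a checkpoint" rule Pavan mentions is just an LSN comparison against the last checkpoint's redo pointer; a sketch of the decision, loosely modeled on what XLogSaveBufferForHint checks (names and plain-integer LSNs are illustrative):

```python
def hint_bit_needs_fpw(page_lsn: int, redo_lsn: int) -> bool:
    """Back up a page for a hint-bit change only if the page has not already
    been WAL-logged since the last checkpoint's redo pointer. Any earlier
    full-page image after the checkpoint already protects this page."""
    return page_lsn <= redo_lsn

REDO = 1000  # redo pointer of the last checkpoint
assert hint_bit_needs_fpw(900, REDO)       # untouched since checkpoint: log FPW
assert not hint_bit_needs_fpw(1500, REDO)  # already logged after checkpoint: skip
```

This is why the checksum-style scheme stays cheap: subsequent hint-bit updates to the same page within one checkpoint cycle generate no further WAL.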
[HACKERS] Regress tests to improve the function coverage of schemacmds and user and tablespace files
Here I have added some more regression tests to cover functions in schemacmds.c, user.c and tablespace.c that currently lack coverage, mainly the RENAME TO and OWNER TO support. Patches are attached; please check and provide your suggestions. Regards, Hari babu. Attachments: schema_user_v1.patch, tablespace_v1.patch
Re: [HACKERS] RULE regression test fragility?
On 2013-10-23 20:50:30 -0400, Tom Lane wrote: > Mike Blackwell writes: > > While reviewing the Network Stats Traffic patch I discovered the current > > regression test for rules depends on the system view definitions not > > changing: > > Yes, this is standard. We just update the expected output anytime > somebody changes a system view. FWIW, I've repeatedly now thought that it'd make maintaining/updating patches easier if we switched that query into unaligned tuple only (\a \t) mode. That would remove the frequent conflicts on the row count and widespread changes due to changed alignment. Alternatively we could just wrap the query in \copy ... CSV. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] CLUSTER FREEZE
Hi, On 2013-10-24 00:28:44 +0100, Thomas Munro wrote: > I noticed that CLUSTER doesn't have a FREEZE option. Here is a patch to > add that, for consistency with VACUUM. Is it useful? I think you'd need to prevent that form from working on system catalogs - xmin has a meaning there sometimes (e.g. indcheckxmin) at least. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] CLUSTER FREEZE
On 24 October 2013 05:58, Amit Kapila wrote: > On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro wrote: >> Hi >> I noticed that CLUSTER doesn't have a FREEZE option. Here is a patch to add >> that, for consistency with VACUUM. Is it useful? > > I wonder why anyone would like to freeze during CLUSTER command when > they already have separate way (VACUUM FREEZE) to achieve it, do you > know or can think of any case where user wants to do it along with > Cluster command? Well I can't speak for Thomas, but if you're doing such a heavy-duty operation on a table that involves an exclusive lock, you may as well freeze it. And in the case of CLUSTER, I imagine that in most instances you'd be confident the table isn't going to undergo much change (or at least not quickly). CLUSTER FREEZE would mean not having to run a separate VACUUM FREEZE after, and on a 10GB table, that's a big deal as it means not having to rescan the whole table again to freeze every row. Note that REFRESH MATERIALIZED VIEW freezes rows too as it's already writing all the data from scratch again, and has an exclusive lock. (this isn't the case with the CONCURRENTLY option obviously) -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers