Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
On 7 December 2012 23:51, Stephen Frost sfr...@snowman.net wrote:
> Jeff,
>
> * Jeff Davis (pg...@j-davis.com) wrote:
>> Most of your concerns seem to be related to freezing, and I'm mostly
>> interested in HEAP_XMIN_COMMITTED optimizations. So I think we're
>> talking past each other.
>
> My concern is about this patch/capability as a whole. I agree that the
> hint bit setting may be fine by itself; I'm not terribly concerned with
> that. Now, if you (and others) aren't concerned about the overall
> behavior that is being introduced here, that's fine, but it was my
> understanding from previous discussions that making tuples visible to
> all transactions, even those started before the committing transaction
> which are set more strictly than 'read-committed', wasn't 'ok'. Now,
> what I've honestly been hoping for on this thread was for someone to
> speak up and point out why I'm wrong about this concern, and explain
> how this patch addresses that issue. If that's been done, I've missed
> it.

Visibility of pre-hinted data is an issue, and because of that we previously agreed that it would be allowed only when explicitly requested by the user, which has been implemented as the FREEZE option on COPY. This then makes it identical to TRUNCATE, where a separate command differentiates MVCC-compliant row removal (DELETE) from non-MVCC row removal (TRUNCATE). So the committed feature does address the visibility issue.

Jeff has been speaking about an extension to that behaviour, which would be to allow HEAP_XMIN_COMMITTED to be set in some cases also. The committed feature specifically does not do that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Fri, Dec 7, 2012 at 10:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On 7 December 2012 12:37, Michael Paquier michael.paqu...@gmail.com wrote:
>> - There is still a problem with toast indexes. If the concurrent
>> reindex of a toast index fails for one reason or another, pg_relation
>> will finish with invalid toast index entries. I am still wondering
>> how to clean that up. Any ideas?
>
> Build another toast index, rather than reindexing the existing one,
> then just use the new oid.

Hum? The patch already does that. It creates concurrently a new index which is a duplicate of the existing one, then the old and new indexes are swapped. Finally the old index is dropped concurrently.

The problem I still see is the following: if a toast index, or a relation having a toast index, is being reindexed concurrently, and the server crashes during the process, there will be invalid toast indexes left in the server. If the crash happens before the swap, the new toast index is invalid. If the crash happens after the swap, the old toast index is invalid. I am not sure the user is able to clean up such invalid toast indexes manually, as they are not visible to him.
--
Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
>> Simon Riggs si...@2ndquadrant.com writes:
>>> On 7 December 2012 12:37, Michael Paquier michael.paqu...@gmail.com wrote:
>>>> - There is still a problem with toast indexes. If the concurrent
>>>> reindex of a toast index fails for one reason or another, pg_relation
>>>> will finish with invalid toast index entries. I am still wondering
>>>> how to clean that up. Any ideas?
>>>
>>> Build another toast index, rather than reindexing the existing one,
>>> then just use the new oid.
>
> That's easier said than done in the first place. toast_save_datum()
> explicitly opens/modifies the one index it needs and updates it.
>
>> Um, I don't think you can swap in a new toast index OID without taking
>> exclusive lock on the parent table at some point.
>
> The whole swapping issue isn't solved satisfactorily as a whole yet :(.
> If we just swap the index relfilenodes in the pg_index entries
> themselves, we wouldn't need to modify the main table's pg_class at all.

I think you are mistaken here: relfilenode is a column of pg_class, not pg_index. So whatever the method used for swapping, relfilenode switch or relname switch, you need to modify the pg_class entries of the old and new indexes.
--
Michael Paquier http://michael.otacoo.com
Re: [HACKERS] [WIP] pg_ping utility
On Fri, Dec 7, 2012 at 12:56 PM, Phil Sorber p...@omniti.com wrote:
> On Thu, Dec 6, 2012 at 8:54 PM, Michael Paquier wrote:
>> OK. Let's do that and then mark this patch as ready for committer.
>> Thanks,
>
> Those changes have been made.

Cool. Thanks.

> Something I was just thinking about while testing this again. I
> mentioned the issue before about someone meaning to put -v and putting
> -V instead, and it being a potential source of problems. What about
> making verbose the default and removing -v and adding -q to make it
> quiet? This would also match other tools' behavior. I want to get this
> wrapped up and I am fine with it as is, but just wanted to ask what
> others thought.

Bruce mentioned that pg_isready could be used directly by pg_ctl -w. Default as non-verbose would make sense. What are the other tools you are thinking about? Some utilities in core?

Unless you change the default behavior, let's mark this patch as ready for committer.
Thanks,
--
Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
> On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund and...@2ndquadrant.com wrote:
>> On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
>>> Simon Riggs si...@2ndquadrant.com writes:
>>>> On 7 December 2012 12:37, Michael Paquier michael.paqu...@gmail.com wrote:
>>>>> - There is still a problem with toast indexes. If the concurrent
>>>>> reindex of a toast index fails for one reason or another, pg_relation
>>>>> will finish with invalid toast index entries. I am still wondering
>>>>> how to clean that up. Any ideas?
>>>>
>>>> Build another toast index, rather than reindexing the existing one,
>>>> then just use the new oid.
>>
>> That's easier said than done in the first place. toast_save_datum()
>> explicitly opens/modifies the one index it needs and updates it.
>>
>>> Um, I don't think you can swap in a new toast index OID without taking
>>> exclusive lock on the parent table at some point.
>>
>> The whole swapping issue isn't solved satisfactorily as a whole yet :(.
>> If we just swap the index relfilenodes in the pg_index entries
>> themselves, we wouldn't need to modify the main table's pg_class at all.
>
> I think you are mistaken here: relfilenode is a column of pg_class, not
> pg_index. So whatever the method used for swapping, relfilenode switch
> or relname switch, you need to modify the pg_class entries of the old
> and new indexes.

The point is that with a relname switch the pg_class.oid of the index changes. Which is a bad idea because it will possibly be referred to by pg_depend entries. Relfilenodes - which certainly live in pg_class too, that's not the point - aren't referred to externally though. So if everything else in pg_class/pg_index stays the same, a relfilenode switch imo saves you a lot of trouble.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [WIP] pg_ping utility
On Sat, Dec 8, 2012 at 7:50 AM, Michael Paquier michael.paqu...@gmail.com wrote:
> On Fri, Dec 7, 2012 at 12:56 PM, Phil Sorber p...@omniti.com wrote:
>> Something I was just thinking about while testing this again. I
>> mentioned the issue before about someone meaning to put -v and putting
>> -V instead, and it being a potential source of problems. What about
>> making verbose the default and removing -v and adding -q to make it
>> quiet? This would also match other tools' behavior. I want to get this
>> wrapped up and I am fine with it as is, but just wanted to ask what
>> others thought.
>
> Bruce mentioned that pg_isready could be used directly by pg_ctl -w.
> Default as non-verbose would make sense. What are the other tools you
> are thinking about? Some utilities in core?

I think Bruce meant that PQping() is used by pg_ctl -w, not that he would use pg_isready. I was just thinking that if someone is going to use it in a script, adding -q won't be a big deal. If someone wants to use it by hand, adding -v could become cumbersome.

> Unless you change the default behavior, let's mark this patch as ready
> for committer.
> Thanks,
> --
> Michael Paquier http://michael.otacoo.com
[HACKERS] CommitFest 2012-11 Progress
We're entering the last planned week of the CF. Apart from a 72-hour period in mid-November, some CF has remained in-progress continuously for the last 176 days. With 64 out of the 82 current patches unresolved, we're on track to again see no gap between CF 2012-11 and CF 2013-01.

The process as practiced today is a shell of its former self[1][2]. Authors, reviewers, committers: has the process changed to something better, worthy of formally replacing its predecessor? Or, rather, do we need a fresh process correction?

Thanks,
nm

[1] http://wiki.postgresql.org/wiki/CommitFest
[2] http://wiki.postgresql.org/wiki/Running_a_CommitFest
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
> On 6.12.2012 05:47, Shigeru Hanada wrote:
>>> I've done a simple benchmark on my laptop with 2GB shared buffers,
>>> it's attached in the drop-test.py (it's a bit messy, but it works).
>>> [snip]
>>> With those parameters, I got these numbers on the laptop:
>>>
>>>   creating 10000 tables
>>>     all tables created in 3.298694 seconds
>>>   dropping 10000 tables one by one ...
>>>     all tables dropped in 32.692478 seconds
>>>   creating 10000 tables
>>>     all tables created in 3.458178 seconds
>>>   dropping 10000 tables in batches by 100...
>>>     all tables dropped in 3.28268 seconds
>>>
>>> So it's 33 seconds vs. 3.3 seconds, i.e. 10x speedup. On AWS we
>>> usually get larger differences, as we use larger shared buffers and
>>> the memory is significantly slower there.
>>
>> Do you have performance numbers of the same test on non-patched PG?
>> This patch aims to improve performance of bulk DROP, so the 4th number
>> in the result above should be compared to the 4th number of
>> non-patched PG?
>
> I've re-run the tests with the current patch on my home workstation,
> and the results are these (again 10k tables, dropped either one-by-one
> or in batches of 100).
>
> 1) unpatched
>    dropping one-by-one:        15.5 seconds
>    dropping in batches of 100: 12.3 sec
>
> 2) patched (v3.1)
>    dropping one-by-one:        32.8 seconds
>    dropping in batches of 100: 3.0 sec
>
> The problem here is that when dropping the tables one-by-one, the
> bsearch overhead is tremendous and significantly increases the runtime.
> I've done a simple check (if dropping a single table, use the original
> simple comparison) and I got this:
>
> 3) patched (v3.2)
>    dropping one-by-one:        16.0 seconds
>    dropping in batches of 100: 3.3 sec
>
> i.e. the best of both. But it seems like an unnecessary complexity to
> me - if you need to drop a lot of tables you'll probably do that in a
> transaction anyway.

Imo that's still a pretty bad performance difference. And your single-table optimization will probably fall short as soon as the table has indexes and/or a toast table...
> +
> +/*
> + * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
> + * that it's suitable for bsearch.
> + */
> +static int
> +rnode_comparator(const void * p1, const void * p2)
> +{
> +	RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
> +	RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
> +
> +	if (n1.node.relNode < n2.node.relNode)
> +		return -1;
> +	else if (n1.node.relNode > n2.node.relNode)
> +		return 1;
> +
> +	if (n1.node.dbNode < n2.node.dbNode)
> +		return -1;
> +	else if (n1.node.dbNode > n2.node.dbNode)
> +		return 1;
> +
> +	if (n1.node.spcNode < n2.node.spcNode)
> +		return -1;
> +	else if (n1.node.spcNode > n2.node.spcNode)
> +		return 1;
> +	else
> +		return 0;
> +}

ISTM that this whole comparator could be replaced by memcmp(). That could quite possibly lower the overhead of the bsearch() in simple cases. We already rely on the fact that RelFileNodes have no padding atm (c.f. buf_internals.h), so a memcmp() seems fine to me.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework
On 2012-10-18 22:40:15 +0200, Boszormenyi Zoltan wrote:
> On 2012-10-18 20:08, Tom Lane wrote:
>> Alvaro Herrera alvhe...@2ndquadrant.com writes:
>>> Boszormenyi Zoltan escribió:
>>>> this is the latest one, fixing a bug in the accounting of
>>>> per-statement lock timeout handling and tweaking some comments.
>>>
>>> Tom, are you able to give this patch some more time on this
>>> commitfest?
>>
>> I'm still hoping to get to it, but I've been spending a lot of time on
>> bug fixing rather than patch review lately :-(. If you're hoping to
>> close out the current CF soon, maybe we should just slip it to the
>> next one.
>
> Fine by me. Thanks.

According to this, the current state of the patch should be "Ready for Committer" rather than "Needs Review". Is that right? I changed the state for now...

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Andres Freund and...@2ndquadrant.com writes:
> On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
>> So whatever the method used for swapping, relfilenode switch or relname
>> switch, you need to modify the pg_class entries of the old and new
>> indexes.
>
> The point is that with a relname switch the pg_class.oid of the index
> changes. Which is a bad idea because it will possibly be referred to by
> pg_depend entries. Relfilenodes - which certainly live in pg_class too,
> that's not the point - aren't referred to externally though. So if
> everything else in pg_class/pg_index stays the same, a relfilenode
> switch imo saves you a lot of trouble.

I do not believe that it is safe to modify an index's relfilenode *nor* its OID without exclusive lock; both of those are going to be in use to identify and access the index in concurrent sessions. The only things we could possibly safely swap in a REINDEX CONCURRENTLY are the index relnames, which are not used for identification by the system itself. (I think. It's possible that even this breaks something.)

Even then, any such update of the pg_class rows is dependent on switching to MVCC-style catalog access, which frankly is pie in the sky at the moment; the last time pgsql-hackers talked seriously about that, there seemed to be multiple hard problems besides mere performance. If you want to wait for that, it's a safe bet that we won't see this feature for a few years.

I'm tempted to propose that REINDEX CONCURRENTLY simply not try to preserve the index name exactly. Something like adding or removing trailing underscores would probably serve to generate a nonconflicting name that's not too unsightly. Or just generate a new name using the same rules that CREATE INDEX would when no name is specified. Yeah, it's a hack, but what about the CONCURRENTLY commands isn't a hack?

regards, tom lane
Re: [HACKERS] tuplesort memory usage: grow_memtuples
Hi,

On 2012-11-21 14:52:18 -0800, Jeff Janes wrote:
> On Thu, Nov 15, 2012 at 11:16 AM, Robert Haas robertmh...@gmail.com wrote:
>> So what's next here? Do you want to work on these issues some more? Or
>> does Jeff?
>
> This has been rewritten enough that I no longer feel much ownership of
> it. I'd prefer to leave it to Peter or Greg S., if they are willing to
> do it.

Is anybody planning to work on this? There hasn't been any activity since the beginning of the CF, and it doesn't look like there is much work left?

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 8.12.2012 15:26, Andres Freund wrote:
> On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
>> I've re-run the tests with the current patch on my home workstation,
>> and the results are these (again 10k tables, dropped either one-by-one
>> or in batches of 100).
>>
>> 1) unpatched
>>    dropping one-by-one:        15.5 seconds
>>    dropping in batches of 100: 12.3 sec
>>
>> 2) patched (v3.1)
>>    dropping one-by-one:        32.8 seconds
>>    dropping in batches of 100: 3.0 sec
>>
>> The problem here is that when dropping the tables one-by-one, the
>> bsearch overhead is tremendous and significantly increases the
>> runtime. I've done a simple check (if dropping a single table, use the
>> original simple comparison) and I got this:
>>
>> 3) patched (v3.2)
>>    dropping one-by-one:        16.0 seconds
>>    dropping in batches of 100: 3.3 sec
>>
>> i.e. the best of both. But it seems like an unnecessary complexity to
>> me - if you need to drop a lot of tables you'll probably do that in a
>> transaction anyway.
>
> Imo that's still a pretty bad performance difference. And your
> single-table optimization will probably fall short as soon as the table
> has indexes and/or a toast table...

Why? I haven't checked the code, but either those objects are dropped one-by-one (which seems unlikely) or they're added to the pending list, and then the new code will kick in, making it actually faster. Yes, the original code might be faster even for 2 or 3 objects, but I can't think of a simple solution to fix this, given that it really depends on CPU/RAM speed and shared_buffers size.

> +static int
> +rnode_comparator(const void * p1, const void * p2)
> +{
> [...]
> +}
>
> ISTM that this whole comparator could be replaced by memcmp(). That
> could quite possibly lower the overhead of the bsearch() in simple
> cases. We already rely on the fact that RelFileNodes have no padding atm
> (c.f. buf_internals.h), so a memcmp() seems fine to me.

Good point, I'll try that. The original code used a macro that simply compared the fields and I copied that logic, but if we can remove the code ...

Thanks for the review!

Tomas
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> The point is that with a relname switch the pg_class.oid of the index
>> changes. [...] So if everything else in pg_class/pg_index stays the
>> same, a relfilenode switch imo saves you a lot of trouble.
>
> I do not believe that it is safe to modify an index's relfilenode *nor*
> its OID without exclusive lock; both of those are going to be in use to
> identify and access the index in concurrent sessions. The only things
> we could possibly safely swap in a REINDEX CONCURRENTLY are the index
> relnames, which are not used for identification by the system itself.
> (I think. It's possible that even this breaks something.)

Well, the patch currently *does* take an exclusive lock in an extra transaction just for the swapping. In that case it should actually be safe. Although that obviously removes part of the usefulness of the feature.

> Even then, any such update of the pg_class rows is dependent on
> switching to MVCC-style catalog access, which frankly is pie in the sky
> at the moment; the last time pgsql-hackers talked seriously about that,
> there seemed to be multiple hard problems besides mere performance. If
> you want to wait for that, it's a safe bet that we won't see this
> feature for a few years.

Yea :(

> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> preserve the index name exactly. Something like adding or removing
> trailing underscores would probably serve to generate a nonconflicting
> name that's not too unsightly. Or just generate a new name using the
> same rules that CREATE INDEX would when no name is specified. Yeah,
> it's a hack, but what about the CONCURRENTLY commands isn't a hack?

I have no problem with ending up with a new name or something like that. If that is what it takes: fine, no problem. The issue I raised above is just about keeping the pg_depend entries pointing to something valid... And not changing the index's pg_class.oid seems to be the easiest solution for that.

I have some vague schemes in my head that would solve the swapping issue with 3 entries for the index in pg_class, but they all only seem to come to my head while I don't have anything to write them down, so they are probably bogus.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] gistchoose vs. bloat
Hi,

On 2012-11-02 12:54:33 +0400, Alexander Korotkov wrote:
> On Sun, Oct 21, 2012 at 11:03 AM, Jeff Davis pg...@j-davis.com wrote:
>> On Thu, 2012-10-18 at 15:09 -0300, Alvaro Herrera wrote:
>>> Jeff, do you think we need more review of this patch?
>>
>> In the patch, it refers to rd_options without checking for NULL first,
>> which needs to be fixed. There's actually still one place where it
>> says "id" rather than "is". Just a nitpick. Regarding my point 4 from
>> the previous email, I mildly disagree with the style, but I don't see
>> a correctness problem there. If the first two items are fixed, then
>> the patch is fine with me.
>
> First two items are fixed in attached version of the patch. So the
> patch is ready for committer now?

I notice there's no documentation about the new reloption at all?

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Statistics and selectivity estimation for ranges
On 2012-11-05 22:10:50 -0800, Jeff Davis wrote:
> On Mon, 2012-11-05 at 11:12 -0300, Alvaro Herrera wrote:
>> What's going on with this patch? I haven't seen any activity in a
>> while. Should I just move this to the next commitfest?
>
> Sorry, I dropped the ball here. I will still review it, whether it
> makes this commitfest or not.

Sorry to nag, but it starts to look like it might fall off the end of the next CF...

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
Hi Alexander,

On 2012-11-04 11:41:48 -0800, Jeff Davis wrote:
> On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote:
>> Right version of patch is attached.
>
> * In bounds_adjacent, there's no reason to flip the labels back.
>
> * Comment should indicate more explicitly that bounds_adjacent is
>   sensitive to the argument order.
>
> * In bounds_adjacent, it appears that bound1 corresponds to B in the
>   comment above, and bound2 corresponds to A in the comment above. I
>   would have guessed from reading the comment that bound1 corresponded
>   to A. We should just use consistent names between the comment and the
>   code (e.g. boundA and boundB).
>
> * I could be getting confused, but I think that line 645 of
>   rangetypes_spgist.c should be inverted (!bounds_adjacent(...)).
>
> * I think needPrevious should have an explanatory comment somewhere. It
>   looks like you are using it to store some state as you descend the
>   tree, but it doesn't look like it's used to reconstruct the value
>   (because we already have the value anyway). Since it's being used for
>   a purpose other than what's intended, that should be explained.

Do you have time to address these comments, or should the patch be marked as Returned with Feedback? Afaics there hasn't been progress since CF2...

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Andres Freund and...@2ndquadrant.com writes:
> The issue I raised above is just about keeping the pg_depend entries
> pointing to something valid... And not changing the index's
> pg_class.oid seems to be the easiest solution for that.

Yeah, we would have to update pg_depend, pg_constraint, maybe some other places if we go with that. I think that would be safe because we'd be holding ShareRowExclusive lock on the parent table throughout, so nobody else should be doing anything that's critically dependent on seeing such rows. But it'd be a lot of ugly code, for sure. Maybe the best way is to admit that we need a short-term exclusive lock for the swapping step. Or we could wait for MVCC catalog access ...

regards, tom lane
Re: [HACKERS] Proof of concept: auto updatable views [Review of Patch]
Continuing to work on this ... I found multiple things I didn't like about the permission-field update code. Attached are some heavily commented extracts from the code as I've changed it. Does anybody object to either the code or the objectives given in the comments?

regards, tom lane

/*
 * adjust_view_column_set - map a set of column numbers according to targetlist
 *
 * This is used with simply-updatable views to map column-permissions sets for
 * the view columns onto the matching columns in the underlying base relation.
 * The targetlist is expected to be a list of plain Vars of the underlying
 * relation (as per the checks above in view_is_auto_updatable).
 */
static Bitmapset *
adjust_view_column_set(Bitmapset *cols, List *targetlist)
{
	Bitmapset  *result = NULL;
	Bitmapset  *tmpcols;
	AttrNumber	col;

	tmpcols = bms_copy(cols);
	while ((col = bms_first_member(tmpcols)) >= 0)
	{
		/* bit numbers are offset by FirstLowInvalidHeapAttributeNumber */
		AttrNumber	attno = col + FirstLowInvalidHeapAttributeNumber;

		if (attno == InvalidAttrNumber)
		{
			/*
			 * There's a whole-row reference to the view.  For permissions
			 * purposes, treat it as a reference to each column available from
			 * the view.  (We should *not* convert this to a whole-row
			 * reference to the base relation, since the view may not touch
			 * all columns of the base relation.)
			 */
			ListCell   *lc;

			foreach(lc, targetlist)
			{
				TargetEntry *tle = (TargetEntry *) lfirst(lc);
				Var		   *var;

				if (tle->resjunk)
					continue;
				var = (Var *) tle->expr;
				Assert(IsA(var, Var));
				result = bms_add_member(result,
							var->varattno - FirstLowInvalidHeapAttributeNumber);
			}
		}
		else
		{
			/*
			 * Views do not have system columns, so we do not expect to see
			 * any other system attnos here.  If we do find one, the error
			 * case will apply.
			 */
			TargetEntry *tle = get_tle_by_resno(targetlist, attno);

			if (tle != NULL && IsA(tle->expr, Var))
			{
				Var		   *var = (Var *) tle->expr;

				result = bms_add_member(result,
							var->varattno - FirstLowInvalidHeapAttributeNumber);
			}
			else
				elog(ERROR, "attribute number %d not found in view targetlist",
					 attno);
		}
	}
	bms_free(tmpcols);
	return result;
}

/*
 * Mark the new target RTE for the permissions checks that we want to
 * enforce against the view owner, as distinct from the query caller.  At
 * the relation level, require the same INSERT/UPDATE/DELETE permissions
 * that the query caller needs against the view.  We drop the ACL_SELECT
 * bit that is presumably in new_rte->requiredPerms initially.
 *
 * Note: the original view RTE remains in the query's rangetable list.
 * Although it will be unused in the query plan, we need it there so that
 * the executor still performs appropriate permissions checks for the
 * query caller's use of the view.
 */
new_rte->checkAsUser = view->rd_rel->relowner;
new_rte->requiredPerms = view_rte->requiredPerms;

/*
 * Now for the per-column permissions bits.
 *
 * Initially, new_rte contains selectedCols permission check bits for all
 * base-rel columns referenced by the view, but since the view is a SELECT
 * query its modifiedCols is empty.  We set modifiedCols to include all
 * the columns the outer query is trying to modify, adjusting the column
 * numbers as needed.  But we leave selectedCols as-is, so the view owner
 * must have read permission for all columns used in the view definition,
 * even if some of them are not read by the upper query.  We could try to
 * limit selectedCols to only columns used in the transformed query, but
 * that does not correspond to what happens in ordinary SELECT usage of a
Re: [HACKERS] review: pgbench - aggregation of info written into log
Hi Tomas,

On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote:
>> attached is a v4 of the patch. There are not many changes, mostly some
>> simple tidying up, except for the handling of Windows.

After a quick look I am not sure what all the talk about Windows is about? instr_time.h seems to provide all you need, even for Windows? The only issue with gettimeofday() on Windows seems to be that it is not all that fast and not very high precision, but that shouldn't be a problem in this case? Could you expand a bit on the problems?

> * I had a problem with doc

The current patch has conflict markers in the sgml source; there seems to have been some unresolved merge. Maybe that's all that causes the errors? What's your problem with setting up the doc toolchain?

> issues:
>
> * empty lines with invisible chars (tabs), and sometimes empty lines
>   after and before {}
>
> * adjustment of start_time
>
>   +	 * the desired interval */
>   +	while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(now))
>   +		agg->start_time = agg->start_time + agg_interval;
>
>   can skip one interval - so when transaction time will be larger than
>   or similar to agg_interval, then results can be strange. We have to
>   know the real length of the interval.

Could you post a patch that addresses these issues?

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] parallel pg_dump
Hi, On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote: On 10/13/2012 10:46 PM, Andrew Dunstan wrote: On 09/17/2012 10:01 PM, Joachim Wieland wrote: On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland j...@mcknight.de wrote: Attached is a rebased version of the parallel pg_dump patch. Attached is another rebased version for the current commitfest. These did not apply cleanly, but I have fixed them up. The combined diff against git tip is attached. It can also be pulled from my parallel_dump branch on https://github.com/adunstan/postgresql-dev.git This builds and runs OK on Linux, which is a start ... Well, you would also need this piece if you're applying the patch (sometimes I forget to do git add ...) The patch is marked as Ready for Committer in the CF app, but at least the whole windows situation seems to be unresolved as of yet? Is anybody working on this? I would *love* to get this... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proof of concept: auto updatable views [Review of Patch]
On 8 December 2012 15:21, Tom Lane t...@sss.pgh.pa.us wrote: Continuing to work on this ... I found multiple things I didn't like about the permission-field update code. Attached are some heavily commented extracts from the code as I've changed it. Does anybody object to either the code or the objectives given in the comments? Thanks for looking at this. The new, adjusted code looks good to me. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failing SSL connection due to weird interaction with openssl
On 2012-11-26 21:45:32 -0500, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I gather that this is supposed to be back-patched to all supported branches. FWIW, I don't like that patch any better than Robert does. It seems as likely to do harm as good. If there are places where libpq itself is leaving entries on the error stack, we should fix them -- retail. But it's not our business to clobber global state because there might be bugs in some other part of an application. As there hasn't been any new input since this comment I am marking the patch as Rejected in the CF application. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 8.12.2012 15:49, Tomas Vondra wrote: On 8.12.2012 15:26, Andres Freund wrote: On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote: I've re-run the tests with the current patch on my home workstation, and the results are these (again 10k tables, dropped either one-by-one or in batches of 100).

1) unpatched
   dropping one-by-one: 15.5 seconds
   dropping in batches of 100: 12.3 sec

2) patched (v3.1)
   dropping one-by-one: 32.8 seconds
   dropping in batches of 100: 3.0 sec

The problem here is that when dropping the tables one-by-one, the bsearch overhead is tremendous and significantly increases the runtime. I've done a simple check (if dropping a single table, use the original simple comparison) and I got this:

3) patched (v3.2)
   dropping one-by-one: 16.0 seconds
   dropping in batches of 100: 3.3 sec

i.e. the best of both. But it seems like unnecessary complexity to me - if you need to drop a lot of tables you'll probably do that in a transaction anyway.

Imo that's still a pretty bad performance difference. And your single-table optimization will probably fall short as soon as the table has indexes and/or a toast table...

Why? I haven't checked the code, but either those objects are dropped one-by-one (which seems unlikely) or they're added to the pending list, and then the new code will kick in, making it actually faster. Yes, the original code might be faster even for 2 or 3 objects, but I can't think of a simple solution to fix this, given that it really depends on CPU/RAM speed and shared_buffers size.

I've done some tests and yes - once there are other objects the optimization falls short. For example for tables with one index, it looks like this:

1) unpatched
   one by one: 28.9 s
   100 batches: 23.9 s

2) patched
   one by one: 44.1 s
   100 batches: 4.7 s

So the patched code is about 50% slower, but this difference quickly disappears with the number of indexes / toast tables etc. I see this as an argument AGAINST such special-case optimization.
My reasoning is this:

* This difference is significant only if you're dropping a table with a low number of indexes / toast tables. In reality this is not going to be very frequent.

* If you're dropping a single table, it really does not matter - the difference will be like 100 ms vs. 200 ms or something like that.

* If you're dropping a lot of tables, you should do that in a transaction anyway, or you should be aware that doing that in a transaction will be faster (we can add this info to the DROP TABLE docs). IMHO this is similar to doing a lot of INSERTs - doing all of them in a single transaction is faster.

The possibility that someone needs to drop a lot of tables in separate transactions exists in theory, but I don't know of a realistic real-world usage. So I'd vote to ditch that special-case optimization.

+
+/*
+ * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
+ * that it's suitable for bsearch.
+ */
+static int
+rnode_comparator(const void *p1, const void *p2)
+{
+	RelFileNodeBackend n1 = *(RelFileNodeBackend *) p1;
+	RelFileNodeBackend n2 = *(RelFileNodeBackend *) p2;
+
+	if (n1.node.relNode < n2.node.relNode)
+		return -1;
+	else if (n1.node.relNode > n2.node.relNode)
+		return 1;
+
+	if (n1.node.dbNode < n2.node.dbNode)
+		return -1;
+	else if (n1.node.dbNode > n2.node.dbNode)
+		return 1;
+
+	if (n1.node.spcNode < n2.node.spcNode)
+		return -1;
+	else if (n1.node.spcNode > n2.node.spcNode)
+		return 1;
+	else
+		return 0;
+}

ISTM that this whole comparator could be replaced by memcmp(). That could quite possibly lower the overhead of the bsearch() in simple cases. We already rely on the fact that RelFileNodes have no padding atm (c.f. buf_internals.h), so a memcmp() seems fine to me.

Good point, I'll try that. The original code used a macro that simply compared the fields and I copied that logic, but if we can remove the code ...
Hmmm, I've replaced the comparator with this one:

static int
rnode_comparator(const void *p1, const void *p2)
{
	return memcmp(p1, p2, sizeof(RelFileNode));
}

and while it's not significantly faster (actually it's often a bit slower than the original comparator), it's much simpler code.

Tomas
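To make the memcmp() point above concrete, here is a minimal stand-alone sketch. RelFileNodeSketch is a simplified stand-in for PostgreSQL's RelFileNode (the real struct lives in the server headers), and the no-padding assumption is the same one the thread relies on. The byte-wise order memcmp() imposes is arbitrary and endian-dependent, but it is a consistent total order, which is all qsort() and bsearch() require:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's RelFileNode (illustrative name).
 * memcmp()-based comparison assumes the struct has no padding bytes,
 * which holds here because all fields share the same size and alignment. */
typedef struct RelFileNodeSketch
{
	unsigned int spcNode;		/* tablespace */
	unsigned int dbNode;		/* database */
	unsigned int relNode;		/* relation */
} RelFileNodeSketch;

/* Arbitrary but consistent total order: sufficient for qsort()/bsearch(). */
static int
rnode_comparator(const void *p1, const void *p2)
{
	return memcmp(p1, p2, sizeof(RelFileNodeSketch));
}

/* Look up one node in a sorted array; returns NULL if absent. */
static RelFileNodeSketch *
rnode_lookup(RelFileNodeSketch *sorted, size_t n, RelFileNodeSketch key)
{
	return bsearch(&key, sorted, n, sizeof(RelFileNodeSketch),
				   rnode_comparator);
}
```

The design point is that the comparator never needs to produce a "natural" ordering, only the *same* ordering for qsort() and every later bsearch(), so a raw byte comparison is enough.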
Re: [HACKERS] parallel pg_dump
On 12/08/2012 11:01 AM, Andres Freund wrote: Hi, On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote: On 10/13/2012 10:46 PM, Andrew Dunstan wrote: On 09/17/2012 10:01 PM, Joachim Wieland wrote: On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland j...@mcknight.de wrote: Attached is a rebased version of the parallel pg_dump patch. Attached is another rebased version for the current commitfest. These did not apply cleanly, but I have fixed them up. The combined diff against git tip is attached. It can also be pulled from my parallel_dump branch on https://github.com/adunstan/postgresql-dev.git This builds and runs OK on Linux, which is a start ... Well, you would also need this piece if you're applying the patch (sometimes I forget to do git add ...) The patch is marked as Ready for Committer in the CF app, but at least the whole windows situation seems to be unresolved as of yet? Is anybody working on this? I would *love* to get this... I am working on it when I get a chance, but keep getting hammered. I'd love somebody else to review it too. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: pgbench - aggregation of info written into log
On 8.12.2012 16:33, Andres Freund wrote: Hi Tomas, On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote: attached is a v4 of the patch. There are not many changes, mostly some simple tidying up, except for the Windows handling.

After a quick look I am not sure what all the talk about Windows is about? instr_time.h seems to provide all you need, even for Windows. The only issue with gettimeofday() on Windows seems to be that it's not all that fast and not very high precision, but that shouldn't be a problem in this case? Could you expand a bit on the problems?

Well, in the current pgbench.c, there's this piece of code

#ifndef WIN32
	/* This is more than we really ought to know about instr_time */
	fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
			st->id, st->cnt, usec, st->use_file,
			(long) now.tv_sec, (long) now.tv_usec);
#else
	/* On Windows, instr_time doesn't provide a timestamp anyway */
	fprintf(logfile, "%d %d %.0f %d 0 0\n",
			st->id, st->cnt, usec, st->use_file);
#endif

and the Windows-related discussion in this patch is about the same issue. As I understand it, the problem seems to be that INSTR_TIME_SET_CURRENT does not provide the timestamp at all. I was thinking about improving this by combining gettimeofday and INSTR_TIME, but I have no Windows box to test it on :-(

* I had a problem with the doc

The current patch has conflict markers in the sgml source; there seems to have been some unresolved merge. Maybe that's all that causes the errors?

Ah, I see. Probably my fault, I'll fix that.

What's your problem with setting up the doc toolchain?

issues:

* empty lines with invisible chars (tabs) and sometimes empty lines after and before {}

* adjustment of start_time

+	 * the desired interval */
+	while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(now))
+		agg->start_time = agg->start_time + agg_interval;

This can skip one interval, so when the transaction time is larger than or similar to agg_interval, the results can be strange.
We have to know the real length of the interval.

Could you post a patch that addresses these issues?

Yes, I'll work on it today/tomorrow and I'll send an improved patch.

Tomas
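To make the interval-skipping issue concrete, here is a minimal simulation of the behavior the fix aims for: once "now" has moved past the end of the current aggregation bucket, the bucket is flushed and one row is emitted per skipped interval, so a long transaction never silently swallows time. The names (AggSketch, advance_interval) are illustrative, not pgbench's actual code:

```c
#include <assert.h>

/* Illustrative stand-in for pgbench's aggregation state; not the real code. */
typedef struct AggSketch
{
	double		start_time;		/* start of the current aggregation bucket */
	int			cnt;			/* transactions counted in the current bucket */
} AggSketch;

/*
 * Advance the bucket until it covers 'now', returning how many rows were
 * flushed.  A transaction longer than agg_interval therefore produces one
 * row for its own bucket plus explicit rows for the skipped (empty) buckets,
 * instead of silently merging them into the next interval.
 */
static int
advance_interval(AggSketch *agg, double agg_interval, double now)
{
	int			emitted = 0;

	while (agg->start_time + agg_interval <= now)
	{
		/* real code would fprintf() the bucket here, empty or not */
		emitted++;
		agg->start_time += agg_interval;
		agg->cnt = 0;
	}
	return emitted;
}
```

With a 1-second interval, a transaction finishing at t = 3.2 s flushes three buckets (including empty ones), which matches the empty-interval lines shown in the v5 sample output later in this thread's resolution.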
Re: [HACKERS] CommitFest 2012-11 Progress
Hi Noah, On 2012-12-08 09:06:01 -0500, Noah Misch wrote: We're entering the last planned week of the CF. Apart from a 72-hour period in mid-November, some CF has remained in-progress continuously for the last 176 days. With 64 out of the 82 current patches unresolved, we're on track to again see no gap between CF 2012-11 and CF 2013-01. The process as practiced today is a shell of its former self[1][2]. Authors, reviewers, committers: has the process changed to something better, worthy of formally replacing its predecessor? Or, rather, do we need a fresh process correction?

I agree that lately there seems to be minimal benefit to the commitfest process. I don't see many disadvantages either, but... No idea how to fix that; it seems most people are simply swamped with work and other things.

I made a pass through most of the commitfest entries to update their state to something sensible and, where it seemed necessary, inquired about the newest status. Not sure that's really welcome, but it was the only thing I could think of to help make some progress.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework
On 2012-12-08 15:30, Andres Freund wrote: On 2012-10-18 22:40:15 +0200, Boszormenyi Zoltan wrote: On 2012-10-18 20:08, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Boszormenyi Zoltan wrote: this is the latest one, fixing a bug in the accounting of per-statement lock timeout handling and tweaking some comments. Tom, are you able to give this patch some more time on this commitfest? I'm still hoping to get to it, but I've been spending a lot of time on bug fixing rather than patch review lately :-(. If you're hoping to close out the current CF soon, maybe we should just slip it to the next one. Fine by me. Thanks.

According to this, the current state of the patch should be "Ready for Committer", not "Needs Review". Is that right? I changed the state for now...

Thanks. I just tried the patch on current GIT HEAD and it gives some offset warnings but no rejects. Also, it compiles without warnings and still works as it should. Should I post a new patch that applies cleanly?

Best regards, Zoltán Böszörményi -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework
Boszormenyi Zoltan z...@cybertec.at writes: Thanks. I just tried the patch on current GIT HEAD and gives some offset warnings but no rejects. Also, it compiles without warnings and still works as it should. Should I post a new patch that applies cleanly? Offsets are not a problem --- if you tried to keep them exact you'd just be making a lot of unnecessary updates. If there were fuzz warnings I'd be more concerned. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Sketch of a Hook into the Logging Collector
Hello all, I have some needs, which neither the log collector nor the syslog output can satisfy well as-is, that seem to support changing Postgres slightly to give user programs a lot more power over how to process logging output. I am approaching this from the angle of increasing power by exposing the log collector (syslogger) pipe protocol. Included is a patch whose general aesthetic is to touch as little of the logging code as possible while achieving maximum power in a contrib and stand-alone sample program. However, a more satisfying treatment that may be slightly more invasive to the logging system is very welcome.

The general idea here is that the logging collector pipe protocol is a pretty reasonable one for non-Postgres programs to consume as-is pretty easily. The proof of concept consists of three things:

* A hook into the log collector. It is small but a little weird compared to the other hooks in the system.

* A contrib, pg_logcollectdup. This contrib lets one forward logs to a named pipe specified in postgresql.conf.

* A stand-alone program, pg_logcollect_sample, that renders protocol traffic newline-separated and with its binary fields converted to readable text.

The patch can also be seen from https://github.com/fdr/postgres.git in the branch 'logdup'. I do rebase this at will for the time being.

I have a few detailed dissatisfactions with this approach, but I'd rather hear your own dissatisfactions. I also don't like the name.

A demo of configuring all of the above and seeing some output follows. I can paste this in one go on my machine. It creates /tmp/pg-logdup and /tmp/pgdata-logdup. All spawned processes can be listed afterwards with 'jobs'.
# Install postgres and pg_logcollectdup contrib
./configure --prefix=/tmp/pg-logdup
make -sj16 install
cd contrib/pg_logcollectdup/
make -s install
/tmp/pg-logdup/bin/initdb -D /tmp/pgdata-logdup
mkfifo /tmp/pgdata-logdup/log-pipe

# Configure postgresql.conf
echo "logging_collector = on" \
    >> /tmp/pgdata-logdup/postgresql.conf
echo "shared_preload_libraries = 'pg_logcollectdup'" \
    >> /tmp/pgdata-logdup/postgresql.conf
echo "logcollectdup.destination = '/tmp/pgdata-logdup/log-pipe'" \
    >> /tmp/pgdata-logdup/postgresql.conf

# Set up destination pipe
mkfifo /tmp/pgdata-logdup/log-pipe

# Build sample pipe formatter
make pg_logcollect_sample

# Run it in the background
./pg_logcollect_sample \
    < /tmp/pgdata-logdup/log-pipe \
    > /tmp/pgdata-logdup/duplogs.txt &

# Run Postgres with a non-default port for convenience
/tmp/pg-logdup/bin/postgres -D /tmp/pgdata-logdup --port=2345 &

# Prevent race against file creation, then look at the logs
sleep 1
cat /tmp/pgdata-logdup/duplogs.txt

-- fdr

log-collector-extension-v1.patch
Description: Binary data
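For readers wanting to consume the exposed pipe protocol themselves, here is a rough sketch of decoding one protocol chunk from a byte buffer. The framing shown (two NUL marker bytes, a 16-bit payload length, the sender's 32-bit pid, and an is_last flag, all in host byte order) follows the syslogger chunk layout as I understand it from src/include/postmaster/syslogger.h; treat the exact offsets and field names as assumptions to verify against that header:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_HEADER_LEN 9		/* 2 marker bytes + 2 len + 4 pid + 1 flag */

/* Decoded view of one log-collector pipe chunk (illustrative names). */
typedef struct LogChunk
{
	unsigned short len;			/* payload bytes following the header */
	int			pid;			/* backend that wrote the chunk */
	char		is_last;		/* 't'/'f' (assumed 'T'/'F' for CSV logs) */
	const char *data;			/* points into the caller's buffer */
} LogChunk;

/*
 * Returns 1 and fills *out on success, 0 if the buffer does not yet hold a
 * complete, well-formed chunk.  Integers are read in host byte order, since
 * the pipe's writer and reader live on the same machine.
 */
static int
parse_chunk(const unsigned char *buf, size_t buflen, LogChunk *out)
{
	if (buflen < CHUNK_HEADER_LEN)
		return 0;
	if (buf[0] != '\0' || buf[1] != '\0')	/* protocol start marker */
		return 0;
	memcpy(&out->len, buf + 2, sizeof(out->len));
	memcpy(&out->pid, buf + 4, sizeof(out->pid));
	out->is_last = (char) buf[8];
	if (buflen < (size_t) CHUNK_HEADER_LEN + out->len)
		return 0;				/* payload not fully read yet */
	out->data = (const char *) buf + CHUNK_HEADER_LEN;
	return 1;
}
```

A consumer like pg_logcollect_sample would call this in a loop over bytes read from the FIFO, accumulating chunks with the same pid until one arrives with is_last set.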
Re: [HACKERS] Sketch of a Hook into the Logging Collector
On Sat, Dec 8, 2012 at 10:40 AM, Daniel Farina dan...@heroku.com wrote: * A contrib, pg_logcollectdup. This contrib lets one forward logs to a named pipe specified in postgresql.conf. I have revised this part in the attached patch. It's some error handling in a case of user error, and the previous demo script and narrative precepts are still the same. -- fdr log-collector-extension-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel pg_dump
On Sat, Dec 8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote: On 12/08/2012 11:01 AM, Andres Freund wrote: Hi, On 2012-10-15 17:13:10 -0400, Andrew Dunstan wrote: On 10/13/2012 10:46 PM, Andrew Dunstan wrote: On 09/17/2012 10:01 PM, Joachim Wieland wrote: On Mon, Jun 18, 2012 at 10:05 PM, Joachim Wieland j...@mcknight.de wrote: Attached is a rebased version of the parallel pg_dump patch. Attached is another rebased version for the current commitfest. These did not apply cleanly, but I have fixed them up. The combined diff against git tip is attached. It can also be pulled from my parallel_dump branch on https://github.com/adunstan/postgresql-dev.git This builds and runs OK on Linux, which is a start ... Well, you would also need this piece if you're applying the patch (sometimes I forget to do git add ...) The patch is marked as Ready for Committer in the CF app, but at least the whole windows situation seems to be unresolved as of yet? Is anybody working on this? I would *love* to get this... I am working on it when I get a chance, but keep getting hammered. I'd love somebody else to review it too. FYI, I will be posting pg_upgrade performance numbers using Unix processes. I will try to get the Windows code working but will also need help. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Simon, * Simon Riggs (si...@2ndquadrant.com) wrote: Visibility of pre-hinted data is an issue and because of that we previously agreed that it would be allowed only when explicitly requested by the user, which has been implemented as the FREEZE option on COPY. This then makes it identical to TRUNCATE, where a separate command differentiates MVCC-compliant row removal (DELETE) from non-MVCC row removal (TRUNCATE). To be honest, I really don't buy off on this. Yes, TRUNCATE (a top-level, individually permissioned command) can violate MVCC and make things which might have been visible to a given transaction no longer visible to that transaction. I find that very different from making rows which should *not* be visible suddenly appear. So the committed feature does address the visibility issue. Not hardly. It lets a user completely violate the basic rules of the overall database. The *correct* solution to this problem is to actually *fix* it, by setting it up such that tables created after a particular transaction starts aren't visible and similar. Not by punting to the user with very little control over it (or so I'm guessing). Will we accept a patch to do an INSERT or UPDATE FREEZE...? I realize this might be a bit of a stretch, but why not allow February 31st to be accepted as a date also, should the user request it? This is quite a slippery slope, in my view. Jeff has been speaking about an extension to that behaviour, which would be to allow HEAP_XMIN_COMMITTED to be set in some cases also. The committed feature specifically does not do that. It's not clear to me why you *wouldn't* just go ahead and set everything you can at the same time...? It hardly seems like it could be worse than what is apparently going to happen with this command... Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] autovacuum truncate exclusive lock round two
On 12/6/2012 12:45 PM, Robert Haas wrote: On Wed, Dec 5, 2012 at 10:16 PM, Jan Wieck janwi...@yahoo.com wrote: That sort of dynamic approach would indeed be interesting. But I fear that it is going to be complex at best. The amount of time spent in scanning heavily depends on the visibility map. The initial vacuum scan of a table can take hours or more, but it does update the visibility map even if the vacuum itself is aborted later. The next vacuum may scan that table in almost no time at all, because it skips all blocks that are marked all visible.

Well, if that's true, then there's little reason to worry about giving up quickly, because the next autovacuum a minute later won't consume many resources.

"Almost no time" is of course relative to what an actual scan and dead tuple removal cost. Looking at a table with 3 GB of dead tuples at the end, the initial vacuum scan takes hours. When vacuum comes back to this table, cleaning up a couple megabytes of newly deceased tuples and then skipping over the all-visible pages may take a minute.

Based on the discussion and what I feel is a consensus, I have created an updated patch that has no GUC at all. The hard-coded parameters in include/postmaster/autovacuum.h are

AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL  20    /* ms */
AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL   50    /* ms */
AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT         5000  /* ms */

I gave that the worst workload I can think of: a pgbench-style application that throws about 10 transactions per second at it, so that there is constantly the need to give up the lock due to conflicting lock requests and then reacquire it again. A cleanup process is periodically moving old tuples from the history table to an archive table, making history a rolling-window table. And a third job that 2-3 times per minute produces a transaction lasting 10 seconds, forcing autovacuum to give up on the lock reacquisition. Even with that workload, autovacuum is slowly but steadily chopping away at the table.
Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index c9253a9..fd3e91e 100644 *** a/src/backend/commands/vacuumlazy.c --- b/src/backend/commands/vacuumlazy.c *** *** 52,62 --- 52,64 #include storage/bufmgr.h #include storage/freespace.h #include storage/lmgr.h + #include storage/proc.h #include utils/lsyscache.h #include utils/memutils.h #include utils/pg_rusage.h #include utils/timestamp.h #include utils/tqual.h + #include portability/instr_time.h /* *** typedef struct LVRelStats *** 103,108 --- 105,111 ItemPointer dead_tuples; /* array of ItemPointerData */ int num_index_scans; TransactionId latestRemovedXid; + bool lock_waiter_detected; } LVRelStats; *** lazy_vacuum_rel(Relation onerel, VacuumS *** 193,198 --- 196,203 vacrelstats-old_rel_pages = onerel-rd_rel-relpages; vacrelstats-old_rel_tuples = onerel-rd_rel-reltuples; vacrelstats-num_index_scans = 0; + vacrelstats-pages_removed = 0; + vacrelstats-lock_waiter_detected = false; /* Open all indexes of the relation */ vac_open_indexes(onerel, RowExclusiveLock, nindexes, Irel); *** lazy_vacuum_rel(Relation onerel, VacuumS *** 259,268 vacrelstats-hasindex, new_frozen_xid); ! /* report results to the stats collector, too */ ! pgstat_report_vacuum(RelationGetRelid(onerel), onerel-rd_rel-relisshared, new_rel_tuples); /* and log the action if appropriate */ if (IsAutoVacuumWorkerProcess() Log_autovacuum_min_duration = 0) --- 264,281 vacrelstats-hasindex, new_frozen_xid); ! /* ! * report results to the stats collector, too. ! * An early terminated lazy_truncate_heap attempt ! * suppresses the message and also cancels the ! * execution of ANALYZE, if that was ordered. ! */ ! if (!vacrelstats-lock_waiter_detected) ! 
pgstat_report_vacuum(RelationGetRelid(onerel), onerel-rd_rel-relisshared, new_rel_tuples); + else + vacstmt-options = ~VACOPT_ANALYZE; /* and log the action if appropriate */ if (IsAutoVacuumWorkerProcess() Log_autovacuum_min_duration = 0) *** lazy_truncate_heap(Relation onerel, LVRe *** 1255,1334 BlockNumber old_rel_pages = vacrelstats-rel_pages; BlockNumber new_rel_pages; PGRUsage ru0; pg_rusage_init(ru0); /* ! * We need full exclusive lock on the relation in order to do truncation. ! * If we can't get it, give up rather than waiting --- we don't want to ! * block other backends, and we don't want to deadlock (which is quite ! * possible considering we already hold a lower-grade lock). ! */ ! if (!ConditionalLockRelation(onerel, AccessExclusiveLock)) ! return; ! ! /* ! * Now that
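The check/wait/timeout policy Jan describes can be sketched as a small simulation. This is purely illustrative of my reading of the three parameters, not the patch's actual code (count_yields and waiter_present are hypothetical names, and a fake millisecond clock stands in for real lock checks inside lazy_truncate_heap()):

```c
#include <assert.h>

/* Hard-coded parameters from the patch (include/postmaster/autovacuum.h). */
#define CHECK_INTERVAL_MS	20		/* how often to check for lock waiters */
#define WAIT_INTERVAL_MS	50		/* pause before reacquiring the lock */
#define TIMEOUT_MS			5000	/* stop retrying after this much time */

/*
 * Simulate the policy with a fake clock: truncate for CHECK_INTERVAL_MS,
 * then check for a conflicting lock request; if one is present, yield the
 * AccessExclusiveLock and wait WAIT_INTERVAL_MS before reacquiring.  Stop
 * once TIMEOUT_MS of total time has elapsed.  Returns how many times the
 * lock was yielded.
 */
static int
count_yields(int (*waiter_present) (int now_ms))
{
	int			now = 0;
	int			yields = 0;

	while (now < TIMEOUT_MS)
	{
		now += CHECK_INTERVAL_MS;		/* truncate one batch of pages */
		if (waiter_present(now))
		{
			yields++;					/* release the lock */
			now += WAIT_INTERVAL_MS;	/* let the waiter in, then retry */
		}
	}
	return yields;
}

static int
never_waits(int now_ms)
{
	(void) now_ms;
	return 0;
}

static int
always_waits(int now_ms)
{
	(void) now_ms;
	return 1;
}
```

The point of the policy is visible in the simulation: conflicting traffic costs autovacuum frequent yields, but each yield is short, so the truncation still makes steady forward progress rather than blocking other backends.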
Re: [HACKERS] Proof of concept: auto updatable views [Review of Patch]
Dean Rasheed dean.a.rash...@gmail.com writes: Attached is a rebased patch using new OIDs. Applied after a fair amount of hacking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PGCon 2013 - call for papers
PGCon 2013 will be held on 23-24 May 2013 at the University of Ottawa. This year, we are planning to have an un-conference day around PGCon. This is currently being scheduled. More information on the un-conference will be available within a few weeks. NOTE: the un-conference day content will be set on the day by those turning up on that day. We expect heavy attendance by developers and users of PostgreSQL.

We are now accepting proposals for the main part of the conference (23-24 May). Proposals can be quite simple. We do not require academic-style papers. If you are doing something interesting with PostgreSQL, please submit a proposal. You might be one of the backend hackers or work on a PostgreSQL-related project and want to share your know-how with others. You might be developing an interesting system using PostgreSQL as the foundation. Perhaps you migrated from another database to PostgreSQL and would like to share details. These, and other stories, are welcome. Both users and developers are encouraged to share their experiences.

Here are some ideas to jump-start your proposal process:

- novel ways in which PostgreSQL is used
- migration of production systems from another database
- data warehousing
- tuning PostgreSQL for different work loads
- replication and clustering
- hacking the PostgreSQL code
- PostgreSQL derivatives and forks
- applications built around PostgreSQL
- benchmarking and performance engineering
- case studies
- location-aware and mapping software with PostGIS
- the latest PostgreSQL features and features in development
- research and teaching with PostgreSQL
- things the PostgreSQL project could do better
- integrating PostgreSQL with 3rd-party software

Both users and developers are encouraged to share their experiences.

The schedule is:

1 Dec 2012   Proposal acceptance begins
19 Jan 2013  Proposal acceptance ends
19 Feb 2013  Confirmation of accepted proposals

NOTE: the call for lightning talks will go out very close to the conference. Do not submit lightning talk proposals until then.

See also http://www.pgcon.org/2013/papers.php

Instructions for submitting a proposal to PGCon 2013 are available from: http://www.pgcon.org/2013/submissions.php

-- Dan Langille - http://langille.org
Re: [HACKERS] review: pgbench - aggregation of info written into log
Hi, attached is a v5 of this patch. Details below:

On 8.12.2012 16:33, Andres Freund wrote: Hi Tomas, On 2012-11-27 14:55:59 +0100, Pavel Stehule wrote: attached is a v4 of the patch. There are not many changes, mostly some simple tidying up, except for the Windows handling.

After a quick look I am not sure what all the talk about Windows is about? instr_time.h seems to provide all you need, even for Windows. The only issue with gettimeofday() on Windows seems to be that it's not all that fast and not very high precision, but that shouldn't be a problem in this case? Could you expand a bit on the problems?

As explained in the previous message, this is an existing problem with the unavailable timestamp. I'm not very fond of adding Linux-only features, but fixing that was not the goal of this patch.

* I had a problem with the doc

The current patch has conflict markers in the sgml source; there seems to have been some unresolved merge. Maybe that's all that causes the errors? What's your problem with setting up the doc toolchain?

Yeah, my fault because of an incomplete merge. But the real culprit was a missing refsect2. Fixed.

issues:

* empty lines with invisible chars (tabs) and sometimes empty lines after and before {}

Fixed (I've removed the lines etc.)

* adjustment of start_time

+	 * the desired interval */
+	while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(now))
+		agg->start_time = agg->start_time + agg_interval;

This can skip one interval, so when the transaction time is larger than or similar to agg_interval, the results can be strange. We have to know the real length of the interval.

Could you post a patch that addresses these issues?

So, in the end I've rewritten the section advancing the start_time. Now it works so that when skipping an interval (because of a very long transaction), it will print lines even for those empty intervals.
So for example with a transaction file containing a single query SELECT pg_sleep(1.5); and an interval length of 1 second, you'll get something like this: 1355009677 0 0 0 0 0 1355009678 1 1501028 2253085056784 1501028 1501028 1355009679 0 0 0 0 0 1355009680 1 1500790 2252370624100 1500790 1500790 1355009681 1 1500723 2252169522729 1500723 1500723 1355009682 0 0 0 0 0 1355009683 1 1500719 2252157516961 1500719 1500719 1355009684 1 1500703 2252109494209 1500703 1500703 1355009685 0 0 0 0 0 which is IMHO the best way to deal with this. I've fixed several minor issues, added a few comments. regards Tomas diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index e376452..5a03796 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -150,6 +150,7 @@ char *index_tablespace = NULL; #define naccounts 10 bool use_log;/* log transaction latencies to a file */ +intagg_interval; /* log aggregates instead of individual transactions */ bool is_connect; /* establish connection for each transaction */ bool is_latencies; /* report per-command latencies */ intmain_pid; /* main process id used in log filename */ @@ -245,6 +246,18 @@ typedef struct char *argv[MAX_ARGS]; /* command word list */ } Command; +typedef struct +{ + + longstart_time; /* when does the interval start */ + int cnt;/* number of transactions */ + double min_duration; /* min/max durations */ + double max_duration; + double sum;/* sum(duration), sum(duration^2) - for estimates */ + double sum2; + +} AggVals; + static Command **sql_files[MAX_FILES]; /* SQL script files */ static int num_files; /* number of script files */ static int num_commands = 0; /* total number of Command structs */ @@ -377,6 +390,10 @@ usage(void) -l write transaction times to log file\n --sampling-rate NUM\n fraction of transactions to log (e.g. 
0.01 for 1%% sample)\n +#ifndef WIN32 +--aggregate-interval NUM\n + aggregate data over NUM seconds\n +#endif -M simple|extended|prepared\n protocol for submitting queries to server (default: simple)\n -n do not run VACUUM before tests\n @@ -830,9 +847,25 @@ clientDone(CState *st, bool ok) return false; /* always false */ } +static +void agg_vals_init(AggVals * aggs, instr_time start) +{ + /* basic counters */ + aggs->cnt = 0; /* number of transactions */ + aggs->sum =
Re: [HACKERS] too much pgbench init output
On 20.11.2012 08:22, Jeevan Chalke wrote: Hi, On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: On 19.11.2012 11:59, Jeevan Chalke wrote: Hi, I went through the discussion for this patch and here is my review: The main aim of this patch is to reduce the number of log lines. It was also suggested to use an option to provide the interval, but few of us were much in favour of it. So the final discussion ended at keeping a 5 sec interval between each log line. However, I see, there are two types of users here: 1. Those who like these log lines, so that they can troubleshoot some slowness and all 2. Those who do not like these log lines. Who likes these lines / needs them for something useful? No idea. I fall in the second category. But from the discussion, I believe some people may need detailed (or a lot more) output. I've read the thread again and my impression is that no one really needs or likes those lines, but (1) it's rather pointless to print a message every 100k rows, as it usually fills the console with garbage (2) it's handy to have regular updates of the progress I don't think there're people (in the thread) that require to keep the current amount of log messages. But there might be users who actually use the current logs to do something (although I can't imagine what). If we want to do this in a backwards compatible way, we should probably use a new option (e.g. -q) to enable the new (less verbose) logging. Do we want to allow both types of logging, or shall we keep only the new one? If both, which one should be the default? So keeping these in mind, I'd rather go for an option which will control this. People falling in category one can set this option very low, whereas users falling under the second category can keep it high. So what option(s) would you expect? Something that tunes the interval length or something else? Interval length. Well, I can surely imagine something like --interval N.
A switch that'd choose between the old and new behavior might be a good idea, but I'd strongly vote against automagic heuristics. It makes the behavior very difficult to predict and I really don't want to force the users to wonder whether the long delay is due to general slowness of the machine or some clever logic that causes long delays between log messages. That's why I chose a very simple approach with a constant time interval. It does what I was aiming for (less logging) and it's easy to predict. Sure, we could choose a different interval (or make it an option). I'd prefer an option for choosing an interval, say from 1 second to 10 seconds. Uh, why not allow an arbitrary integer? Why say 1 to 10 seconds? BTW, what if we put one log message every 10% (or 5%) with the time taken (time taken for the last 10% (or 5%) and cumulative) over 5 seconds? This will have only 10 (or 20) lines per pgbench initialisation. And since we are showing the time taken for each block, if any slowness happens, one can easily find the block by looking at the timings and troubleshoot it. Though 10% or 5% is again a debatable number, keeping it constant will eliminate the requirement of an option. That's what I originally proposed in September (see the messages from 17/9), and Alvaro was not really excited about it. Attached is a patch with fixed whitespace / indentation errors etc. Otherwise it's the same as the previous version.
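The constant-interval gating argued for above is simple enough to sketch in a few lines, assuming a counter like the patch's log_interval that tracks which multiple of log_step_seconds has already been reported (the function name is illustrative, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when the next progress line is due, i.e. when elapsed
 * time has crossed the next multiple of log_step_seconds.  The counter
 * is advanced so each threshold fires exactly once -- no heuristics,
 * so the cadence of messages is fully predictable.
 */
static bool
should_log(double elapsed_sec, int log_step_seconds, int *log_interval)
{
	if (elapsed_sec >= (double) *log_interval * log_step_seconds)
	{
		(*log_interval)++;
		return true;
	}
	return false;
}
```

The caller simply checks this once per batch of inserted rows; a slow machine produces the same number of lines per second of wall-clock time as a fast one.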
Tomas diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index e376452..334ce4c 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -135,6 +135,11 @@ int unlogged_tables = 0; double sample_rate = 0.0; /* + * logging steps (seconds between log messages) + */ +int log_step_seconds = 5; + +/* * tablespace selection */ char *tablespace = NULL; @@ -1362,6 +1367,11 @@ init(bool is_no_vacuum) char sql[256]; int i; + /* used to track elapsed time and estimate of the remaining time */ + instr_time start, diff; + double elapsed_sec, remaining_sec; + int log_interval = 1; + if ((con = doConnect()) == NULL) exit(1); @@ -1430,6 +1440,8 @@ init(bool is_no_vacuum) } PQclear(res); + INSTR_TIME_SET_CURRENT(start); + for (i = 0; i < naccounts * scale; i++) { int j = i + 1; @@ -1441,10 +1453,27 @@ init(bool is_no_vacuum) exit(1); } - if (j % 10000 == 0) - fprintf(stderr, "%d of %d tuples (%d%%) done.\n", - j, naccounts * scale, -
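The elapsed_sec / remaining_sec variables in the diff above suggest the usual linear extrapolation for the progress estimate: assume the rows still to be inserted proceed at the average rate observed so far. A hedged one-liner of that estimate (not necessarily the patch's exact formula):

```c
#include <assert.h>

/*
 * Estimated seconds remaining after 'done' of 'total' rows have been
 * inserted in 'elapsed_sec' seconds, assuming the remaining rows go in
 * at the same average rate.
 */
static double
remaining_seconds(double elapsed_sec, long done, long total)
{
	return elapsed_sec * (double) (total - done) / (double) done;
}
```

For example, 2500 of 10000 rows in 10 seconds gives an estimate of 30 seconds remaining. The estimate is noisy early on (small 'done') and converges as initialization proceeds.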
Re: [HACKERS] parallel pg_dump
On Sat, Dec 8, 2012 at 3:05 PM, Bruce Momjian br...@momjian.us wrote: On Sat, Dec 8, 2012 at 11:13:30AM -0500, Andrew Dunstan wrote: I am working on it when I get a chance, but keep getting hammered. I'd love somebody else to review it too. FYI, I will be posting pg_upgrade performance numbers using Unix processes. I will try to get the Windows code working but will also need help. Just let me know if there's anything I can help you guys with. Joachim
Re: [HACKERS] Review: Patch to compute Max LSN of Data Pages
On Saturday, December 08, 2012 9:44 AM Tom Lane wrote: Amit kapila amit.kap...@huawei.com writes: On Friday, December 07, 2012 7:43 PM Muhammad Usama wrote: Although I am wondering why you are disallowing an absolute path for the file. Any particular reason? The reason to disallow an absolute path is that we need to test on multiple platforms and wanted to keep the scope a little smaller. This argument seems to me to be completely nuts. What's wrong with an absolute path? Wouldn't you have to go out of your way to disallow it? There's nothing wrong with an absolute path. I have updated the patch to work for absolute paths as well. The updated patch is attached with this mail. With Regards, Amit Kapila. Attachment: pg_computemaxlsn_v4.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
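For readers unfamiliar with what such a tool has to do: every PostgreSQL page starts with pd_lsn, the LSN of the last WAL record that touched the page, stored as two 32-bit halves. A minimal sketch of the scanning core is below; this is illustrative only (the real utility uses the server's own page definitions, a real BLCKSZ, and file I/O rather than an in-memory buffer):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/*
 * Extract the page LSN from the start of a page: pd_lsn is the first
 * field of the page header, two 32-bit halves (xlogid, xrecoff) in
 * host byte order.
 */
static uint64_t
page_lsn(const unsigned char *page)
{
	uint32_t	hi,
				lo;

	memcpy(&hi, page, 4);
	memcpy(&lo, page + 4, 4);
	return ((uint64_t) hi << 32) | lo;
}

/*
 * Walk a buffer of 'nblocks' pages of size 'blcksz' and return the
 * largest page LSN seen.  A real tool would iterate over data files
 * block by block instead of a preloaded buffer.
 */
static uint64_t
max_lsn(const unsigned char *buf, size_t nblocks, size_t blcksz)
{
	uint64_t	result = 0;

	for (size_t i = 0; i < nblocks; i++)
	{
		uint64_t	lsn = page_lsn(buf + i * blcksz);

		if (lsn > result)
			result = lsn;
	}
	return result;
}
```

With absolute paths allowed (per the update above), the same loop can run against any data file the user points it at, not just paths relative to the data directory.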
Re: [HACKERS] Review of Row Level Security
2012/12/7 Simon Riggs si...@2ndquadrant.com: On 5 December 2012 11:16, Kohei KaiGai kai...@kaigai.gr.jp wrote: * TRUNCATE works, and allows you to remove all rows of a table, even ones you can't see to run a DELETE on. Er... that was my oversight. My preference is to rewrite the TRUNCATE command into a DELETE statement when a row-security policy is active on the target table. In that case, a NOTICE message may be helpful so users do not assume the table is always empty after the command. I think the default must be to throw an ERROR, since part of the contract with TRUNCATE is that it is fast and removes storage. OK. Does the default imply you are suggesting configurable behavior using a GUC or something? I think both behaviors are reasonable from a security point of view, as long as the user cannot remove rows they are not privileged to see. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Review of Row Level Security
2012/12/7 Simon Riggs si...@2ndquadrant.com: On 5 December 2012 11:16, Kohei KaiGai kai...@kaigai.gr.jp wrote: Oracle defaults to putting VPD on all event types: INSERT, UPDATE, DELETE, SELECT. ISTM we should be doing the same, not just say we can add an INSERT trigger if you want. Adding a trigger just begs the question as to why we are bothering in the first place, since this functionality could already be added by INSERT, UPDATE or DELETE triggers, if they are a full replacement for this feature. The only answer is ease of use We can easily add syntax like this [ROW SECURITY CHECK ( ) [ON [ ALL | INSERT, UPDATE, DELETE, SELECT [.., with the default being ALL I think it is a flaw of Oracle. :-) Agreed In case a user can define a leakable function, it enables leaking the contents of invisible rows at the point the executor fetches the rows, prior to the modification stage, even if we allow configuring individual row-security policies for SELECT and DELETE or UPDATE commands. My preference is one policy on a particular table for all the commands. Yes, only one security policy allowed. Question is, should we offer the option to enforce it on a subset of command types. That isn't anything I can see a need for myself. From an implementation perspective, it is not hard to support not applying the security policy to particular command types. So, my preference is to support only the behavior corresponding to the ALL option above, then add per-command support if we get strong demand for it. What do you think? Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp