Re: [HACKERS] Graph datatype addition
> Usually though, you'd be interested in large graphs which include information for lots of records (e.g., nodes are individual users, or products, or whatever). A graph datatype is not well suited for that, because it'd store each graph as a single value, and updating the graph would mean rewriting that whole value. If you're e.g. doing social network analysis, and each new edge between two users requires you to pull the whole graph from disk, update it, and write it back, you'll probably hit problems once you reach a few hundred users or so… Which really isn't a lot for that kind of application.

Yes, I agree. Hence, I suggested keeping the nodes and links separate. So, if you need to add a new edge, you just need to update the adjacency lists.

What I think will work is more of a 'logical' graph, i.e. consider a tuple in some table. To add it to a new/existing graph as a node, we just need to add an adjacency list for it. For each edge that connects the tuple to some other node, we add the corresponding entry in the list.

Essentially, the idea is, again, to separate the nodes and links; hence, the actual node is accessed pretty infrequently as compared to the links. Since they are separate, there is no need to pull out the entire graph for updating a single edge. This was the fundamental problem with my HStore-based design.

Regards, Atri

--
Regards,
Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
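A minimal sketch of the separated layout Atri describes, with hypothetical table names (nodes/edges are illustrative, not a proposed design):

```sql
-- Nodes and links kept separate, so adding an edge touches a single
-- small row instead of rewriting the whole graph value.
CREATE TABLE nodes (
    node_id bigint PRIMARY KEY,
    payload text              -- actual node data, accessed infrequently
);

CREATE TABLE edges (
    src bigint NOT NULL REFERENCES nodes,
    dst bigint NOT NULL REFERENCES nodes,
    PRIMARY KEY (src, dst)
);

-- A new edge between two users is a one-row insert; nothing else moves.
INSERT INTO edges VALUES (1, 2);
```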
Re: [HACKERS] [COMMITTERS] pgsql: Add sql_drop event for event triggers
Tom Lane t...@sss.pgh.pa.us writes:
> Applied with some further hacking.

Thanks!

> Hmm, that leads me to wonder exactly how extensively the regression tests test event triggers, because it sure looked to me like there were multiple bugs left in this version.

The first versions of the event triggers patch series used to include a large regression test file with some level of coverage for each and every supported command. It was done this way because there was some command-specific code at the time. It was apparently not testing the right things and was judged to be too many tests for no clear benefit, and thus thrown away. Then, with the never-ending refactoring of the patch series, no specific regression tests ever found their way back in.

It's not clear to me what tests to add exactly; I will find some time to read the current code in more detail and figure out what's easy enough to cover and not covered already.

>> Finally, I've been surprised to find out that those cases are only triggering for ddl_event_start (and not ddl_event_end), I think that's a bug we should be fixing:
> Agreed, I fixed it.

Thanks!

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Graph datatype addition
> Have you considered maybe the ltree datatype? To me, everything described sounds solvable in a pure SQL way (+ the ltree datatype to help with indexes and performance, as a materialised path to avoid a recursive query all the time...)

This may build over the existing solutions itself. I will have to check that.

> Though it would be nice to see something new that would simplify the tasks...

Exactly. A specialized data type with built-in support for common operations will be helpful, IMHO.

Regards, Atri

--
Regards,
Atri
l'apprenant
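A brief sketch of the ltree suggestion (ltree ships as a contrib extension; table and column names here are illustrative):

```sql
-- ltree stores a materialised path per row, so tree queries become
-- index-friendly pattern matches instead of recursive CTEs.
CREATE EXTENSION IF NOT EXISTS ltree;

CREATE TABLE categories (
    id   serial PRIMARY KEY,
    path ltree NOT NULL       -- e.g. 'top.science.astronomy'
);
CREATE INDEX ON categories USING gist (path);

-- All descendants of 'top.science', without WITH RECURSIVE:
SELECT * FROM categories WHERE path <@ 'top.science';
```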
Re: [HACKERS] Graph datatype addition
> I believe it would be best to largely separate the questions of storage and access. Partly because of Florian's concern that you'd frequently want only one representation of the whole graph, but also because the actual storage interface does NOT have to be user friendly if we have a good access layer. In particular, if rows had a low overhead, we'd probably just store graphs that way. That's obviously not the case in PG, so is there some kind of hybrid approach we could use? Perhaps sections of a graph could be stored with one piece of MVCC overhead per section?

Yes, I agree. We can probably store some (persistent) data in a table, and the 'hot' parts of the datatype (like adjacency lists) could be stored locally in a data structure which allows fast and low-overhead accesses and updates. I completely agree with separating storage and access layers. Supporting Florian's concern, I would also agree with your point of abstraction. Users should not concern themselves with backend storage details.

> That's why I think separating access from storage is going to be very important; if we do that up-front, we can change the storage later as we get real experience with this.

+1.

> Second, we should consider how much the access layer should build on WITH RECURSIVE and the like. Being able to detect specific use patterns of CTE/WITH RECURSIVE seems like it could add a lot of value; but I also worry that it's way too magical to be practical.

I thought of a frequent access patterns mining tool recently. We can store the recent accesses and apply a pattern mining algorithm on that dataset (I was thinking of the FP-Growth algorithm). But again, it has its own set of issues and is a project in itself.

Regards, Atri

--
Regards,
Atri
l'apprenant
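For reference, this is the kind of hand-written traversal the proposed access layer would build on or detect (assuming a hypothetical edges(src, dst) table):

```sql
-- Reachability from node 1, written as a plain recursive CTE.
-- UNION (not UNION ALL) discards duplicates, so the recursion
-- terminates even if the graph contains cycles.
WITH RECURSIVE reachable(node) AS (
    SELECT 1
  UNION
    SELECT e.dst
    FROM edges e
    JOIN reachable r ON e.src = r.node
)
SELECT node FROM reachable;
```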
Re: [HACKERS] The missing pg_get_*def functions
On 04/29/2013 11:47 PM, Joel Jacobson wrote:
>> Note also that minor releases can readily fix bugs in C-language functions, but we have no infrastructure to update sql-language functions after initdb. That flexibility is unfortunate to lose, particularly for something that pg_dump depends on. Now, the right thing is probably to design a mechanism for applying simple catalog updates in concert with a minor release. In the mean time, its absence puts the sql PL at a nontrivial disadvantage here.
> What do you mean by infrastructure? Isn't it as simple as CREATE OR REPLACE FUNCTION? As long as the interfaces of the pg_get_*def functions don't change, I cannot see how simply replacing the existing functions in a minor release upgrade could do any harm.

Minor releases are supposed not to require any such operations. You should normally be able to drop the binaries in place and restart. For C-language functions that is indeed all you have to do, but that's not the case for SQL-language functions, where the definition is contained in the catalogs, not the binary.

If all you want is SQL-language functions, there is nothing to stop you from writing them and publishing them today as an extension.

cheers

andrew
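The catalog-residence point can be made concrete: replacing a SQL-language function is itself DDL that would have to be executed in every database of every cluster (the function name here is hypothetical, purely for illustration):

```sql
-- Unlike swapping a binary, this must be run in *each* database,
-- because the function body lives in pg_proc, not in the executable.
CREATE OR REPLACE FUNCTION pg_catalog.pg_get_example_def(oid)
RETURNS text LANGUAGE sql STABLE AS $$
    SELECT 'fixed definition for object ' || $1::text;
$$;
```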
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 30 April 2013 06:57, Simon Riggs si...@2ndquadrant.com wrote:
> I'm about to light up the build farm with a trial commit of the compiler instructions stuff.

Amazingly that seemed to work. ISTM that we also need this patch to put memory barriers in place, otherwise the code might be rearranged.

--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

[Attachment: checksum_memory_barrier.v1.patch]
Re: [HACKERS] corrupt pages detected by enabling checksums
On 9 April 2013 08:36, Jeff Davis pg...@j-davis.com wrote:
> 1. I believe that the issue I brought up at the end of this email: http://www.postgresql.org/message-id/1365035537.7580.380.camel@sussancws0025 is a real issue. In lazy_vacuum_page(), the following sequence can happen when checksums are on:
>   a. PageSetAllVisible
>   b. Pass heap page to visibilitymap_set
>   c. visibilitymap_set logs the heap page and sets the LSN
>   d. MarkBufferDirty
> If a checkpoint happens between (c) and (d), then we have a problem. The fix is easy: just mark the heap buffer dirty first. There's another call site that looks like a potential problem, but I don't think it is. I simplified the code there to make it (hopefully) more clearly correct.

Applied

> 2. A cleanup patch to pass the buffer_std flag down through MarkBufferDirtyHint. This is a matter of preference and purely cosmetic, so it might not be wanted. The reason I thought it was useful is that a future caller that sets a hint on a non-standard buffer might easily miss the assumption that we have a standard buffer.

Skipped that for now. Do we really need it? Can you set a hint on a non-standard buffer?

--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Remaining beta blockers
* Robert Haas (robertmh...@gmail.com) wrote:
> If you assume that people are going to modify files while the backend is running, nothing we do anywhere is safe.

I agree that it's a bad idea and that people who do such things deserve what they get, but that doesn't mean it won't happen when people realize they can do it. And, really, it's all fun and games until someone writes a tool to do it...

Thanks,

Stephen
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 2013-04-30 11:55:29 +0100, Simon Riggs wrote:
> ISTM that we also need this patch to put memory barriers in place otherwise the code might be rearranged.
>
> --- a/src/backend/storage/page/bufpage.c
> +++ b/src/backend/storage/page/bufpage.c
> @@ -960,11 +960,14 @@ PageCalcChecksum16(Page page, BlockNumber blkno)
>   * Save pd_checksum and set it to zero, so that the checksum calculation
>   * isn't affected by the checksum stored on the page. We do this to
>   * allow optimization of the checksum calculation on the whole block
> - * in one go.
> + * in one go. Memory barriers are required to avoid rearrangement here.
>   */
>  save_checksum = phdr->pd_checksum;
> + pg_memory_barrier();
>  phdr->pd_checksum = 0;
> + pg_memory_barrier();
>  checksum = checksum_block(page, BLCKSZ);
> + pg_memory_barrier();
>  phdr->pd_checksum = save_checksum;
>  /* mix in the block number to detect transposed pages */

Why? I am not sure which rearrangement you're fearing? In all cases where there is danger of concurrent write access to the page we should already be working on a copy? Also, if we need a memory barrier I can only see a point in the 2nd one. The first and third shouldn't ever be able to change anything since we are only writing to local memory?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Tue, Apr 30, 2013 at 1:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> I'm about to light up the build farm with a trial commit of the compiler instructions stuff.
> Amazingly that seemed to work.

Thanks for committing. Sorry about missing the .h file from the patch. The two commits look good to me. I can confirm that compiling with CFLAGS="-O2 -march=native" will vectorize the committed code on GCC 4.7. I also checked the situation on clang. clang-3.2 isn't able to vectorize the loop even with vectorization options. I will check what is stopping it. If any volunteer has a working build setup with ICC or MSVC and is willing to run a couple of test compiles, I think we can achieve vectorization there too.

> ISTM that we also need this patch to put memory barriers in place otherwise the code might be rearranged.

The compiler and CPU both have to preserve correctness when rearranging code, so I don't think we care about it here. It might matter if these routines could be called concurrently by multiple backends for a single buffer, but in that case memory barriers won't be enough; we'd need full exclusion.

Regards,

Ants Aasma

--
Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] Remaining beta blockers
Robert Haas robertmh...@gmail.com wrote:
>> Kevin Grittner kgri...@ymail.com wrote:
>> The hysteria and FUD about using the currently-supported technique for unlogged tables to implement unlogged matviews are very discouraging. And the notion that we would release a matview feature which allowed false results (in the form of treating a never-populated matview as a legal empty matview) is truly scary to me.
> I think that labeling other people's concerns as hysteria and FUD is not something which will advance the debate.

The point of that language is that charged language which has nothing to do with reality is already being thrown around; by all means let's get back to constructive discussion. If there is some particular problem someone sees, I don't think it has been expressed yet, which makes it impossible to address, unless you count the assertion that *if* we had chosen a zero-length heap to represent a table with valid data we *would* have a problem?

Let's look at the corner this supposedly paints us into. If a later major release creates a better mechanism, current pg_dump and load will already use it, based on the way matviews are created empty and REFRESHed by pg_dump. Worst case, we need to modify the behavior of pg_dump running with the switch used by pg_upgrade to use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever syntax is chosen) -- a command we would probably want at that point anyway. I'm not seeing cause for panic here. What is a real problem or risk with using this mechanism until we engineer something better? What problems with converting to a later major release does anyone see?

> If we can't tell the difference between those two things, I don't think we should be releasing the feature.

So, speaking of hysteria... Yeah, I know you don't get it, but as a DBA I would never have allowed a feature which could quietly generate false results to be used -- which is exactly what you have without a way to differentiate these cases.
If it comes down to a choice between a performance tweak like unlogged matviews and an issue of correctness of results, I will pick correctness. Now, as I've already said, this tweak is significant (I expect it will make matviews useful in roughly twice as many cases), but it is just a performance tweak.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
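The distinction being argued over can be stated directly in SQL (the comments describe intended behavior; the exact error wording is illustrative, not quoted from the patch):

```sql
-- A never-populated matview is not the same thing as an empty one.
CREATE MATERIALIZED VIEW m AS SELECT 1 AS x WITH NO DATA;

SELECT * FROM m;            -- rejected: the view has not been populated

REFRESH MATERIALIZED VIEW m;
SELECT * FROM m;            -- now a legitimate result
```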
Re: [HACKERS] The missing pg_get_*def functions
On Tue, Apr 30, 2013 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote:
> Minor releases are supposed not to require any such operations. You should normally be able to drop the binaries in place and restart. For C language functions that is indeed all you have to do, but that's not the case for SQL language functions, where the definition is contained in the catalogs, not the binary.

Minor releases don't even require pg_dump/pg_restore, so I really don't see the issue. Nothing in the versioning policy* appears to be in conflict with fixing bugs in pg_catalog sql-language functions. Maybe it's an unwritten rule that minor versions mustn't affect pg_catalog. I thought the rule was to require no relational modifications of pg_catalog between minor versions, but simply modifying or adding functions, without breaking their interfaces, is not a modification of pg_catalog as I see it, as it couldn't affect any old code, since the tables and columns would be untouched. Old code couldn't be affected by new functions, or the inner details of existing pg_catalog functions, as long as they don't change the interface, i.e., the IN/OUT parameters.

I would actually want to go a bit further and propose to move the sql queries within the "else if (fout->remoteVersion >= ...) { ... }" statements also for _old_ versions, and put them in sql-language functions, which, if you were to use a future version of pg_dump to dump an old version, would install sql-language functions for the old version in pg_catalog, so they could be called by the new pg_dump. That way we could get rid of these kinds of sql queries for _all_ versions, and not just for future versions. These functions would need to adhere to exactly the same interface, where the OUT params must fit perfectly with the args passed to ArchiveEntry(). The existing pg_get_*def() functions are not comprehensive enough, as they still require a lot of sql code to extract various other necessary fields.
*) http://www.postgresql.org/support/versioning/

> If all you want is SQL language functions, there is nothing to stop you from writing them and publishing them today as an extension.

Yes, maybe I will, as a proof of concept and a test of how complex or simple it would be, and by how many lines of code pg_dump.c could be reduced.
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 30 April 2013 12:23, Ants Aasma a...@cybertec.at wrote:
>> ISTM that we also need this patch to put memory barriers in place otherwise the code might be rearranged.
> The compiler and CPU both have to preserve correctness when rearranging code, so I don't think we care about it here. It might matter if these routines could be called concurrently by multiple backends for a single buffer, but in that case memory barriers won't be enough, we'd need full exclusion.

Certainly happy not to commit anything else... Thanks to Ants and Andres.

--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] The missing pg_get_*def functions
On 04/30/2013 07:34 AM, Joel Jacobson wrote:
>> Minor releases are supposed not to require any such operations. You should normally be able to drop the binaries in place and restart. For C language functions that is indeed all you have to do, but that's not the case for SQL language functions, where the definition is contained in the catalogs, not the binary.
> Minor releases don't even require pg_dump/pg_restore, so I really don't see the issue.

You have already been told what the issue is. There is no provision for updating the catalogs. Our users expect to be able to drop a minor upgrade binary in place and have nothing more to do. It's what they are used to, and I at least have no interest in changing that. If you change the definition of provided SQL language functions you would need to apply those changes to each database in each cluster. Otherwise, a user will think they are on version X which has a fix for function Y and in fact they won't have the fix. That's a recipe for utter confusion. All the existing pg_catalog SQL functions are one-liners, apart from ts_debug().

>> If all you want is SQL language functions, there is nothing to stop you from writing them and publishing them today as an extension.
> Yes, maybe I will, as a proof-of-concept and test of how complex or simple it would be, and how many lines of code pg_dump.c could be reduced with.

pg_dump does bulk operations in many cases, and one way we could make it faster would be to increase that, not reduce it (see comments where we get table attributes in pg_dump.c, for example). Providing singleton operations like this will not help it at all. If your aim is to be able to replace all the code pg_dump runs by these singleton operations then I think it's almost certainly doomed to failure. Having undertaken some of the exercise, I can also assure you that writing pg_get_table_def() in a single pure SQL statement will be challenging.
There is a case for having functions like these, but probably not for use by pg_dump, and I suspect they would best be done in C.

cheers

andrew
Re: [HACKERS] Graph datatype addition
> 1. We created JSON and XML types as ways of storing data that has a robust validation system. They're still, in a sense, just plain old text, but it's plain old text that the user can be certain satisfies the respective rules for representations.

Yes, although I feel that we can use JSON to port a graph's data out of the database.

> I suspect that the best answer is NOT one where a graph is represented as a value in a table; that has the implication that modifying The Graph requires altering a single tuple, and that seems likely to become a horrible bottleneck. I'm suspicious that using HSTORE leads down that path, where you'll have a database that has a table with just one tuple in it, and where it's nearly impossible to alter that tuple.

I agree; hence, I have dropped the idea of building it completely over HStore.

> I'm having a hard time thinking about what it looks like to have a graph as table except to effectively compose the graph as a set of nodes, one tuple per node, and I'm not sure that a new data type has anything to contribute to that.

I can think of three points:

1) We can maintain pointers to tuples which are nodes in a graph instead of copying over the actual data. This idea is based on the assumption that node data access will be very specific, i.e. accessing the actual data of nodes happens much less often; for most analyses we are more interested in the linkage.

2) In continuation of the above point, the more frequently accessed areas are the link details, or in my current plan, the adjacency lists. These could be maintained in a data structure by the data type, with low overheads for updating, traversal and insertion.

3) The data type shall have built-in support for common operations on graphs. The user can call them directly and hence is saved the overhead of designing his own data structures and functions.

> The one place where I *could* see a special type having a contribution is for there to be a data type that can contain an arbitrary number of links.
> That means you have one tuple per node, and, instead of needing a tuple for each link between nodes, you have one attribute indicating *all* the links. (And interesting is for that one attribute to be usable for foreign key purposes.) That has a hard time scaling in cases where nodes are over-connected, which is, broadly speaking, an acceptable sort of scenario.

Yes, that one attribute can be the adjacency set of adjacency lists. For efficient access, we can store pointers to adjacency lists in hash maps, with node identifiers as keys (maybe).

Regards, Atri

--
Regards,
Atri
l'apprenant
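The one-attribute-per-node layout discussed above can be sketched with an array column (table and column names are hypothetical; note that PostgreSQL cannot enforce a foreign key over array elements, which is part of what the parenthetical wishes for):

```sql
-- One tuple per node; a single array attribute lists all outgoing links.
CREATE TABLE graph_nodes (
    node_id   bigint PRIMARY KEY,
    neighbors bigint[] NOT NULL DEFAULT '{}'   -- the adjacency list
);

-- Adding an edge 1 -> 2 appends to one node's list:
UPDATE graph_nodes
SET neighbors = neighbors || 2::bigint
WHERE node_id = 1;
```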
Re: [HACKERS] corrupt pages detected by enabling checksums
On Tue, Apr 30, 2013 at 6:58 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> On 9 April 2013 08:36, Jeff Davis pg...@j-davis.com wrote:
>> 1. I believe that the issue I brought up at the end of this email: http://www.postgresql.org/message-id/1365035537.7580.380.camel@sussancws0025 is a real issue. In lazy_vacuum_page(), the following sequence can happen when checksums are on: a. PageSetAllVisible b. Pass heap page to visibilitymap_set c. visibilitymap_set logs the heap page and sets the LSN d. MarkBufferDirty If a checkpoint happens between (c) and (d), then we have a problem. The fix is easy: just mark the heap buffer dirty first. There's another call site that looks like a potential problem, but I don't think it is. I simplified the code there to make it (hopefully) more clearly correct.
> Applied

Uh, wait a minute. I think this is completely wrong. The buffer is LOCKED for this entire sequence of operations. For a checkpoint to happen, it's got to write every buffer, which it will not be able to do for so long as the buffer is locked. The effect of the change to lazy_scan_heap is to force the buffer to be written even if we're only updating the visibility map page. That's a bad idea and should be reverted. The change to lazy_vacuum_page is harmless, but the comment is wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Graph datatype addition
> actually, i was going to suggest to Atri to take a look at that, pgrouting is currently in development of v2.0 which will rewrite some parts (including some of the algorithms). Maybe this datatype could fit better as part of pgrouting

Thanks, I will have a look at pgRouting. Although, at a short look, I feel that the purposes of pgRouting and the proposed data type are different, even though they support similar functions. One good idea can be to refer to pgRouting's functions when implementing similar functions for the proposed datatype.

Regards, Atri

--
Regards,
Atri
l'apprenant
Re: [HACKERS] Remaining beta blockers
Kevin,

* Kevin Grittner (kgri...@ymail.com) wrote:
> If there is some particular problem someone sees, I don't think it has been expressed yet, which makes it impossible to address, unless you count the assertion that *if* we had chosen a zero-length heap to represent a table with valid data we *would* have a problem?

The problem which people see is that this approach isn't a good idea and will lead down a road to ugly hacks to make things work correctly. It's not a question about code correctness -- it's a question about the design. Those are two very different things, but both need to be considered.

> I'm not seeing cause for panic here.

Speaking of overcharged words, I don't see any 'panic' here but rather a strong disagreement between individuals over this design. Actually, I'm not even sure there is disagreement about there being a better design for this -- it sounds like everyone agrees that there is. The question boils down to: do we ship it with an inferior design, or hold off and do it right the first time?

> What is a real problem or risk with using this mechanism until we engineer something better? What problems with converting to a later major release does anyone see?

Various examples have been articulated and you've worked through specific high-level designs for how to deal with those, which is great, but design isn't just about those things which you can foresee and predict; it's about ensuring flexibility is there for those things that you don't predict.

> If we can't tell the difference between those two things, I don't think we should be releasing the feature. So, speaking of hysteria... Yeah, I know you don't get it, but as a DBA I would never have allowed a feature which could quietly generate false results to be used -- which is exactly what you have without a way to differentiate these cases.

It also happens to be what you can end up having with unlogged tables already..
We ran into exactly that issue and decided that we'd be better off not using them (for now anyway), even for things which really should be just transient data sets.

> If it comes down to a choice between a performance tweak like unlogged matviews and an issue of correctness of results, I will pick correctness. Now, as I've already said, this tweak is significant (I expect it will make matviews useful in roughly twice as many cases), but it is just a performance tweak.

And, I think anyway, I agree w/ you (Kevin) here -- we should really have a way to tell the difference between no-data-in-query-result and never-populated. I'd like more options than just a big ERROR result happening when it's never been populated, but that's a different discussion.

Thanks,

Stephen
Re: [HACKERS] corrupt pages detected by enabling checksums
On 30 April 2013 13:34, Robert Haas robertmh...@gmail.com wrote:
>> On Tue, Apr 30, 2013 at 6:58 AM, Simon Riggs si...@2ndquadrant.com wrote:
>>> On 9 April 2013 08:36, Jeff Davis pg...@j-davis.com wrote:
>>> 1. I believe that the issue I brought up at the end of this email: http://www.postgresql.org/message-id/1365035537.7580.380.camel@sussancws0025 is a real issue. In lazy_vacuum_page(), the following sequence can happen when checksums are on: a. PageSetAllVisible b. Pass heap page to visibilitymap_set c. visibilitymap_set logs the heap page and sets the LSN d. MarkBufferDirty If a checkpoint happens between (c) and (d), then we have a problem. The fix is easy: just mark the heap buffer dirty first. There's another call site that looks like a potential problem, but I don't think it is. I simplified the code there to make it (hopefully) more clearly correct.
>> Applied
> Uh, wait a minute. I think this is completely wrong.

Thanks for your review.

> The buffer is LOCKED for this entire sequence of operations. For a checkpoint to happen, it's got to write every buffer, which it will not be able to do for so long as the buffer is locked.

I was following the logic we use for WAL: mark the buffer dirty, then write WAL. The important thing is for us to signal that the buffer should be written, which we do by marking it dirty. The actual writing of the buffer comes later, often minutes later.

> The effect of the change to lazy_scan_heap is to force the buffer to be written even if we're only updating the visibility map page. That's a bad idea and should be reverted.

OK, agreed. Will wait for this discussion to end before acting though, so I'm sure exactly what you mean.

> The change to lazy_vacuum_page is harmless, but the comment is wrong.

In what way, wrong? What should it say?
--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] The missing pg_get_*def functions
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> It may be that the functions Joel proposes are worth having for other tools to use, but I'm not in favor of making pg_dump use them.

I care very little about migrating the pg_dump functionality into server-side functions and a great deal about *having* such functionality available server-side. We've been around this time and time again: there are use-cases beyond pg_dump for being able to get the definition of an object. Having to first figure out and then replicate what pg_dump (or psql) does is no trivial feat.

In short, I think I agree w/ Tom here (though I contend that there *are* such use-cases, not that 'it may be'.. :). Let's drop the discussion about changing pg_dump and/or psql and instead simply go implement these functions. We can revisit the pg_dump discussion 5 or 10 years down the road, after we've seen how well these functions work and what uses they are put to outside of our (relatively small) world.

Thanks,

Stephen
Re: [HACKERS] The missing pg_get_*def functions
* Robert Haas (robertmh...@gmail.com) wrote: On Mon, Apr 29, 2013 at 7:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: One of the things that we frequently recommend when doing upgrades is that you do the dump with the newer version's pg_dump, so as to get the benefits of any bug fixes that are in it. The more dump functionality is on the server side, the less opportunity we have to repair things that way. But why wouldn't we be able to fix the version in the server, if it turns out to be buggy? I suppose we wouldn't fix bugs discovered after EOL, but I'm not sure that's a sufficient objection. There are other things beyond bugs here.. Changes in reserved keywords is actually the biggest reason, ime, to use the newer pg_dump when you're trying to move to a newer PG version. I don't think we'd want to either go to quoting everything (yuck), or having a point release suddenly change what gets quoted and what doesn't in a pg_dump.. Thanks, Stephen
Re: [HACKERS] The missing pg_get_*def functions
* Joel Jacobson (j...@trustly.com) wrote: On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: That alone would probably be sufficient reason why we would never allow pg_dump to depend on any such thing (not that I see a compelling argument for it to do so anyway...). It would be better to find a way to update sql-language functions in minor upgrades, instead of shutting that door entirely for all future implementation ideas involving sql-language functions in the pg_catalog. Go for it? :) I don't think you'll find much disagreement here, but it's no trivial thing to do either.. Perhaps some kind of one-time startup script that gets run? Or maybe an internal identifier that tracks the version of the catalog and runs a script which updates it on first start when there are changes to the catalog? But what about any dependencies which exist? Are we confident enough in the existing infrastructure for 'create-or-replace' that, if we don't change the API, it'll just replace the contents and we won't hurt user objects? The long and the short of it here is that there isn't any very good reason to migrate any of the existing pg_dump-side functionality into server-side functions, and especially not server-side functions that aren't in C. One of the things that we frequently recommend when doing upgrades is that you do the dump with the newer version's pg_dump, so as to get the benefits of any bug fixes that are in it. The more dump functionality is on the server side, the less opportunity we have to repair things that way. There are two very good reasons in particular of doing exactly that: [... good reasons to have pg_dump-like capability in the server ...] There's a difference between 'migrate' and 'implement' that I'm not sure you saw.. I agree w/ Tom that there's probably no really good reason to *remove* the existing code in pg_dump. 
Adding similar code to the server is quite a different thing and, as Tom commented later, might be useful for other tools to use. b) Reduce the amount of hard-coded sql queries in pg_dump.c. Unless you're hacking on pg_dump quite a bit (which, if you are, you probably need to understand/see the queries that it's making anyway), I don't see why this is actually a problem..? I'm thinking that you've simply had to be in the guts of pg_dump enough, as some others of us have also, that you grow tired of having to extract out the useful bits from it. If those bits are replicated as clean functions in the backend, you won't need to be grokking around in pg_dump anymore.. :) Won't help for previous versions, but perhaps the sql queries to support older versions can be hidden in some separately included file at least so you don't have to see it, which would make sense since the sql for the old versions won't change and won't need the same amount of active development as the code for the latest version. It makes no sense to keep the sql for all versions in the same place. I'd be all for a rework of pg_dump which more cleanly sets up the SQL for each server version- feel free to work on such. :) It may be that the functions Joel proposes are worth having for other tools to use, but I'm not in favor of making pg_dump use them. I don't agree with you as I don't think it's a very hard problem to deal with updates of sql-language functions in the pg_catalog schema. The above doesn't really seem like a response to the piece quoted, so I'll respond to just what you wrote- I agree. Let's figure out a way to update sql-language functions. Once we have that, it'll certainly open up the door to having more of them. I don't think we can punt on this as a well, we'll do it when we need it; we've needed it in the past (as I recall, we've had to tell people in the release notes to hack the catalog for some issue or another at least once in most major revs..)
and it'd be great to have the problem solved. If we fix that first, this and a whole range of problems, where the solution involves taking in-data from pg_catalog, and producing out-data based on that, becomes super easy to achieve by writing simple sql-language functions with direct and simple access to the pg_catalog, instead of writing C functions with inline hard-coded sql. Sounds fantastic. Thanks, Stephen
Re: [HACKERS] The missing pg_get_*def functions
On Tue, Apr 30, 2013 at 2:12 PM, Stephen Frost sfr...@snowman.net wrote: Go for it? :) I don't think you'll find much disagreement here, but it's no trivial thing to do either.. Perhaps some kind of one-time startup script that gets run? Or maybe an internal identifier that The above doesn't really seem like a response to the piece quoted, so I'll respond to just what you wrote- I agree. Let's figure out a way to update sql-language functions. Once we have that, it'll certainly open up the door to having more of them. I don't think we can punt on this as a well, we'll do it when we need it; we've needed it in the past (as I recall, we've had to tell people in the release notes to hack the catalog for some issue or another at least once in most major revs..) and it'd be great to have the problem solved. Sounds fantastic. Thanks for the motivational speech! :) I'll start working on it today!
Re: [HACKERS] The missing pg_get_*def functions
* Joel Jacobson (j...@trustly.com) wrote: Thanks for the motivational speech! :) I'll start working on it today! Great, but you should really come up with an initial design and get feedback on it before you start coding up something. :) I've outlined a few very, very high-level ideas about what could be done, but there's a *lot* of details that would need to be worked out here. You might also ask some of the packagers (eg: Martin Pitt w/ Ubuntu/Debian) about how they've handled point-upgrades which have required catalog hackery. Thanks, Stephen
Re: [HACKERS] Remaining beta blockers
Could we please stop the ad-hominem stuff from all sides? We want to solve the issue, not make it bigger. On 2013-04-30 04:29:26 -0700, Kevin Grittner wrote: Let's look at the corner this supposedly paints us into. If a later major release creates a better mechanism, current pg_dump and load will already use it, based on the way matviews are created empty and REFRESHed by pg_dump. Worst case, we need to modify the behavior of pg_dump running with the switch used by pg_upgrade to use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever syntax is chosen) -- a command we would probably want at that point anyway. I'm not seeing cause for panic here. I don't think panic is appropriate either, but I think there are some valid concerns around this. 1) vacuum can truncate the table to an empty relation already if there is no data returned by the view's query which then leads to the wrong scannability state. S1: CREATE MATERIALIZED VIEW matview_empty AS SELECT false WHERE random() < -1; S2: S2: SELECT * FROM matview_empty; -- works S1: VACUUM matview_empty; S2: SELECT * FROM matview_empty; -- still works S3: SELECT * FROM matview_empty; -- errors out So we need to make vacuum skip cleaning out the last page. Once we get incrementally updated matviews there are more situations to get into this than just a query not returning anything. I remember this being discussed somewhere already, but couldn't find it quickly enough. Imo this is quite an indicator that the idea of using the filesize is just plain wrong. Adding logic to vacuum not to truncate data because it's a flag for matview scannability is quite the layer violation and a sign of a bad design decision. Such a hack has already been added to copy_heap_data(); while not as bad, it shows that it is hard to find all the places where we need to add it.
2) Since we don't have a metapage to represent scannability in 9.3 we cannot easily use one in 9.4+ without pg_upgrade emptying all matviews unless we only rely on the catalogs which we currently cannot. This will either slow down development or make users unhappy. Alternatively we can add yet another fork, but that has its price (quite a bit more open files during normal operation, slower CREATE DATABASE). This is actually an argument for not releasing matviews without such an external state. Going from disk-based state to catalog is easy, the other way round: not so much. 3) Using the filesize as a flag will make other stuff like preallocating on-disk data in bigger chunks and related things more complicated. 4) doing the check for scannability in the executor imo is a rather bad idea. Note that we e.g. don't see an error about a matview which won't be scanned because of constraint exclusion or one-time filters. S1: CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false WITH NO DATA; S1: SELECT * FROM matview_unit_false; You can get there in far more reasonable cases than WHERE false. 5) I have to agree with Kevin that the scannability is an important thing to track though. a) we cannot remove support for WITH NO DATA because of pg_dump order and unlogged relations. So even without unlogged relations the problem exists although it's easier to represent. b) Just giving wrong responses is bad [tm]. Outdated data is something completely different (it has existed in that state in the (near) past) than giving an empty response (might never have been a visible state, and more likely not so in any reasonably near past). Consider an application behind a pooler suddenly getting an empty response from a SELECT * FROM unlogged_matview; It won't notice anything without a unscannable state since it probably won't even notice the database restart. Not sure what the consequence of this is.
The most reasonable solution seems to be to introduce a metapage somewhere *now* which sucks, but ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] The missing pg_get_*def functions
On 2013-04-30 05:14:15 +0100, Joel Jacobson wrote: On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: Note also that minor releases can readily fix bugs in C-language functions, but we have no infrastructure to update sql-language functions after initdb. That flexibility is unfortunate to lose, particularly for something that pg_dump depends on. That alone would probably be sufficient reason why we would never allow pg_dump to depend on any such thing (not that I see a compelling argument for it to do so anyway...). It would be better to find a way to update sql-language functions in minor upgrades, instead of shutting that door entirely for all future implementation ideas involving sql-language functions in the pg_catalog. I'd be very careful with jumping on this task. I am pretty sure it's a very good way to get very, very frustrated if you don't present a widely accepted design beforehand. Doing this correctly is *far far* from easy. Just a little collection of problems: * You need to connect to all databases, not just one. There's no infrastructure for this. * You need to do the update *before* allowing any external connections. Otherwise the feature won't be useful to fix actual problems. Again, there is no infrastructure for this. * You need to do it in a way that a) doesn't slow down normal startup b) doesn't break if the update has only been applied to some of the databases, not all. FWIW I really, really doubt you can do all the functions referred to upthread sensibly and correctly from SQL functions. The infrastructure to do this just isn't there. Besides, I really don't buy this helping pg_dump. I think the far more realistic course here is to put some parts that are required by pg_dump and by such functions in some common library that then can *also* be used by such backend functions. But that's a humongous task that, besides deep technical knowledge in lots of areas, requires quite a bit of politics.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Remaining beta blockers
On Tue, Apr 30, 2013 at 7:29 AM, Kevin Grittner kgri...@ymail.com wrote: Let's look at the corner this supposedly paints us into. If a later major release creates a better mechanism, current pg_dump and load will already use it, based on the way matviews are created empty and REFRESHed by pg_dump. Worst case, we need to modify the behavior of pg_dump running with the switch used by pg_upgrade to use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever syntax is chosen) -- a command we would probably want at that point anyway. I'm not seeing cause for panic here. What is a real problem or risk with using this mechanism until we engineer something better? What problems with converting to a later major release does anyone see? Well, it's a pg_upgrade hazard, if nothing else, isn't it? Yeah, I know you don't get it, but as a DBA I would never have allowed a feature which could quietly generate false results to be used -- which is exactly what you have without a way to differentiate these cases. If it comes down to a choice between a performance tweak like unlogged matviews and an issue of correctness of results, I will pick correctness. Now, as I've already said, this tweak is significant (I expect it will make matviews useful in roughly twice as many cases), but it is just a performance tweak. Sure, I wouldn't allow that either. My point is that I feel that could be engineered around in user space. If you have a materialized view which could legitimately be empty (there are many for which that won't be an issue), then you can either arrange the view definition so that it isn't (e.g. by including a dummy record that clients can look for), or you can include a sentinel unlogged table that will contain a row if and only if materialized views have been refreshed since the last crash. 
In the examples I can think of, indefinitely-stale-but-valid-at-some-point wouldn't be very good either, so I would anticipate needing to do some engineering around relative freshness anyway - e.g. keeping a side table that lists the last-refreshed-time for each matview. Or, maybe I wouldn't care about tracking elapsed time per se, but instead want to track freshness relative to updates - e.g. set things up so that anyone who performs an action that would invalidate a row in the materialized view would also update a row someplace that would allow me to identify the row as stale. In either case, handling the case where the view is altogether invalid doesn't seem to add a whole lot of additional complexity. Now, I can imagine cases where it does. For example, suppose you have a cron job (which you trust to work) to refresh your materialized views every night. Well, that means that you'll never be more than 24 hours stale - but if any of those materialized views are unlogged, that also means that you could have no data at all for up to 24 hours following a crash. Not great, because now you need some logic to handle just that one case that wouldn't be necessary if the DB did it for you. But I just think it's a judgement call how serious one thinks that scenario is, vs. any other scenario where a boolean isn't adequate either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
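Robert's sentinel idea can be sketched in SQL. All object names below are invented, and this needs a running server, so take it as a sketch of the pattern rather than tested code: since an unlogged table is truncated during crash recovery, an empty sentinel lets clients distinguish "not refreshed since the crash" from a legitimately empty result.

```sql
-- One-time setup: the sentinel loses its row whenever the server crashes.
CREATE UNLOGGED TABLE matview_fresh (refreshed_at timestamptz NOT NULL);

-- The nightly refresh job repopulates both the matviews and the sentinel:
REFRESH MATERIALIZED VIEW my_unlogged_matview;
TRUNCATE matview_fresh;
INSERT INTO matview_fresh VALUES (now());

-- Clients gate reads on the sentinel: after a crash this returns no rows
-- (and the application can raise its own "not refreshed yet" error)
-- instead of silently treating the truncated matview as valid data.
SELECT m.*
FROM my_unlogged_matview m
WHERE EXISTS (SELECT 1 FROM matview_fresh);
```

This is the "engineered around in user space" approach; Kevin's objection downthread is precisely that the database, not each application, should be the one tracking validity.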
Re: [HACKERS] Remaining beta blockers
Andres Freund and...@2ndquadrant.com wrote: On 2013-04-30 04:29:26 -0700, Kevin Grittner wrote: Let's look at the corner this supposedly paints us into. If a later major release creates a better mechanism, current pg_dump and load will already use it, based on the way matviews are created empty and REFRESHed by pg_dump. Worst case, we need to modify the behavior of pg_dump running with the switch used by pg_upgrade to use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever syntax is chosen) -- a command we would probably want at that point anyway. I'm not seeing cause for panic here. I don't think panic is appropriate either, but I think there are some valid concerns around this. 1) vacuum can truncate the table to an empty relation already if there is no data returned by the view's query which then leads to the wrong scannability state. S1: CREATE MATERIALIZED VIEW matview_empty AS SELECT false WHERE random() < -1; S2: S2: SELECT * FROM matview_empty; -- works S1: VACUUM matview_empty; S2: SELECT * FROM matview_empty; -- still works S3: SELECT * FROM matview_empty; -- errors out So we need to make vacuum skip cleaning out the last page. Once we get incrementally updated matviews there are more situations to get into this than just a query not returning anything. I remember this being discussed somewhere already, but couldn't find it quickly enough. Yeah, I posted a short patch earlier on this thread that fixes that. Nobody has commented on it, and so far I have not committed anything related to this without posting details and giving ample opportunity for anyone to comment. If nobody objects, I can push that, and this issue is gone. Imo this is quite an indicator that the idea of using the filesize is just plain wrong. Adding logic to vacuum not to truncate data because it's a flag for matview scannability is quite the layer violation and a sign of a bad design decision.
Such a hack has already been added to copy_heap_data(); while not as bad, it shows that it is hard to find all the places where we need to add it. I agree, but if you review the thread I started with a flag in pg_class, I tried using the special area of the first page of the heap, and finally wound up with the current technique based on several people reviewing the patch and offering opinions and reaching an apparent consensus that this was the least evil way to do it given current infrastructure for unlogged tables. This really is a result of trying to preserve *correct* behavior while catering to those emphasizing how important the performance of unlogged matviews is. 2) Since we don't have a metapage to represent scannability in 9.3 we cannot easily use one in 9.4+ without pg_upgrade emptying all matviews unless we only rely on the catalogs which we currently cannot. I am not following this argument at all. Doesn't pg_upgrade use pg_dump to create the tables and matviews WITH NO DATA and take later action for ones which are populated in the source? It would not be hard at all to move to a new release which used a different technique for tracking populated tables by changing what pg_dump does for populated tables with the switch pg_upgrade uses. This will either slow down development or make users unhappy. Alternatively we can add yet another fork, but that has its price (quite a bit more open files during normal operation, slower CREATE DATABASE). This is actually an argument for not releasing matviews without such an external state. Going from disk-based state to catalog is easy, the other way round: not so much. I am not seeing this at all. Given my comment above about how this works for pg_upgrade and pg_dump, can you explain how this is a problem? 3) Using the filesize as a flag will make other stuff like preallocating on-disk data in bigger chunks and related things more complicated. Why?
You don't need to preallocate for a non-populated matview, since its heap will be replaced on REFRESH. You would not *want* to preallocate a lot of space for something guaranteed to remain at zero length until deleted. 4) doing the check for scannability in the executor imo is a rather bad idea. Note that we e.g. don't see an error about a matview which won't be scanned because of constraint exclusion or one-time filters. S1: CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false WITH NO DATA; S1: SELECT * FROM matview_unit_false; You can get there in far more reasonable cases than WHERE false. I am a little concerned about that, but the case you show doesn't demonstrate a problem: test=# CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false WITH NO DATA; SELECT 0 test=# SELECT * FROM matview_unit_false; ERROR: materialized view matview_unit_false has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. The view was defined WITH NO DATA
Re: [HACKERS] Remaining beta blockers
Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kgri...@ymail.com wrote: Let's look at the corner this supposedly paints us into. If a later major release creates a better mechanism, current pg_dump and load will already use it, based on the way matviews are created empty and REFRESHed by pg_dump. Worst case, we need to modify the behavior of pg_dump running with the switch used by pg_upgrade to use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever syntax is chosen) -- a command we would probably want at that point anyway. I'm not seeing cause for panic here. What is a real problem or risk with using this mechanism until we engineer something better? What problems with converting to a later major release does anyone see? Well, it's a pg_upgrade hazard, if nothing else, isn't it? I don't think so. What do you see as a problem? Sure, I wouldn't allow that either. My point is that I feel that could be engineered around in user space. If you have a materialized view which could legitimately be empty (there are many for which that won't be an issue), then you can either arrange the view definition so that it isn't (e.g. by including a dummy record that clients can look for), or you can include a sentinel unlogged table that will contain a row if and only if materialized views have been refreshed since the last crash. In the examples I can think of, indefinitely-stale-but-valid-at-some-point wouldn't be very good either, so I would anticipate needing to do some engineering around relative freshness anyway - e.g. keeping a side table that lists the last-refreshed-time for each matview. Or, maybe I wouldn't care about tracking elapsed time per se, but instead want to track freshness relative to updates - e.g. set things up so that anyone who performs an action that would invalidate a row in the materialized view would also update a row someplace that would allow me to identify the row as stale.
In either case, handling the case where the view is altogether invalid doesn't seem to add a whole lot of additional complexity. Now, I can imagine cases where it does. For example, suppose you have a cron job (which you trust to work) to refresh your materialized views every night. Well, that means that you'll never be more than 24 hours stale - but if any of those materialized views are unlogged, that also means that you could have no data at all for up to 24 hours following a crash. Not great, because now you need some logic to handle just that one case that wouldn't be necessary if the DB did it for you. But I just think it's a judgement call how serious one thinks that scenario is, vs. any other scenario where a boolean isn't adequate either. Staleness is a completely different issue, in my view, from quietly returning results that are not, and never were, accurate. Sure we need to implement more refined scannability tests than whether valid data from *some* point in time is present. But that should always be *part* of the scannability testing, and without it I don't feel we have a feature of a quality suitable for delivery. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] The missing pg_get_*def functions
Andrew Dunstan and...@dunslane.net writes: If all you want is SQL language functions, there is nothing to stop you from writing them and publishing them today as an extension. It's worth noting also that we actually *have* infrastructure for updating extensions without initdb; unlike the initial contents of pg_proc. So this approach is more attractive than it might seem on its face, assuming you are going to do this as SQL functions. (I share the doubts expressed elsewhere as to how feasible that actually is.) regards, tom lane
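Tom's point about extension update infrastructure is concrete: an extension can ship an upgrade script, and a fix is then applied per database with one command, no initdb needed. A hypothetical sketch of how the proposed functions could be packaged (the extension name `pg_getdef`, the function, and the file contents are all invented for illustration):

```sql
-- pg_getdef.control (control file; all names hypothetical):
--   comment = 'server-side pg_get_*def helpers'
--   default_version = '1.1'
--   relocatable = false

-- pg_getdef--1.0--1.1.sql: a bug-fix upgrade script shipped with a minor
-- release.  CREATE OR REPLACE swaps in the corrected function body.
CREATE OR REPLACE FUNCTION get_language_def(lang name)
RETURNS text LANGUAGE sql STABLE AS $$
  SELECT 'CREATE LANGUAGE ' || quote_ident(l.lanname)
  FROM pg_catalog.pg_language l
  WHERE l.lanname = $1
$$;

-- Run once in each database to pick up the fix:
ALTER EXTENSION pg_getdef UPDATE TO '1.1';
```

Note this sidesteps only the "no way to update sql functions after initdb" objection; Andres's points about reaching all databases before external connections, and about standbys, still apply to anything pg_dump itself would depend on.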
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
Simon Riggs si...@2ndquadrant.com writes: ISTM that we also need this patch to put memory barriers in place otherwise the code might be rearranged. This is simply silly. regards, tom lane
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 30 April 2013 15:51, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: ISTM that we also need this patch to put memory barriers in place otherwise the code might be rearranged. This is simply silly. You crack me up sometimes. Yes, it is; I seem to be having a bad day for thinkos. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Back branches vs. gcc 4.8.0
On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Since gcc 4.8 is going to be on a lot of people's machines pretty soon, I think we need to do something to prevent it from breaking 8.4.x and 9.0.x. It looks like our choices are (1) teach configure to enable -fno-aggressive-loop-optimizations if the compiler recognizes it, or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9. I'm a bit leaning towards (1), mainly because I'm not excited about ... I'm confused. I would have described (1) as entering an arms race. Each new optimization related to arrays and structs would need a new flag. Whereas (2) turns the code into pretty common traditional code that gcc is going to need to tolerate for the foreseeable future -- greg
Re: [HACKERS] Remaining beta blockers
On 2013-04-30 07:33:05 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: 1) vacuum can truncate the table to an empty relation already if there is no data returned by the view's query which then leads to the wrong scannability state. S1: CREATE MATERIALIZED VIEW matview_empty AS SELECT false WHERE random() < -1; S2: S2: SELECT * FROM matview_empty; -- works S1: VACUUM matview_empty; S2: SELECT * FROM matview_empty; -- still works S3: SELECT * FROM matview_empty; -- errors out So we need to make vacuum skip cleaning out the last page. Once we get incrementally updated matviews there are more situations to get into this than just a query not returning anything. I remember this being discussed somewhere already, but couldn't find it quickly enough. Yeah, I posted a short patch earlier on this thread that fixes that. Nobody has commented on it, and so far I have not committed anything related to this without posting details and giving ample opportunity for anyone to comment. If nobody objects, I can push that, and this issue is gone. Well, this bug is gone, but the multiple layer violations aren't. Imo this is quite an indicator that the idea of using the filesize is just plain wrong. Adding logic to vacuum not to truncate data because it's a flag for matview scannability is quite the layer violation and a sign of a bad design decision. Such a hack has already been added to copy_heap_data(); while not as bad, it shows that it is hard to find all the places where we need to add it. I agree, but if you review the thread I started with a flag in pg_class, I tried using the special area of the first page of the heap, and finally wound up with the current technique based on several people reviewing the patch and offering opinions and reaching an apparent consensus that this was the least evil way to do it given current infrastructure for unlogged tables.
This really is a result of trying to preserver *correct* behavior while catering to those emphasizing how important the performance of unlogged matviews is. Imo it now uses the worst of both worlds... 2) Since we don't have a metapage to represent scannability in 9.3 we cannot easily use one in 9.4+ without pg_upgrade emptying all matviews unless we only rely on the catalogs which we currently cannot. I am not following this argument at all. Doesn't pg_upgrade use pg_dump to create the tables and matviews WITH NO DATA and take later action for ones which are populated in the source? It would not be hard at all to move to a new release which used a different technique for tracking populated tables by changing what pg_dump does for populated tables with the switch pg_upgrade uses. What I am thinking about is a 100GB materialized view which has been filled in 9.3 and should now be pg_upgraded into 9.4. If we don't have a metapage yet and we want to add one we would need to rewrite the whole 100GB which seems like a rather bad idea. Or how are you proposing to add the metapage? You cannot add it to the end or somesuch. I am not seeing this at all. Given my comment above about how this works for pg_upgrade and pg_dump, can you explain how this is a problem? Well, that only works if there is a cheap way to add the new notation to existing matview heaps which I don't see. 3) Using the filesize as a flag will make other stuff like preallocating on-disk data in bigger chunks and related things more complicated. Why? You don't need to preallocate for a non-populated matview, since its heap will be replaced on REFRESH. You would not *want* to preallocate a lot of space for something guaranteed to remain at zero length until deleted. But we would likely also want to optimize reducing the filesize in the same chunks because you otherwise would end up with even worse fragmentation. And I am not talking about an unscannable relation but about one which is currently empty (e.g. 
SELECT ... WHERE false). 4) doing the check for scannability in the executor imo is a rather bad idea. Note that we e.g. don't see an error about a matview which won't be scanned because of constraint exclusion or one-time filters. S1: CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false WITH NO DATA; S1: SELECT * FROM matview_unit_false; You can get there in far more reasonable cases than WHERE false. I am a little concerned about that, but the case you show doesn't demonstrate a problem: The view was defined WITH NO DATA and is behaving correctly for that case. When populated (with zero rows) it behaves correctly: Ah. Tom already fixed the problem in 5194024d72f33fb209e10f9ab0ada7cc67df45b7. I was working in a branch based on 3ccae48f44d993351e1f881761bd6c556ebd6638 and it failed there. I would be interested to see whether anyone can come up with an actual mis-behavior
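For readers trying to reproduce the behavior under discussion, here is a minimal sketch of the 9.3-era WITH NO DATA semantics (the exact error wording may differ from what is shown in the comments):

```sql
-- Sketch only: a matview created WITH NO DATA is "unpopulated", and the
-- design under discussion makes scanning it an error until the first REFRESH.
CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x WITH NO DATA;
SELECT * FROM mv;              -- errors: materialized view "mv" has not been populated
REFRESH MATERIALIZED VIEW mv;
SELECT * FROM mv;              -- now succeeds, returning one row
DROP MATERIALIZED VIEW mv;
```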
Re: [HACKERS] The missing pg_get_*def functions
On 2013-04-30 15:57:02 +0200, Andres Freund wrote: On 2013-04-30 05:14:15 +0100, Joel Jacobson wrote: On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: Note also that minor releases can readily fix bugs in C-language functions, but we have no infrastructure to update sql-language functions after initdb. That flexibility is unfortunate to lose, particularly for something that pg_dump depends on. That alone would probably be sufficient reason why we would never allow pg_dump to depend on any such thing (not that I see a compelling argument for it to do so anyway...). It would be better to find a way to update sql-language functions in minor upgrades, instead of shutting that door entirely for all future implementation ideas involving sql-language functions in the pg_catalog. I'd be very careful with jumping on this task. I am pretty sure it's a very good way to get very, very frustrated if you don't present a widely accepted design beforehand. Doing this correctly is *far far* from easy. Just a little collection of problems:
* You need to connect to all databases, not just one. There's no infrastructure for this.
* You need to do the update *before* allowing any external connections. Otherwise the feature won't be useful to fix actual problems. Again, there is no infrastructure for this.
* You need to do it in a way that a) doesn't slow down normal startup b) doesn't break if the update has only been applied to some, but not all, of the databases.
Another rather fundamental problem: This obviously cannot be done directly on a standby. So either we cannot rely on those updates having been performed or you need to update the standby in lockstep with the primary. Neither seems acceptable.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining beta blockers
Andres Freund and...@2ndquadrant.com wrote: On 2013-04-30 07:33:05 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: 2) Since we don't have a metapage to represent scannability in 9.3 we cannot easily use one in 9.4+ without pg_upgrade emptying all matviews unless we only rely on the catalogs which we currently cannot. I am not following this argument at all. Doesn't pg_upgrade use pg_dump to create the tables and matviews WITH NO DATA and take later action for ones which are populated in the source? It would not be hard at all to move to a new release which used a different technique for tracking populated tables by changing what pg_dump does for populated tables with the switch pg_upgrade uses. What I am thinking about is a 100GB materialized view which has been filled in 9.3 and should now be pg_upgraded into 9.4. If we don't have a metapage yet and we want to add one we would need to rewrite the whole 100GB which seems like a rather bad idea. Or how are you proposing to add the metapage? You cannot add it to the end or somesuch. Oh, you are suggesting prepending a metapage to existing matviews (and tables?)? So to check whether a view has been populated you not only look at the directory but open the file and read a page? Now I follow why you think this would be an issue. I'm not sure I think that is the best solution, though. In what way would it be better than adding info to pg_class or some other system table? Why would this be important for unlogged matviews but not unlogged tables? I am not seeing this at all. Given my comment above about how this works for pg_upgrade and pg_dump, can you explain how this is a problem? Well, that only works if there is a cheap way to add the new notation to existing matview heaps which I don't see. We could perhaps reserve some space in the special area of the first page, if we can agree on a generous enough amount of space. 
3) Using the filesize as a flag will make other stuff like preallocating on-disk data in bigger chunks and related things more complicated. Why? You don't need to preallocate for a non-populated matview, since its heap will be replaced on REFRESH. You would not *want* to preallocate a lot of space for something guaranteed to remain at zero length until deleted. But we would likely also want to optimize reducing the filesize in the same chunks because you otherwise would end up with even worse fragmentation. And I am not talking about an unscannable relation but about one which is currently empty (e.g. SELECT ... WHERE false). Yes, if we get to both incremental updates *and* preallocations before we develop a better technique for this, a special case would be needed for the case where a matview was incrementally updated to zero rows. Not sure what the consequence of this is. The most reasonable solution seems to be to introduce a metapage somewhere *now* which sucks, but ... That seems far riskier to me than using the current lame-but-functional approach now and improving it in a subsequent release. Why? We have been bitten by the lack of such metapages several times now and I don't think we have been bitten by their presence in relation types that have them by now. Like I said, months ago I had a version which used the special area for the first page of a matview heap, but was convinced to change that. I could probably be convinced to change back. :-) I don't know whether you see a problem with using the special area like that for the metadata you envision. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Remaining beta blockers
On Tue, Apr 30, 2013 at 10:40 AM, Kevin Grittner kgri...@ymail.com wrote: What is a real problem or risk with using this mechanism until we engineer something better? What problems with converting to a later major release does anyone see? Well, it's a pg_upgrade hazard, if nothing else, isn't it? I don't think so. What do you see as a problem? pg_upgrade only handles changes in catalog state, not on-disk representation. If the on-disk representation of a non-scannable view might change in a future release, it's a pg_upgrade hazard. Sure, I wouldn't allow that either. My point is that I feel that could be engineered around in user space. If you have a materialized view which could legitimately be empty (there are many for which that won't be an issue), then you can either arrange the view definition so that it isn't (e.g. by including a dummy record that clients can look for), or you can include a sentinel unlogged table that will contain a row if and only if materialized views have been refreshed since the last crash. In the examples I can think of, indefinitely-stale-but-valid-at-some-point wouldn't be very good either, so I would anticipate needing to do some engineering around relative freshness anyway - e.g. keeping a side table that lists the last-refreshed-time for each matview. Or, maybe I wouldn't care about tracking elapsed time per se, but instead want to track freshness relative to updates - e.g. set things up so that anyone who performs an action that would invalidate a row in the materialized view would also update a row someplace that would allow me to identify the row as stale. In either case, handling the case where the view is altogether invalid doesn't seem to add a whole lot of additional complexity. Now, I can imagine cases where it does. For example, suppose you have a cron job (which you trust to work) to refresh your materialized views every night.
Well, that means that you'll never be more than 24 hours stale - but if any of those materialized views are unlogged, that also means that you could have no data at all for up to 24 hours following a crash. Not great, because now you need some logic to handle just that one case that wouldn't be necessary if the DB did it for you. But I just think it's a judgement call how serious one thinks that scenario is, vs. any other scenario where a boolean isn't adequate either. Staleness is a completely different issue, in my view, from quietly returning results that are not, and never were, accurate. Sure, we need to implement more refined scannability tests than whether valid data from *some* point in time is present. But that should always be *part* of the scannability testing, and without it I don't feel we have a feature of a quality suitable for delivery. I don't really see these as being all that separate. The user is going to want to know whether the data is usable or not. The answer to that question depends on the user's business rules, and those could be anything. The current system implements the following business rule: The data can be used if the matview has ever been populated. But there are lots of other possibilities:
- The data can be used if the matview is fully up-to-date.
- The data can be used if the matview is not out of date by more than a certain amount of time.
- The data can be used if the matview is out of date with respect to one of its base tables, but not if it is out of date with respect to another of its base tables. For example, maybe you're OK with using an order-summary view if new orders have arrived (but not if the view is more than a day out of date); but not if any customers have been renamed.
Not only can the business rules vary, but they can vary between queries. Query A might need an exact answer, hence can only use the matview when it's up to date.
Query B against the very same matview might only need data that's up to date within some time threshold, and query C might well want to just use whatever data exists. I'm not at all clear that it's even feasible to solve all of these problems inside the database; my suspicion is that at least some of these things are going to end up being things that have to be handled at least partially in user-space, while others will be handled in the DB through mechanisms not yet designed. I am even willing to go so far as to say that I am unconvinced that everyone will want a just-truncated materialized view to be automatically set to non-scannable. Suppose you have an unlogged view that summarizes a real-time data stream - all SMTP traps received in the last minute summarized by the node from which they were received. When the server reboots unexpectedly, the raw data is lost along with the summary. But for the scannability flag, we'd read the summary as empty, which would be right. Adding the scannability flag forces the user to add application logic to treat the error as equivalent to an empty
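The last-refreshed-time side table described in this exchange could look something like the following. All names here are illustrative, not from any proposed patch; 9.3-era SQL is assumed (so no ON CONFLICT), and the example is a sketch of the user-space approach, not a built-in mechanism:

```sql
-- Hypothetical bookkeeping table: one row per matview, maintained by whoever
-- performs the refreshes.
CREATE TABLE matview_refresh_times (
    matview        regclass PRIMARY KEY,
    last_refreshed timestamptz NOT NULL
);

-- Refresh and record the time in one transaction so readers see both together.
BEGIN;
REFRESH MATERIALIZED VIEW order_summary;
UPDATE matview_refresh_times
   SET last_refreshed = now()
 WHERE matview = 'order_summary'::regclass;
COMMIT;

-- A query that only trusts data refreshed within the last hour ("Query B"):
SELECT s.*
  FROM order_summary s
 WHERE EXISTS (SELECT 1
                 FROM matview_refresh_times t
                WHERE t.matview = 'order_summary'::regclass
                  AND t.last_refreshed > now() - interval '1 hour');
```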
Re: [HACKERS] Remaining beta blockers
On 2013-04-30 08:35:32 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: On 2013-04-30 07:33:05 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: 2) Since we don't have a metapage to represent scannability in 9.3 we cannot easily use one in 9.4+ without pg_upgrade emptying all matviews unless we only rely on the catalogs which we currently cannot. I am not following this argument at all. Doesn't pg_upgrade use pg_dump to create the tables and matviews WITH NO DATA and take later action for ones which are populated in the source? It would not be hard at all to move to a new release which used a different technique for tracking populated tables by changing what pg_dump does for populated tables with the switch pg_upgrade uses. What I am thinking about is a 100GB materialized view which has been filled in 9.3 and should now be pg_upgraded into 9.4. If we don't have a metapage yet and we want to add one we would need to rewrite the whole 100GB which seems like a rather bad idea. Or how are you proposing to add the metapage? You cannot add it to the end or somesuch. Oh, you are suggesting prepending a metapage to existing matviews (and tables?)? So to check whether a view has been populated you not only look at the directory but open the file and read a page? Now I follow why you think this would be an issue. I'm not sure I think that is the best solution, though. In what way would it be better than adding info to pg_class or some other system table? You can write/read it without having a database opened. Like in the startup process. Then you can abstract away the knowledge about that into PageIsMetapage(relation, blockno) or such which then can be used by vacuum, heapam et al in an extensible fashion. This is far from a fully working solution though - you still have issues with BEGIN;REFRESH ...; ROLLBACK; if you do it naively. Afair that was what made Tom protest against this. 
Why would this be important for unlogged matviews but not unlogged tables? Unlogged tables basically have some very raw version of this - the _init fork. But yes, a more generic version would have been nice. I am not seeing this at all. Given my comment above about how this works for pg_upgrade and pg_dump, can you explain how this is a problem? Well, that only works if there is a cheap way to add the new notation to existing matview heaps which I don't see. We could perhaps reserve some space in the special area of the first page, if we can agree on a generous enough amount of space. If we do it we should just use a whole page, no point in being too cautious there imo. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
I re-ran the benchmark that's had me most worried against the committed code and things look good so far. I've been keeping quiet because my tests recently have all agreed with what Ants already described. This is more a confirmation summary than new data. The problem case has been Jeff's test 2: worst-case overhead for calculating the checksum while reading data from the OS cache. I wrapped that into a test harness and got results similar to Jeff's at http://www.postgresql.org/message-id/5133d732.4090...@2ndquadrant.com based on the originally proposed Fletcher-16 checksum. I made some system improvements since then such that the absolute runtime improved for most of the tests I'm running. But the percentage changes didn't seem off enough to bother re-running the Fletcher tests again. Details are in the attached spreadsheet; to summarize:
- The original Fletcher-16 code slowed this test case down 24 to 32%, depending on whether you look at the average of 3 runs or the median.
- The initial checksum commit with the truncated WAL CRC was almost an order of magnitude worse: 146% to 224% slowdown. The test case that took ~830ms was taking as much as 2652ms with that method. I'm still not sure why the first run of this test is always so much faster than the second and third. But since it happens so often I think it's fair to consider that worst case really important.
- The committed FNV-1a implementation is now slightly better than Fletcher-16 speed-wise: 19 to 27% slowdown.
- Slicing-by-8 CRC I didn't test, because once I'd fully come around to agree with Ants's position it didn't seem likely to be useful. I don't want to lose track of that idea though; it might be the right path for a future implementation with 32 bit checksums.
Since the ~25% slowdown on this test with Fletcher-16 turned into more like a 2% drop on more mixed workloads, I'd expect we're back to where that's again the case with the new FNV-1a.
I plan to step back to looking at more of those cases, but it will take a few days at least to start sorting that out. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com ChecksumMethods.xls Description: MS-Excel spreadsheet
Re: [HACKERS] Remaining beta blockers
Robert, - The data can be used if the matview is fully up-to-date. - The data can be used if the matview is not out of date by more than a certain amount of time. - The data can be used if the matview is out of date with respect to one of its base tables, but not if it is out of date with respect to another of its base tables. For example, maybe you're OK with using an order-summary view if new orders have arrived (but not if the view is more than a day out of date); but not if any customers have been renamed. We discussed this around the beginning of CF4. For some reason (which I didn't quite understand at the time), the consensus was that creating a matview last updated timestamp was not implementable for 9.3 and would need to wait for 9.4. Again, all of this points to additional columns, or at least relopts, in pg_class. Why is that off the table? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Remaining beta blockers
Josh Berkus j...@agliodbs.com wrote: We discussed this around the beginning of CF4. For some reason (which I didn't quite understand at the time), the consensus was that creating a matview last updated timestamp was not implementable for 9.3 and would need to wait for 9.4. The reason was that the start of CF4 was deemed too late in the development cycle to be trying to design what that should look like. No sooner had you suggested that one column than someone suggested two others which might also be useful, and it seemed to me that it would be hard to get it right before we had a better sense of what incremental maintenance based on the view declaration would look like. So, as I recall it, the consensus developed that the simplest possible test, which from the user perspective was hidden behind a function, should be used for this release. Again, all of this points to additional columns, or at least relopts, in pg_class. Why is that off the table? That was deemed to be incompatible with unlogged matviews, which some didn't want to give up in this initial release. Basically, what this patch aims at is more or less what some other databases had in their initial releases of materialized views 10 to 20 years ago. Other products have built on those foundations with each major release. I was hoping we could do the same. We are not going to reach parity on this with any other major database product in one release, or probably even two or three. I didn't dig back through all the threads to confirm these recollections, so take them with a grain of salt. The one thing I am sure of is that they are a rehash of discussions which go back to before the start of CF3. Only with more tone and urgency. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] materialized view scannability in other DBs
On Tue, Apr 30, 2013 at 10:40 AM, Kevin Grittner kgri...@ymail.com wrote: Staleness is a completely different issue, in my view, from quietly returning results that are not, and never were, accurate. Sure we need to implement more refined scannability tests than whether valid data from *some* point in time is present. But that should always be *part* of the scannability testing, and without it I don't feel we have a feature of a quality suitable for delivery. I did a little more research on this. The Oracle equivalent of WITH NO DATA is apparently BUILD DEFERRED. And as far as I can see, if you specify BUILD DEFERRED when creating the view, then it just shows up as having no rows: http://dbataj.blogspot.com/2007/11/materialized-view-for-data-warehouse.html The system declines to use such a view for query rewrite, but you can still select from it without an error. See also Pro Oracle Database 11g Administration by Darl Kuhn, page 378, which confirms that this is the case and that it will simply return 0 rows. On the other hand, in SQL Anywhere and DB2, it seems that it works the way you've implemented it: http://dcx.sybase.com/1201/en/saerrors/errm1077.html http://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/ The question doesn't seem to arise for Microsoft SQL Server or MySQL. The former supports indexed views rather than materialized views, but you can't build them deferred, so the question of what data you'd see if you could doesn't arise. The latter doesn't seem to support materialized views at all. I have to admit that I'm slightly surprised to find that both DB2 and Sybase have this concept, so maybe this is not as much of a random wart as I had been thinking. However, the fact that Oracle has (FWICT) had materialized views since 8i (1999), and that they have not felt compelled to add a flag of this type, suggests to me that the feature can't be considered mandatory for a minimal implementation. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] The missing pg_get_*def functions
On Tue, Apr 30, 2013 at 9:02 AM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Apr 29, 2013 at 7:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: One of the things that we frequently recommend when doing upgrades is that you do the dump with the newer version's pg_dump, so as to get the benefits of any bug fixes that are in it. The more dump functionality is on the server side, the less opportunity we have to repair things that way. But why wouldn't we be able to fix the version in the server, if it turns out to be buggy? I suppose we wouldn't fix bugs discovered after EOL, but I'm not sure that's a sufficient objection. There are other things beyond bugs here. Changes in reserved keywords are actually the biggest reason, ime, to use the newer pg_dump when you're trying to move to a newer PG version. Oh. Good point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] materialized view scannability in other DBs
Robert Haas robertmh...@gmail.com wrote: Oracle has (FWICT) had materialized views since 8i (1999) 8i was when they expanded beyond requiring a manual refresh, and changed the name of the feature from snapshot to materialized view. I'm not sure how long they had snapshots before 8i. Microsoft SQL Server ... you can't build them deferred, so the question of what data you'd see if you could doesn't arise. in SQL Anywhere and DB2, it seems that it works the way you've implemented it Sybase ASE, which is probably a better comparison than Sybase SQL Anywhere, has the concept, too: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc36272.1572/html/commands/commands31.htm See the populate | nonpopulate section. If not populated, you can't enable it for operations. MySQL ... doesn't seem to support materialized views at all. Correct. the fact that Oracle has [...] not felt compelled to add a flag of this type, suggests to me that the feature can't be considered mandatory for a minimal implementation. It seems to me pretty fundamental to have a way to avoid quietly generating completely bogus results, whether or not one other vendor has decided it doesn't matter. It's not like they are completely without the concept of freshness (or, as they seem to express it, staleness). If you build with DEFERRED that property of the matview is set to UNUSABLE; but in their world that doesn't mean it's unusable by direct reference -- only for automatic query rewrites. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Remaining beta blockers
Andres Freund and...@2ndquadrant.com wrote: Ah. Tom already fixed the problem in 5194024d72f33fb209e10f9ab0ada7cc67df45b7. I was working in a branch based on 3ccae48f44d993351e1f881761bd6c556ebd6638 and it failed there. Since previous regression tests missed this bug, I've added the test you posted, to make sure it doesn't come back unnoticed. Thanks for the test case! -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Remaining beta blockers
Andres Freund and...@2ndquadrant.com wrote: On 2013-04-30 07:33:05 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: 1) vacuum can truncate the table to an empty relation already if there is no data returned by the view's query which then leads to the wrong scannability state. So we need to make vacuum skip cleaning out the last page. Once we get incrementally updated matviews there are more situations to get into this than just a query not returning anything. Yeah, I posted a short patch earlier on this thread that fixes that. Nobody has commented on it, and so far I have not committed anything related to this without posting details and giving ample opportunity for anyone to comment. If nobody objects, I can push that, and this issue is gone. Well, this bug is gone, but the multiple layer violations aren't. Attached is a patch with a regression test, if we don't obviate the need for it by tracking the populated status in a different way or allowing unpopulated matviews to be used in queries. I'll hold off on pushing it until we decide. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
*** 230,236 ****
--- 230,242 ----
  	 *
  	 * Don't even think about it unless we have a shot at releasing a goodly
  	 * number of pages.  Otherwise, the time taken isn't worth it.
+ 	 *
+ 	 * Leave a populated materialized view with at least one page.
  	 */
+ 	if (onerel->rd_rel->relkind == RELKIND_MATVIEW &&
+ 		vacrelstats->nonempty_pages == 0)
+ 		vacrelstats->nonempty_pages = 1;
+ 
  	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
  	if (possibly_freeable > 0 &&
  		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
*** a/src/test/regress/expected/matview.out
--- b/src/test/regress/expected/matview.out
*** 430,432 **** SELECT * FROM matview_unit_false;
--- 430,452 ----
  (0 rows)
  
  DROP MATERIALIZED VIEW matview_unit_false;
+ -- test that vacuum does not make empty matview look unpopulated
+ CREATE TABLE hoge (i int);
+ INSERT INTO hoge VALUES (generate_series(1,10));
+ CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge WHERE i % 2 = 0;
+ CREATE INDEX hogeviewidx ON hogeview (i);
+ DELETE FROM hoge;
+ REFRESH MATERIALIZED VIEW hogeview;
+ SELECT * FROM hogeview WHERE i < 10;
+  i 
+ ---
+ (0 rows)
+ 
+ VACUUM ANALYZE;
+ SELECT * FROM hogeview WHERE i < 10;
+  i 
+ ---
+ (0 rows)
+ 
+ DROP TABLE hoge CASCADE;
+ NOTICE:  drop cascades to materialized view hogeview
*** a/src/test/regress/sql/matview.sql
--- b/src/test/regress/sql/matview.sql
*** 136,138 **** SELECT * FROM matview_unit_false;
--- 136,150 ----
  REFRESH MATERIALIZED VIEW matview_unit_false;
  SELECT * FROM matview_unit_false;
  DROP MATERIALIZED VIEW matview_unit_false;
+ 
+ -- test that vacuum does not make empty matview look unpopulated
+ CREATE TABLE hoge (i int);
+ INSERT INTO hoge VALUES (generate_series(1,10));
+ CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge WHERE i % 2 = 0;
+ CREATE INDEX hogeviewidx ON hogeview (i);
+ DELETE FROM hoge;
+ REFRESH MATERIALIZED VIEW hogeview;
+ SELECT * FROM hogeview WHERE i < 10;
+ VACUUM ANALYZE;
+ SELECT * FROM hogeview WHERE i < 10;
+ DROP TABLE hoge CASCADE;
Re: [HACKERS] Remaining beta blockers
Kevin, The reason was that the start of CF4 was deemed too late in the development cycle to be trying to design what that should look like. No sooner had you suggested that one column than someone suggested two others which might also be useful, and it seemed to Yeah, I'm just pointing out that we *already had* this discussion, so there isn't any point in having it again. That was deemed to be incompatible with unlogged matviews, which some didn't want to give up in this initial release. Huh? Unlogged tables don't go in pg_class? Basically, what this patch aims at is more or less what some other databases had in their initial releases of materialized views 10 to 20 years ago. Other products have built on those foundations with each major release. I was hoping we could do the same. We are not going to reach parity on this with any other major database product in one release, or probably even two or three. Yep. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Remaining beta blockers
Josh Berkus j...@agliodbs.com wrote: That was deemed to be incompatible with unlogged matviews, which some didn't want to give up in this initial release. Huh? Unlogged tables don't go in pg_class? Sorry if I wasn't clear. I try to do incremental development -- get one part working and then go on to the next. I had stored the populated state in pg_class, although under what, in retrospect, was a bad name -- I had a bool called relisvalid. It should have been relispopulated or a bit in a flag byte. Anyway, as I proceeded to implement the unlogged feature I discovered that the heap is truncated at recovery time based on the init fork, before the server is at the point where we can access the system tables. So then I tried to keep the pg_class state but populate it when an unlogged matview was opened, based on using the special space in the first page of the init fork, updating pg_class if that was found to indicate a truncated heap. I was roundly criticized for keeping this state in two places -- both the heap and pg_class, and urged to keep it only in the heap, because only keeping it in pg_class would not allow the proper recovery for an unlogged matview. So I responded to that feedback with the current implementation. Clearly we would need to change how we do recovery of unlogged relations to both allow unlogged matviews and keep the populated state in pg_class. I don't think we can really make the two places technique work, where the recovery state of the unlogged matview is used to trigger a pg_class change. For one thing, it was really messy -- far more so than current code. For another thing, the matview would show as good until it was first opened, which was nasty. Kinda verbose, but I hope that makes it more clear. 
-- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Tue, Apr 30, 2013 at 01:05:30PM -0400, Greg Smith wrote: I re-ran the benchmark that's had me most worried against the committed code and things look good so far. I've been keeping quiet because my tests recently have all agreed with what Ants already described. This is more a confirmation summary than new data. I came across this today: Data Integrity Extensions, basically a standard for having an application calculate a checksum of a block and submit it together with the block, so that the disk can verify that the block it is writing matches what the application sent. It appears SCSI has standardised on a CRC-16 checksum with polynomial 0x18bb7. http://www.t10.org/ftp/t10/document.03/03-290r0.pdf https://oss.oracle.com/~mkp/docs/dix-draft.pdf Not directly relevant to PostgreSQL now, but possibly in the future. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
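A straightforward bit-at-a-time sketch of that CRC-16 is below. This is a toy written for this note, not code from the T10 drafts or from PostgreSQL; the MSB-first bit order, zero initial value, and absence of a final XOR are my assumptions (0x8BB7 is 0x18bb7 with the implicit x^16 term dropped):

```python
def crc16_t10dif(data: bytes) -> int:
    """Bit-at-a-time CRC-16 over `data` using the T10 DIF polynomial.

    Toy sketch: MSB-first, initial value 0, no final XOR. Real
    implementations use lookup tables or CRC instructions instead.
    """
    poly = 0x8BB7  # 0x18bb7 with the implicit x^16 bit dropped
    crc = 0
    for byte in data:
        crc ^= byte << 8          # bring the next byte into the register
        for _ in range(8):
            if crc & 0x8000:      # top bit set: shift, then reduce by poly
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

The disk-side verification the drafts describe then amounts to recomputing this over the received block and comparing it with the tag submitted alongside the write; any single-bit corruption changes the CRC.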
Re: [HACKERS] corrupt pages detected by enabling checksums
On Tue, 2013-04-30 at 08:34 -0400, Robert Haas wrote: Uh, wait a minute. I think this is completely wrong. The buffer is LOCKED for this entire sequence of operations. For a checkpoint to happen, it's got to write every buffer, which it will not be able to do for so long as the buffer is locked. I went back and forth on this, so you could be right, but here was my reasoning: I was worried because SyncOneBuffer checks whether it needs writing without taking a content lock, so the exclusive lock doesn't help. That makes sense, because you don't want a checkpoint to have to get a content lock on every buffer in the buffer pool. But it also means we need to follow the rules laid out in transam/README and dirty the pages before writing WAL. The effect of the change to lazy_scan_heap is to force the buffer to be written even if we're only updating the visibility map page. That's a bad idea and should be reverted. The only time the VM and the data page are out of sync during vacuum is after a crash, right? If that's the case, I didn't think it was a big deal to dirty one extra page (should be extremely rare). Am I missing something? The reason I removed that special case was just code complexity/readability. I tried preserving the previous behavior, and it's not so bad, but it seemed unnecessarily ugly for the benefit of a rare case. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Incomplete description of pg_start_backup?
I think that at http://www.postgresql.org/docs/current/static/functions-admin.html and http://www.postgresql.org/docs/current/static/continuous-archiving.html two important points about how pg_start_backup() works are missing: 1. Between pg_start_backup() and pg_stop_backup(), VACUUM is denied (e.g. autovacuum is turned off), so new data is always appended to data files and is never written into their middle. This allows archiving the data directory using any copying tools (rsync, tar, cp etc.). If you forget to call pg_stop_backup() by accident, data files will grow forever. So pg_start_backup() switches the database to an append-only mode which is safe to back up without stopping (data files temporarily become append-only; WAL logs are always append-only). 2. Between pg_start_backup() and pg_stop_backup(), full_page_writes is forced to ON. BTW, are these points fully correct? If yes, it would be great to update the documentation, because on Google there are a lot of questions about how exactly backup with pg_start_backup() works and why cp, tar etc. are safe after pg_start_backup(), but no clear answers. If no, could you please give a couple of comments on all of these?
Re: [HACKERS] The missing pg_get_*def functions
On Tue, Apr 30, 2013 at 04:47:58AM +0100, Joel Jacobson wrote: On Tue, Apr 30, 2013 at 12:46 AM, Noah Misch n...@leadboat.com wrote: Those existing functions give a mostly-SnapshotNow picture of their objects, but an sql-language implementation would give a normally-snapshotted picture. I assume normally is better than mostly? Inconsistent snapshot usage (mostly-anything) is bad, especially within the confines of a single function. pg_get_viewdef() grabs pg_rewrite.ev_action using an MVCC snapshot, then interprets it relative to SnapshotNow. That's a paradox waiting to happen. (Granted, the same could be said for *any* use of SnapshotNow in the absence of locking.) Whether all-normally-snapshotted (all-MVCC) beats all-SnapshotNow is less clear-cut, but I tentatively think it would. That status quo is perhaps more an implementation accident than a designed behavior. Before proliferating functions like this, we should pick a snapshot policy and stick to it. See the block comment at the top of pg_dump.c. I didn't think there would be any reason to migrate the existing functions from C to SQL, but this snapshot problem seems like a good motive to do it. If they would all be written in SQL, the snapshot problem would be solved, right? Nominally yes, but not because of difficulty using a normal MVCC snapshot from C. It's just that the sql PL uses nothing but normal MVCC snapshots. So, this isn't a sound reason to translate C to SQL. In any case, I can't fathom a prudent 100% sql implementation of pg_get_viewdef(), which needs to deparse arbitrary queries. Note also that minor releases can readily fix bugs in C-language functions, but we have no infrastructure to update sql-language functions after initdb. That flexibility is unfortunate to lose, particularly for something that pg_dump depends on. Now, the right thing is probably to design a mechanism for applying simple catalog updates in concert with a minor release. 
In the meantime, its absence puts the sql PL at a nontrivial disadvantage here. What do you mean by infrastructure? Isn't it as simple as CREATE OR REPLACE FUNCTION? As long as the interface of the pg_get_*def functions doesn't change, I cannot see how simply replacing the existing functions in a minor release upgrade could do any harm. Stephen described the sort of infrastructure I had in mind. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 4/30/13 5:26 PM, Martijn van Oosterhout wrote: I came across this today: Data Integrity Extensions, basically a standard for having an application calculate a checksum of a block and submit it together with the block, so that the disk can verify that the block it is writing matches what the application sent. It appears SCSI has standardised on a CRC-16 checksum with polynomial 0x18bb7. To be pedantic for a minute (for the first time *ever* on pgsql-hackers), it's not quite all of SCSI. iSCSI has joined btrfs by settling on CRC-32C with the Castagnoli polynomial, as mentioned in that first reference. CRC-32C is also the one with SSE4.2 instructions to help. All the work around the T10/Data Integrity Field standard that's going on is nice. I think it's going to leave a lot of PostgreSQL users behind, though. I'd bet a large sum of money that five years from now, there will still be more than 10X as many PostgreSQL servers on EC2 as on T10/DIF-capable hardware. I feel pretty good that this new FNV-1a implementation is a good trade-off spot that balances error detection and performance impact. If you want a 16-bit checksum that seems ready for beta today, we can't do much better. Fletcher-16 had too many detection holes, and the WAL checksum was way too expensive. Optimized FNV-1a is even better than unoptimized Fletcher-16, without as many detection issues. Can't even complain about the code bloat for this part either--checksum.c is only 68 lines if you take out its documentation. The WAL logging of hint bits is where the scary stuff to me for this feature has always been at. My gut feel is that doing that needed to start being available as an option anyway. Just this month we've had two customer issues pop up where we had to look for block differences between a master and a standby.
The security update forced some normal update stragglers to where they now have the 9.1.6 index corruption fix, and we're looking for cases where standby indexes might have been corrupted by it. In this case the comparisons can just avoid anything but indexes, so hint bits are thankfully not involved. But having false positives pop out of comparing a master and standby due to hint bits makes this sort of process much harder in general. Being able to turn checksums on, and then compare more things between master and standby without expecting any block differences, will make both routine quality auditing and forensics of broken clusters so much easier. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
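For readers who haven't looked at the hash family being discussed, plain FNV-1a folded down to 16 bits can be sketched as follows. This is only an illustration written for this note: PostgreSQL's committed checksum.c uses a modified, optimized FNV-1a variant, not this textbook form, and the XOR-fold to 16 bits is my own simplification:

```python
FNV_OFFSET_BASIS = 2166136261  # standard 32-bit FNV-1a offset basis
FNV_PRIME = 16777619           # standard 32-bit FNV prime

def fnv1a_16(data: bytes) -> int:
    """Textbook 32-bit FNV-1a, XOR-folded to a 16-bit checksum.

    Illustrates the hash family only; not PostgreSQL's checksum.c.
    """
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                        # xor-then-multiply: the "1a" order
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return (h >> 16) ^ (h & 0xFFFF)      # fold the two halves into 16 bits
```

The xor-then-multiply step is cheap per input byte, which is the performance side of the trade-off described above; the multiply diffuses each byte across the whole register, which is the detection side.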
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
Orthogonal to this thread, but: On Tue, Apr 30, 2013 at 06:39:09PM -0400, Greg Smith wrote: The WAL logging of hint bits is where the scary stuff to me for this feature has always been at. My gut feel is that doing that needed to start being available as an option anyway. Just this month we've had two customer issues pop up where we had to look for block differences between a master and a standby. The security update forced some normal update stragglers to where they now have the 9.1.6 index corruption fix, and we're looking for cases where standby indexes might have been corrupted by it. In this case the comparisons can just avoid anything but indexes, so hint bits are thankfully not involved. B-tree indexes have hints; see callers of ItemIdMarkDead(). -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incomplete description of pg_start_backup?
On Tue, Apr 30, 2013 at 3:24 PM, Dmitry Koterov dmi...@koterov.ru wrote: I think that at http://www.postgresql.org/docs/current/static/functions-admin.html and http://www.postgresql.org/docs/current/static/continuous-archiving.html two important points about how pg_start_backup() works are missing: 1. Between pg_start_backup() and pg_stop_backup(), VACUUM is denied (e.g. autovacuum is turned off), so new data is always appended to data files and is never written into their middle. This is not the case. Autovacuum continues to run during the backup. This allows archiving the data directory using any copying tools (rsync, tar, cp etc.). If you forget to call pg_stop_backup() by accident, data files will grow forever. So pg_start_backup() switches the database to an append-only mode which is safe to back up without stopping (data files temporarily become append-only; WAL logs are always append-only). No, it doesn't work that way. I don't know why appending would be any safer than normal updates would be anyway. WAL replay fixes up any problems that might arise. 2. Between pg_start_backup() and pg_stop_backup(), full_page_writes is forced to ON. Effectively yes; this is documented in one of your links above (and is one of the reasons vacuuming during the backup is not a problem). Cheers, Jeff
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 2013-04-30 18:39:09 -0400, Greg Smith wrote: The WAL logging of hint bits is where the scary stuff to me for this feature has always been at. My gut feel is that doing that needed to start being available as an option anyway. Just this month we've had two customer issues pop up where we had to look for block differences between a master and a standby. The security update forced some normal update stragglers to where they now have the 9.1.6 index corruption fix, and we're looking for cases where standby indexes might have been corrupted by it. In this case the comparisons can just avoid anything but indexes, so hint bits are thankfully not involved. But having false positives pop out of comparing a master and standby due to hint bits makes this sort of process much harder in general. Being able to turn checksums on, and then compare more things between master and standby without expecting any block differences, will make both routine quality auditing and forensics of broken clusters so much easier. I don't think the current implementation helps you with that. We only log the first hint bit set after a checkpoint; you will still get inconsistent bits set after that. So you might have somewhat fewer inconsistencies, but not enough to weed them out manually or anything like that. Cf. MarkBufferDirtyHint() and XLogSaveBufferForHint(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Should pg_upgrade use --quote-all-identifiers?
Seems like this might be a good idea to avoid the type of failure exhibited in bug #8128. We don't care too much about the readability of the dump script created during an upgrade, so it's hard to see a downside. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should pg_upgrade use --quote-all-identifiers?
On Wed, May 1, 2013 at 12:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: Seems like this might be a good idea to avoid the type of failure exhibited in bug #8128. We don't care too much about the readability of the dump script created during an upgrade, so it's hard to see a downside. Huh. I thought you were talking about quoting identifiers in an SQL dump. But you're not; you're talking about quoting identifiers in SQL being sent to the server during the pg_dump process. Why did pg_dump ever not quote all such identifiers? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should pg_upgrade use --quote-all-identifiers?
Greg Stark st...@mit.edu writes: Huh. I thought you were talking about quoting identifiers in an SQL dump. But you're not, you're talking about quoting identifiers in sql being sent to the server during the pg_dump process. Why did pg_dump ever not quote all such identifiers? Well, readability of those commands is worth something too, but in any case the short answer is that pg_dump has only one quote-an-identifier function, not different ones for server commands and final output. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
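To make the two behaviors concrete, here is a toy quote-an-identifier helper with a minimal mode and a quote-everything mode. It is a hypothetical sketch written for this note -- the RESERVED set is a tiny made-up subset and quote_ident() is not pg_dump's actual fmtId():

```python
import re

# Tiny illustrative subset; PostgreSQL's real keyword list is much larger.
RESERVED = {"select", "table", "user", "order", "group"}

def quote_ident(name: str, quote_all: bool = False) -> str:
    """Quote an SQL identifier the way a dump tool might.

    Minimal mode leaves safe lower-case, non-keyword names bare for
    readability; quote_all mode (what --quote-all-identifiers asks for)
    quotes everything, doubling any embedded double quotes.
    """
    safe = re.fullmatch(r"[a-z_][a-z0-9_]*", name) and name not in RESERVED
    if safe and not quote_all:
        return name
    return '"' + name.replace('"', '""') + '"'
```

In minimal mode a plain lower-case name stays bare and readable, while a mixed-case or keyword name gets quoted; with quote_all=True everything is quoted, which trades the readability Tom mentions for never tripping over case folding or keywords.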
Re: [HACKERS] Remaining beta blockers
On Tue, Apr 30, 2013 at 10:19 PM, Kevin Grittner kgri...@ymail.com wrote: Clearly we would need to change how we do recovery of unlogged relations to both allow unlogged matviews and keep the populated state in pg_class. I don't think we can really make the "two places" technique work, where the recovery state of the unlogged matview is used to trigger a pg_class change. For one thing, it was really messy -- far more so than the current code. For another thing, the matview would show as good until it was first opened, which was nasty. Can I ask what the design goal of unlogged relations is? Are they just an optimization so you can load lots of data without generating piles of WAL? Or are they intended to generate zero WAL traffic so they can be populated on a standby, for example, or outside a transaction entirely? Because if it's the former, then I don't see any problem with generating WAL traffic to update the pg_class entry. If they can satisfy the latter, that gets a bit trickier. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should pg_upgrade use --quote-all-identifiers?
On Wed, May 1, 2013 at 2:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, readability of those commands is worth something too, but in any case the short answer is that pg_dump has only one quote-an-identifier function, not different ones for server commands and final output. Well, for the final output one reason it's nice not to quote is that it makes it easier to use the SQL generated by --inserts on a non-Postgres database. Mainly I'm thinking of the case-folding issue, but some databases also use strange quoting rules. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add regression test for bug fixed by recent refactoring.
Kevin Grittner kgri...@postgresql.org writes: Add regression test for bug fixed by recent refactoring. Test case by Andres Freund for bug fixed by Tom Lane's refactoring in commit 5194024d72f33fb209e10f9ab0ada7cc67df45b7. Hm, that actually has got nothing much to do with matviews:

regression=# create view vv1 as select false;
CREATE VIEW
regression=# create view vv2 as select false where false;
CREATE VIEW
regression=# create user joe;
CREATE ROLE
regression=# \c - joe
You are now connected to database "regression" as user "joe".
regression=> select * from vv1;
ERROR:  permission denied for relation vv1
regression=> select * from vv2;
 bool
------
(0 rows)

Of course the select from vv2 should fail as well, but it doesn't, because vv2 is nowhere to be seen in the rangetable passed to the executor. I think the planner is probably trashing the rangetable once it realizes that the query is completely dummy; but this is wrong because we need to leave view rangetable entries behind for permissions checks, even when they're unreferenced in the finished plan. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers