Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On Fri, Apr 12, 2013 at 9:52 PM, Andres Freund wrote:
> On 2013-04-12 12:14:24 -0400, Tom Lane wrote:
> > Andrew Dunstan writes:
> > > On 04/12/2013 10:15 AM, Tom Lane wrote:
> > >> There's 0 chance of making that work, because the two databases wouldn't
> > >> have the same notions of committed XIDs.
> > > Yeah. Trying to think way outside the box, could we invent some sort of
> > > fixup mechanism that could be applied to adopted files?
> > Well, it wouldn't be that hard to replace XIDs with FrozenXID or
> > InvalidXID as appropriate, if you had access to the source database's
> > clog while you did the copying. It just wouldn't be very fast.
>
> I think if one goes over the heap and hint bits everything (so the item
> pointers don't have to be immediately rewritten), freeze everything and
> such, it should be doable at about disk speed unless you have a really
> fast disk subsystem. But it still is fairly complicated and I doubt it's
> really necessary.
>
> > I suppose it would still be faster than a COPY transfer, but I'm not
> > sure it'd be enough faster to justify the work and the additional
> > portability hits you'd be taking.
>
> Using binary copy might already give quite a speedup, Sameer, did you
> try that?

No, we have not so far; I was soliciting feedback from the hackers first,
with a view to possibly implementing this as a contrib module. Also, I did
misread the earlier post on the subject.

> Also, do you really need parts of a cluster or would a base backup of
> the whole cluster do the trick?

We were looking at parts of a cluster as a faster alternative to pg_dump
and restore.

> --
> Andres Freund                     http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
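[Editorial note: to make the idea concrete, here is a hedged sketch of the
per-tuple fixup Tom and Andres describe, freezing everything as pages
stream across. The function and its unconditional freeze are assumptions
for illustration; a real version would consult the source cluster's clog
and handle xmax, MultiXacts, and aborted rows as well.]

#include "postgres.h"

#include "access/htup_details.h"
#include "access/transam.h"
#include "storage/bufpage.h"

/* Hypothetical fixup applied to each heap page while copying (sketch). */
static void
freeze_page_for_adoption(Page page)
{
	OffsetNumber off,
				maxoff = PageGetMaxOffsetNumber(page);

	for (off = FirstOffsetNumber; off <= maxoff; off++)
	{
		ItemId		itemid = PageGetItemId(page, off);
		HeapTupleHeader htup;

		if (!ItemIdIsNormal(itemid))
			continue;			/* unused, dead, or redirect line pointer */

		htup = (HeapTupleHeader) PageGetItem(page, itemid);

		/* Assume the source clog said xmin committed: freeze and hint it. */
		HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
		htup->t_infomask |= HEAP_XMIN_COMMITTED;
	}
}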
Re: [HACKERS] Hash Join cost estimates
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > I'm certainly curious about those, but I'm also very interested in the
> > possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
> > depending on the work_mem setting.
>
> Not sure about that. That would make the hash-bucket-header array
> larger without changing the size of the rest of the hash table, thus
> probably making the CPU cache situation worse not better (which would
> manifest as more time at the first of these two statements relative to
> the second).

In the testing that I've done, I've yet to find a case where having a
smaller NTUP_PER_BUCKET makes things worse, and it can yield a dramatic
improvement. Regarding the memory situation, I'm not sure that using
buckets really helps, though I'm no CPU architect. However, assuming that
the CPU is smart enough to only pull in bits of memory that it needs, I'm
not sure how having to pull in parts of a large array of pointers is worse
than having to go fetch randomly placed entries in a bucket, especially
since we have to step through an entire bucket each time and the bucket
tuples are allocated independently, while the hash array is allocated
once. Indeed, it seems you'd be more likely to get other things you need
in a given pull into cache for the hash table array than with the bucket
tuple entries, which might well include unrelated garbage.

> Can you add some instrumentation code to get us information about the
> average/max length of the bucket chains? And maybe try to figure out
> how many distinct hash values per bucket, which would give us a clue
> whether your two-level-list idea is worth anything.

I ended up implementing the two-level system and doing some testing with
it. It ends up making hash table building take quite a bit longer and only
improves the scan performance in very select cases. The improvement
requires an individual bucket to have both lots of dups and a lot of
distinct values, because it only helps when it can skip a significant
number of tuples. If we move to a system where there's rarely more than
one distinct value in a bucket, then the chances of a serious improvement
from the two-level system go down that much more.

As such, I've come up with this (trivial) patch which simply modifies
ExecChooseHashTableSize() to ignore NTUP_PER_BUCKET (essentially treating
it as 1) when work_mem is large enough to fit the entire hash table (which
also implies that there is only one batch). I'd love to hear feedback from
others on what this does under different conditions. This also makes the
"hash-the-small-table" case faster for the test case which I provided
earlier (http://snowman.net/~sfrost/test_case2.sql), and it uses quite a
bit less memory too.

Now that I've convinced myself that the two-level system isn't practical
and the "hash-the-small-table" case is actually faster than
"hash-the-big-table", I'll start looking at improving on your proposal to
change the costing to favor the smaller table being hashed more often.

	Thanks,

		Stephen

colordiff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
new file mode 100644
index 6a2f236..a8a8168
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*** ExecChooseHashTableSize(double ntuples,
*** 489,496 ****
  		/* We expect the hashtable to fit in memory */
  		double		dbuckets;
  
! 		dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
! 		dbuckets = Min(dbuckets, max_pointers);
  		nbuckets = (int) dbuckets;
  
  		nbatch = 1;
--- 489,495 ----
  		/* We expect the hashtable to fit in memory */
  		double		dbuckets;
  
! 		dbuckets = Min(ntuples, max_pointers);
  		nbuckets = (int) dbuckets;
  
  		nbatch = 1;

signature.asc
Description: Digital signature
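[Editorial note: for concreteness, a back-of-the-envelope of what the
change does to bucket counts. The row counts and the simplified
arithmetic below are assumptions for the example, not figures from the
thread.]

#include <math.h>
#include <stdio.h>

#define NTUP_PER_BUCKET 10		/* nodeHash.c's value at the time */
#define Min(x, y) ((x) < (y) ? (x) : (y))

int
main(void)
{
	double		ntuples = 1000000.0;		/* assumed inner-side row count */
	double		max_pointers = 10000000.0;	/* assumed work_mem-derived cap */

	/* Before: one bucket per NTUP_PER_BUCKET tuples -> 100000 buckets,
	 * i.e. chains of ~10 entries to walk on every probe. */
	double		old_buckets = Min(ceil(ntuples / NTUP_PER_BUCKET), max_pointers);

	/* After (single-batch case): one bucket per tuple -> 1000000 buckets,
	 * so the expected chain length drops to ~1. */
	double		new_buckets = Min(ntuples, max_pointers);

	printf("old=%.0f new=%.0f\n", old_buckets, new_buckets);
	return 0;
}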
Re: [HACKERS] Small reduction in memory usage of index relcache entries
Heikki Linnakangas writes:
> Looking closer at where that memory is spent, a lot of it goes into the
> FmgrInfo structs in RelationAmInfo. But some of them are outright unused
> (ambuild, ambuildempty, amcostestimate, amoptions), and a few others
> are called so seldom that they hardly need to be cached
> (ambulkdelete, amvacuumcleanup). Removing those fields, patch attached,
> reduces the memory usage nicely:

+1, but it would look nicer if you could encapsulate the code in some
macro analogous to GET_REL_PROCEDURE(), perhaps call it
GET_UNCACHED_REL_PROCEDURE()?

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
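[Editorial note: one plausible shape for the macro Tom suggests, modeled
on the repeated lookup code in Heikki's patch (quoted with the original
message further down in this digest). The exact definition is an
assumption; only GET_REL_PROCEDURE() itself exists in indexam.c.]

/*
 * Hypothetical sketch: look up an index AM procedure each time, without
 * caching the FmgrInfo in the relcache.  Assumes locals "procOid"
 * (RegProcedure) and "procedure" (FmgrInfo), as declared in the patch.
 */
#define GET_UNCACHED_REL_PROCEDURE(pname) \
do { \
	procOid = indexRelation->rd_am->pname; \
	if (!RegProcedureIsValid(procOid)) \
		elog(ERROR, "invalid %s regproc", CppAsString(pname)); \
	fmgr_info(procOid, &procedure); \
} while (0)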
Re: [HACKERS] Enabling Checksums
On Sat, Apr 13, 2013 at 12:38 AM, Jeff Davis wrote:
> On Fri, 2013-04-12 at 23:03 +0300, Heikki Linnakangas wrote:
>> I think this is a bad idea. It complicates the WAL format significantly.
>> Simon's patch didn't include the changes to recovery to validate the
>> checksum, but I suspect it would be complicated. And it reduces the
>> error-detection capability of WAL recovery. Keep in mind that unlike
>> page checksums, which are never expected to fail, so even if we miss a
>> few errors it's still better than nothing, the WAL checksum is used to
>> detect end-of-WAL. There is expected to be a failure every time we do
>> crash recovery. Thus far, we've considered the probability of one in
>> 2^32 small enough for that purpose, but IMHO one in 2^16 is much too weak.
>
> One thing that just occurred to me is that we could make the SIMD
> checksum a 32-bit checksum, and reduce it down to 16 bits for the data
> pages. That might give us more flexibility to later use it for WAL
> without compromising on the error detection nearly as much (though
> obviously that wouldn't work with Simon's current proposal, which uses
> the same data page checksum in a WAL backup block).

The simple 32-bit version of the algorithm would need CPU capability
checks for the fast version and would work only on CPUs produced in the
last few years. Not a show stopper, but more complex code and less
applicability for sure. An alternative would be to calculate two 16-bit
checksums, concatenate them for the 32-bit checksum and add them for the
16-bit one. In this case we wouldn't need to change the current algorithm.
A future change could just factor out everything until the last add as the
common function. But keep in mind that we are talking about sharing about
400 bytes of machine code here.

> In general, we have more flexibility with WAL because there is no
> upgrade issue. It would be nice to share code with the data page
> checksum algorithm; but really we should just use whatever offers the
> best trade-off in terms of complexity, performance, and error detection
> rate.
>
> I don't think we need to decide all of this right now. Personally, I'm
> satisfied having SIMD checksums on data pages now and leaving WAL
> optimization until later.

+1

I feel quite uneasy about reducing the effectiveness of WAL end detection.
There are many ways to improve WAL performance and I have no idea what
would be the best one. At the very least some performance tests are in
order. As this is not an essential part of having usable checksums, but a
general performance optimization, I feel that it is not fair to others to
postpone the release to resolve this now. I'd be more than happy to
research this for 9.4.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
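[Editorial note: a minimal sketch of the combination Ants describes, with
two independent 16-bit sums concatenated for a 32-bit WAL checksum and
added for the 16-bit page checksum. The checksum16() helper and its seeds
are assumptions for illustration, not existing PostgreSQL functions.]

#include <stddef.h>
#include <stdint.h>

/* Assumed helper: some 16-bit block checksum over the input,
 * parameterized by a seed so the two halves are independent. */
extern uint16_t checksum16(const char *data, size_t len, uint32_t seed);

/* Concatenate the two 16-bit sums to get a 32-bit WAL checksum. */
static uint32_t
wal_checksum32(const char *data, size_t len)
{
	uint16_t	a = checksum16(data, len, 0x5CA1);	/* seeds are arbitrary */
	uint16_t	b = checksum16(data, len, 0xAB1E);

	return ((uint32_t) a << 16) | b;
}

/* Add the same two sums to get the 16-bit data-page checksum, so the
 * work up to the final combine could later be shared, as Ants suggests. */
static uint16_t
page_checksum16(const char *data, size_t len)
{
	uint16_t	a = checksum16(data, len, 0x5CA1);
	uint16_t	b = checksum16(data, len, 0xAB1E);

	return (uint16_t) (a + b);
}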
Re: [HACKERS] Enabling Checksums
On Fri, 2013-04-12 at 23:03 +0300, Heikki Linnakangas wrote:
> I think this is a bad idea. It complicates the WAL format significantly.
> Simon's patch didn't include the changes to recovery to validate the
> checksum, but I suspect it would be complicated. And it reduces the
> error-detection capability of WAL recovery. Keep in mind that unlike
> page checksums, which are never expected to fail, so even if we miss a
> few errors it's still better than nothing, the WAL checksum is used to
> detect end-of-WAL. There is expected to be a failure every time we do
> crash recovery. Thus far, we've considered the probability of one in
> 2^32 small enough for that purpose, but IMHO one in 2^16 is much too weak.

One thing that just occurred to me is that we could make the SIMD
checksum a 32-bit checksum, and reduce it down to 16 bits for the data
pages. That might give us more flexibility to later use it for WAL
without compromising on the error detection nearly as much (though
obviously that wouldn't work with Simon's current proposal, which uses
the same data page checksum in a WAL backup block).

In general, we have more flexibility with WAL because there is no
upgrade issue. It would be nice to share code with the data page
checksum algorithm; but really we should just use whatever offers the
best trade-off in terms of complexity, performance, and error detection
rate.

I don't think we need to decide all of this right now. Personally, I'm
satisfied having SIMD checksums on data pages now and leaving WAL
optimization until later.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Fri, 2013-04-12 at 21:28 +0200, Andres Freund wrote:
> That means we will have to do the verification for this in
> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
> won't always recognize the end of WAL correctly.
> And I am a bit wary of reducing the likelihood of noticing the proper
> end-of-recovery by reducing the crc width.

Good point.

> Why again are we doing this now? Just to reduce the overhead of CRC
> computation for full page writes? Or are we foreseeing issues with the
> page checksums being wrong because of non-zero data in the hole being
> zero after the restore from bkp blocks?

That shouldn't be a problem, because the block is not expected to have a
proper checksum in WAL, and it will be recalculated before being written.
So I see these changes as mostly independent.

The reason we're discussing this right now is that, when choosing the
checksum algorithm, I was hoping that it might be usable in the future for
WAL backup blocks. I'm convinced that they can be; and the primary
question now seems to be "should they be", which does not need to be
settled right now in my opinion.

Anyway, I would be perfectly happy if we just got the SIMD algorithm in
for data pages. The support for changing the WAL checksums seems lukewarm,
and there might be quite a few alternatives (e.g. optimizing the CRC for
backup blocks as Heikki suggested) to achieve that performance goal.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Fri, 2013-04-12 at 15:21 -0400, Bruce Momjian wrote:
> > * When we do PageSetChecksumInplace(), we need to be 100% sure that the
> > hole is empty; otherwise the checksum will fail when we re-expand it. It
> > might be worth a memset beforehand just to be sure.
>
> Do we write the page holes to the WAL for full-page writes? I hope we
> don't.

No, but the page hole is included in the checksum. Let's say that the
page hole contains some non-zero value, and we calculate a checksum.
When we eliminate the page hole, and then reconstitute the page using
zeros for the page hole later, then the page will not match the checksum
any more. So, we need to be sure the original page hole is all-zero when
we calculate the checksum.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
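[Editorial note: a minimal sketch of the "memset beforehand" idea under
discussion, zeroing the hole between pd_lower and pd_upper before
checksumming so that eliding the hole and later reconstituting it as
zeros cannot invalidate the checksum. The wrapper function is
hypothetical; only PageSetChecksumInplace() is from the checksum patch
series, and its use here is an assumption.]

#include "postgres.h"

#include "storage/bufpage.h"

/* Hypothetical wrapper: make the checksum insensitive to hole contents. */
static void
checksum_page_with_clean_hole(Page page, BlockNumber blkno)
{
	PageHeader	phdr = (PageHeader) page;

	/*
	 * Zero the hole between the line pointer array and the tuple data,
	 * so a page later reconstituted with a zero-filled hole still
	 * matches the checksum computed here.
	 */
	if (phdr->pd_lower < phdr->pd_upper)
		memset((char *) page + phdr->pd_lower, 0,
			   phdr->pd_upper - phdr->pd_lower);

	/* Then checksum the page in place. */
	PageSetChecksumInplace(page, blkno);
}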
Re: [HACKERS] Enabling Checksums
On 12 April 2013 21:03, Heikki Linnakangas wrote:
> No, the patch has to compute the 16-bit checksum for the page when the
> full-page image is added to the WAL record. There would otherwise be no
> need to calculate the page checksum at that point, but only later when
> the page is written out from shared buffer cache.
>
> I think this is a bad idea. It complicates the WAL format significantly.
> Simon's patch didn't include the changes to recovery to validate the
> checksum, but I suspect it would be complicated. And it reduces the
> error-detection capability of WAL recovery. Keep in mind that unlike page
> checksums, which are never expected to fail, so even if we miss a few
> errors it's still better than nothing, the WAL checksum is used to detect
> end-of-WAL. There is expected to be a failure every time we do crash
> recovery. Thus far, we've considered the probability of one in 2^32 small
> enough for that purpose, but IMHO one in 2^16 is much too weak.
>
> If you want to speed up the CRC calculation of full-page images, you could
> have an optimized version of the WAL CRC algorithm, using e.g. SIMD
> instructions. Because typical WAL records are small, max 100-200 bytes, and
> consist of several even smaller chunks, the normal WAL CRC calculation
> is quite resistant to common optimization techniques. But it might work for
> the full-page images. Let's not conflate it with the page checksums, though.

I accept the general tone of that as a reasonable perspective and in many
ways am on the fence myself. This is sensitive stuff. A few points:

* The code to validate the checksum isn't complex, though it is more than
the current one line. Let's say about 10 lines of clear code. I'll work on
that to show it's true. I don't see that as a point of objection.

* WAL checksum is not used as the sole basis for end-of-WAL discovery. We
reuse the WAL files, so the prev field in each WAL record shows what the
previous end of WAL was. Hence if the WAL checksums give a false positive
we still have a double check that the data really is wrong. It's
unbelievable that you'd get a false positive and then have the prev field
match as well, even though it was the genuine end-of-WAL. (See the sketch
after this message.)

Yes, we could also have a second SIMD calculation optimised for WAL CRC32
on an 8192-byte block, rather than just one set of SIMD code for both. We
could also have a single set of SIMD code producing a 32-bit checksum,
then take the low 16 bits as we do currently.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
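[Editorial note: a hedged sketch of the two-factor end-of-WAL test Simon
describes. The real logic lives in xlog.c's record-validation path; the
function and parameter names below are simplifications, not the actual
code.]

#include "postgres.h"

#include "access/xlog.h"

/*
 * Sketch only: a record is accepted as valid WAL only if BOTH its
 * checksum verifies AND its back-pointer links to the record we just
 * read.  "checksum_ok" stands in for the CRC test in ValidXLogRecord().
 */
static bool
record_is_valid_next(XLogRecord *record, XLogRecPtr prev_read_ptr,
					 bool checksum_ok)
{
	if (!checksum_ok)
		return false;			/* probable end of WAL, or corruption */

	/*
	 * Even a checksum false positive is caught here: a stale record in a
	 * recycled WAL file would have an xl_prev that does not point at the
	 * record actually read before it.
	 */
	if (record->xl_prev != prev_read_ptr)
		return false;

	return true;
}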
Re: [HACKERS] Enabling Checksums
On 12.04.2013 22:31, Bruce Momjian wrote:
> On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
>>> Only point worth discussing is that this change would make backup
>>> blocks be covered by a 16-bit checksum, not the CRC-32 it is now. i.e.
>>> the record header is covered by a CRC32 but the backup blocks only by
>>> 16-bit.
>>
>> That means we will have to do the verification for this in
>> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
>> won't always recognize the end of WAL correctly.
>> And I am a bit wary of reducing the likelihood of noticing the proper
>> end-of-recovery by reducing the crc width.
>>
>> Why again are we doing this now? Just to reduce the overhead of CRC
>> computation for full page writes? Or are we foreseeing issues with the
>> page checksums being wrong because of non-zero data in the hole being
>> zero after the restore from bkp blocks?
>
> I thought the idea is that we were going to re-use the already-computed
> CRC checksum on the page, and we only have 16 bits of storage for that.

No, the patch has to compute the 16-bit checksum for the page when the
full-page image is added to the WAL record. There would otherwise be no
need to calculate the page checksum at that point, but only later when
the page is written out from shared buffer cache.

I think this is a bad idea. It complicates the WAL format significantly.
Simon's patch didn't include the changes to recovery to validate the
checksum, but I suspect it would be complicated. And it reduces the
error-detection capability of WAL recovery. Keep in mind that unlike page
checksums, which are never expected to fail, so even if we miss a few
errors it's still better than nothing, the WAL checksum is used to detect
end-of-WAL. There is expected to be a failure every time we do crash
recovery. Thus far, we've considered the probability of one in 2^32 small
enough for that purpose, but IMHO one in 2^16 is much too weak.

If you want to speed up the CRC calculation of full-page images, you
could have an optimized version of the WAL CRC algorithm, using e.g. SIMD
instructions. Because typical WAL records are small, max 100-200 bytes,
and consist of several even smaller chunks, the normal WAL CRC
calculation is quite resistant to common optimization techniques. But it
might work for the full-page images. Let's not conflate it with the page
checksums, though.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Small reduction in memory usage of index relcache entries
While debugging an out-of-memory situation, I noticed that the relcache
entry of each index consumes 2k of memory. From a memory dump:

...
pg_database_datname_index: 2048 total in 1 blocks; 712 free (0 chunks); 1336 used
pg_trigger_tgrelid_tgname_index: 2048 total in 1 blocks; 576 free (0 chunks); 1472 used
pg_rewrite_rel_rulename_index: 2048 total in 1 blocks; 576 free (0 chunks); 1472 used
pg_amproc_fam_proc_index: 2048 total in 1 blocks; 232 free (0 chunks); 1816 used
pg_opclass_oid_index: 2048 total in 1 blocks; 664 free (0 chunks); 1384 used
pg_index_indexrelid_index: 2048 total in 1 blocks; 664 free (0 chunks); 1384 used
pg_attribute_relid_attnum_index: 2048 total in 1 blocks; 528 free (0 chunks); 1520 used
pg_class_oid_index: 2048 total in 1 blocks; 664 free (0 chunks); 1384 used

Looking closer at where that memory is spent, a lot of it goes into the
FmgrInfo structs in RelationAmInfo. But some of them are outright unused
(ambuild, ambuildempty, amcostestimate, amoptions), and a few others are
called so seldom that they hardly need to be cached (ambulkdelete,
amvacuumcleanup). Removing those fields, patch attached, reduces the
memory usage nicely:

...
pg_database_datname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used

This brings most index entries from two 1k blocks down to one 1k block. A
backend seems to load about 60 indexes into the cache for system tables,
so that saves ~60 kB of memory for every backend at a minimum. This isn't
a huge difference, unless you access a huge number of indexes. But the
patch is trivial, and every little helps.

Any objections if I commit this to master?

- Heikki

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index ce94649..b9e519a 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -671,14 +671,19 @@ index_bulk_delete(IndexVacuumInfo *info,
 				  void *callback_state)
 {
 	Relation	indexRelation = info->index;
-	FmgrInfo   *procedure;
+	RegProcedure procOid;
+	FmgrInfo	procedure;
 	IndexBulkDeleteResult *result;
 
 	RELATION_CHECKS;
-	GET_REL_PROCEDURE(ambulkdelete);
+
+	procOid = indexRelation->rd_am->ambulkdelete;
+	if (!RegProcedureIsValid(procOid))
+		elog(ERROR, "invalid ambulkdelete regproc");
+	fmgr_info(procOid, &procedure);
 
 	result = (IndexBulkDeleteResult *)
-		DatumGetPointer(FunctionCall4(procedure,
+		DatumGetPointer(FunctionCall4(&procedure,
 									  PointerGetDatum(info),
 									  PointerGetDatum(stats),
 									  PointerGetDatum((Pointer) callback),
@@ -698,14 +703,19 @@ index_vacuum_cleanup(IndexVacuumInfo *info,
 					 IndexBulkDeleteResult *stats)
 {
 	Relation	indexRelation = info->index;
-	FmgrInfo   *procedure;
+	RegProcedure procOid;
+	FmgrInfo	procedure;
 	IndexBulkDeleteResult *result;
 
 	RELATION_CHECKS;
-	GET_REL_PROCEDURE(amvacuumcleanup);
+
+	procOid = indexRelation->rd_am->amvacuumcleanup;
+	if (!RegProcedureIsValid(procOid))
+		elog(ERROR, "invalid amvacuumcleanup regproc");
+	fmgr_info(procOid, &procedure);
 
 	result = (IndexBulkDeleteResult *)
-		DatumGetPointer(FunctionCall2(procedure,
+		DatumGetPointer(FunctionCall2(&procedure,
 									  PointerGetDatum(info),
 									  PointerGetDatum(stats)));
 
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index a4daf77..05f2011 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -47,8 +47,8 @@ typedef LockInfoData *LockInfo;
 
 /*
- * Cached lookup information for the index access method functions defined
- * by the pg_am row associated with an index relation.
+ * Cached lookup information for the frequently used index access method
+ * functions, defined by the pg_am row associated with an index relation.
  */
 typedef struct RelationAmInfo
 {
@@ -60,13 +60,7 @@ typedef struct RelationAmInfo
 	FmgrInfo	amendscan;
 	FmgrInfo	ammarkpos;
 	FmgrInfo	amrestrpos;
-	FmgrInfo	ambuild;
-	FmgrInfo	ambuildempty;
-	FmgrInfo	ambulkdelete;
-	FmgrInfo	amvacuumcleanup;
 	FmgrInfo	amcanreturn;
-	FmgrInfo	amcostestimate;
-	FmgrInfo	amoptions;
 } RelationAmInfo;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2013-04-12 15:31:36 -0400, Bruce Momjian wrote:
> On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
> > > Only point worth discussing is that this change would make backup
> > > blocks be covered by a 16-bit checksum, not the CRC-32 it is now.
> > > i.e. the record header is covered by a CRC32 but the backup blocks
> > > only by 16-bit.
> >
> > That means we will have to do the verification for this in
> > ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
> > won't always recognize the end of WAL correctly.
> > And I am a bit wary of reducing the likelihood of noticing the proper
> > end-of-recovery by reducing the crc width.
> >
> > Why again are we doing this now? Just to reduce the overhead of CRC
> > computation for full page writes? Or are we foreseeing issues with the
> > page checksums being wrong because of non-zero data in the hole being
> > zero after the restore from bkp blocks?
>
> I thought the idea is that we were going to re-use the already-computed
> CRC checksum on the page, and we only have 16 bits of storage for that.

Well, but the proposal seems to be to do this also for non-checksum
enabled datadirs, so ...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 3:26 PM, Bruce Momjian wrote: > On Fri, Apr 12, 2013 at 01:34:49PM -0400, Gurjeet Singh wrote: > > Can you also improve the output when it dies upon failure to fetch > something? > > Currently the only error message it emits is "fetching xyz", and leaves > the > > user confused as to what really the problem was. The only indication of a > > problem might be the exit code, but I'm not sure of that, and that > doesn't > > help the interactive user running it on terminal. > > Good point. I have reviewed all the error messages and improved them > with the attached, applied patch, e.g.: > > cannot locate typedefs file "xyz" at /usr/local/bin/pgindent line > 121. > Thanks! -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] Enabling Checksums
On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
> > Only point worth discussing is that this change would make backup blocks
> > be covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the
> > record header is covered by a CRC32 but the backup blocks only by 16-bit.
>
> That means we will have to do the verification for this in
> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
> won't always recognize the end of WAL correctly.
> And I am a bit wary of reducing the likelihood of noticing the proper
> end-of-recovery by reducing the crc width.
>
> Why again are we doing this now? Just to reduce the overhead of CRC
> computation for full page writes? Or are we foreseeing issues with the
> page checksums being wrong because of non-zero data in the hole being
> zero after the restore from bkp blocks?

I thought the idea is that we were going to re-use the already-computed
CRC checksum on the page, and we only have 16 bits of storage for that.

--
Bruce Momjian    http://momjian.us
EnterpriseDB     http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2013-04-11 20:12:59 +0100, Simon Riggs wrote:
> On 11 April 2013 04:27, Jeff Davis wrote:
> > On Wed, 2013-04-10 at 20:17 +0100, Simon Riggs wrote:
> > > OK, so we have a single combined "calculate a checksum for a block"
> > > function. That uses Jeff's zeroing trick and Ants' bulk-oriented
> > > performance optimization.
> > >
> > > For buffer checksums we simply calculate for the block.
> >
> > Sounds good.
> >
> > > For WAL full page writes, we first set the checksums for all defined
> > > buffers, then calculate the checksum of remaining data plus the
> > > pd_checksum field from each block using the normal WAL CRC32.
> > >
> > > Seems good to me. One set of fast code. And it avoids the weirdness
> > > that the checksum stored on the full page is actually wrong.
> >
> > Oh, that's a nice benefit.
>
> So, if we apply a patch like the one attached, we then end up with the WAL
> checksum using the page checksum as an integral part of its calculation.
> (There is no increase in code inside WALInsertLock, nothing at all touched
> in that area).
>
> Then all we need to do is make PageSetChecksumInplace() use Ants' algo and
> we're done.
>
> Only point worth discussing is that this change would make backup blocks be
> covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record
> header is covered by a CRC32 but the backup blocks only by 16-bit.

That means we will have to do the verification for this in
ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
won't always recognize the end of WAL correctly.
And I am a bit wary of reducing the likelihood of noticing the proper
end-of-recovery by reducing the crc width.

Why again are we doing this now? Just to reduce the overhead of CRC
computation for full page writes? Or are we foreseeing issues with the
page checksums being wrong because of non-zero data in the hole being
zero after the restore from bkp blocks?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 01:34:49PM -0400, Gurjeet Singh wrote:
> Can you also improve the output when it dies upon failure to fetch
> something? Currently the only error message it emits is "fetching xyz",
> and leaves the user confused as to what really the problem was. The only
> indication of a problem might be the exit code, but I'm not sure of that,
> and that doesn't help the interactive user running it on terminal.

Good point. I have reviewed all the error messages and improved them
with the attached, applied patch, e.g.:

	cannot locate typedefs file "xyz" at /usr/local/bin/pgindent line 121.

--
Bruce Momjian    http://momjian.us
EnterpriseDB     http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 2e9d443..73237ca 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -30,7 +30,7 @@ my %options = (
 	"excludes=s" => \$excludes,
 	"indent=s"   => \$indent,
 	"build"      => \$build,);
-GetOptions(%options) || die "bad command line";
+GetOptions(%options) || die "bad command line argument";
 
 run_build($code_base) if ($build);
 
@@ -118,10 +118,11 @@ sub load_typedefs
 			if (-f "$tdtry/src/tools/pgindent/typedefs.list");
 		$tdtry = "$tdtry/..";
 	}
-	die "no typedefs file" unless $typedefs_file && -f $typedefs_file;
+	die "cannot locate typedefs file \"$typedefs_file\""
+	  unless $typedefs_file && -f $typedefs_file;
 
 	open(my $typedefs_fh, '<', $typedefs_file)
-	  || die "opening $typedefs_file: $!";
+	  || die "cannot open typedefs file \"$typedefs_file\": $!";
 	my @typedefs = <$typedefs_fh>;
 	close($typedefs_fh);
 
@@ -143,7 +144,7 @@ sub process_exclude
 {
 	if ($excludes && @files)
 	{
-		open(my $eh, '<', $excludes) || die "opening $excludes";
+		open(my $eh, '<', $excludes) || die "cannot open exclude file \"$excludes\"";
 		while (my $line = <$eh>)
 		{
 			chomp $line;
@@ -162,7 +163,7 @@ sub read_source
 	my $source;
 
 	open(my $src_fd, '<', $source_filename)
-	  || die "opening $source_filename: $!";
+	  || die "cannot open file \"$source_filename\": $!";
 	local ($/) = undef;
 	$source = <$src_fd>;
 	close($src_fd);
@@ -177,7 +178,7 @@ sub write_source
 	my $source_filename = shift;
 
 	open(my $src_fh, '>', $source_filename)
-	  || die "opening $source_filename: $!";
+	  || die "cannot open file \"$source_filename\": $!";
 	print $src_fh $source;
 	close($src_fh);
 }
@@ -436,25 +437,25 @@ sub run_build
 		$code_base = "$code_base/..";
 	}
 
-	die "no src/tools/pgindent directory in $code_base"
+	die "cannot locate src/tools/pgindent directory in \"$code_base\""
 	  unless -d "$code_base/src/tools/pgindent";
 
 	chdir "$code_base/src/tools/pgindent";
 
-	my $rv = getstore("http://buildfarm.postgresql.org/cgi-bin/typedefs.pl",
-		"tmp_typedefs.list");
+	my $typedefs_list_url = "http://buildfarm.postgresql.org/cgi-bin/typedefs.pl";
 
-	die "fetching typedefs.list" unless is_success($rv);
+	my $rv = getstore($typedefs_list_url, "tmp_typedefs.list");
+
+	die "cannot fetch typedefs list from $typedefs_list_url" unless is_success($rv);
 
 	$ENV{PGTYPEDEFS} = abs_path('tmp_typedefs.list');
 
-	my $pg_bsd_indent_name = "pg_bsd_indent-" . $INDENT_VERSION . ".tar.gz";
+	my $pg_bsd_indent_url = "ftp://ftp.postgresql.org/pub/dev/pg_bsd_indent-" .
+	  $INDENT_VERSION . ".tar.gz";
 
-	$rv =
-	  getstore("ftp://ftp.postgresql.org/pub/dev/$pg_bsd_indent_name",
-		"pg_bsd_indent.tgz");
+	$rv = getstore($pg_bsd_indent_url, "pg_bsd_indent.tgz");
 
-	die "fetching $pg_bsd_indent_name" unless is_success($rv);
+	die "cannot fetch BSD indent tarfile from $pg_bsd_indent_url" unless is_success($rv);
 
 	# XXX add error checking here
 
@@ -484,7 +485,7 @@ sub build_clean
 		$code_base = "$code_base/..";
 	}
 
-	die "no src/tools/pgindent directory in $code_base"
+	die "cannot locate src/tools/pgindent directory in \"$code_base\""
 	  unless -d "$code_base/src/tools/pgindent";
 
 	chdir "$code_base";

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Fri, Apr 12, 2013 at 12:07:36PM -0700, Jeff Davis wrote: > > (Attached patch is discussion only. Checking checksum in recovery > > isn't coded at all.) > > I like it. > > A few points: > > * Given that setting the checksum is unconditional in a backup block, do > we want to zero the checksum field when the backup block is restored if > checksums are disabled? Otherwise we would have a strange situation > where some blocks have a checksum on disk even when checksums are > disabled. > > * When we do PageSetChecksumInplace(), we need to be 100% sure that the > hole is empty; otherwise the checksum will fail when we re-expand it. It > might be worth a memset beforehand just to be sure. Do we write the page holes to the WAL for full-page writes? I hope we don't. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Kevin Grittner writes: > For now what I'm suggesting is generating statistics in all the > cases it did before, plus the case where it starts truncation but > does not complete it. The fact that before this patch there were > cases where the autovacuum worker was killed, resulting in not > generating needed statistics seems like a bug, not a behavior we > need to preserve. Well, in the case where it gets killed, it's still not gonna generate statistics. What we've really got here is a new case that did not exist before, namely that it voluntarily stops truncating. But I agree that modeling that case's behavior on the kill case was poorly thought out. In other words, yes, I think we're on the same page. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Thu, 2013-04-11 at 20:12 +0100, Simon Riggs wrote: > So, if we apply a patch like the one attached, we then end up with the > WAL checksum using the page checksum as an integral part of its > calculation. (There is no increase in code inside WALInsertLock, > nothing at all touched in that area). > > > Then all we need to do is make PageSetChecksumInplace() use Ants' algo > and we're done. > > > Only point worth discussing is that this change would make backup > blocks be covered by a 16-bit checksum, not the CRC-32 it is now. i.e. > the record header is covered by a CRC32 but the backup blocks only by > 16-bit. FWIW, that's fine with me. > (Attached patch is discussion only. Checking checksum in recovery > isn't coded at all.) I like it. A few points: * Given that setting the checksum is unconditional in a backup block, do we want to zero the checksum field when the backup block is restored if checksums are disabled? Otherwise we would have a strange situation where some blocks have a checksum on disk even when checksums are disabled. * When we do PageSetChecksumInplace(), we need to be 100% sure that the hole is empty; otherwise the checksum will fail when we re-expand it. It might be worth a memset beforehand just to be sure. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > The big win here over a binary COPY is pulling through the indexes as-is
> > as well- without having to rebuild them.
> [... lots of reasons this is hard ...]

I agree that it's quite a bit more difficult, to the point that logical
replication which can be selective (e.g., give me only table X + indexes)
might end up being the only answer; otherwise this approach will likely
be only a modest improvement over binary COPY FREEZE, and then only
because we essentially end up skipping the type validation (which we
could just provide as an option, similar to COPY FREEZE...).

	Thanks,

		Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
>> A problem regarding validation of the sepgsql-regtest policy module
>> is that the semodule command requires root privilege to list the
>> installed policy modules, so I avoided using this command in the
>> test_sepgsql script.
>> However, I have an idea that would not make the script fail even if
>> "sudo semodule -l" returned an error, except in the case where it can
>> run correctly and the policy version is not the expected one.
>> What is your opinion of this check?
>
> Not sure that's too useful. And I don't like the idea of putting sudo
> commands in a test harness script. That seems too much like the sort
> of thing bad people do.

OK, I also doubt whether my idea makes sense. The attached patch omits
the portion that checks the version of sepgsql-regtest, and adds some
notes in the documentation instead. Also, it changes the current
directory to contrib/sepgsql at the top of the script, to avoid problems
when test_sepgsql is run from a directory other than contrib/sepgsql.

Thanks,
--
KaiGai Kohei

sepgsql-v9.3-test-script-fixup.v2.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
2013/4/12 Robert Haas:
> On Fri, Apr 12, 2013 at 10:42 AM, Alvaro Herrera wrote:
>> Robert Haas escribió:
>>> On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote:
>>> > Also, the attached function-execute-permission patch is a rebased
>>> > version. I rethought its event name should be OAT_FUNCTION_EXECUTE,
>>> > rather than OAT_FUNCTION_EXEC, following the convention of avoiding
>>> > abbreviations. The other portions are the same as the previous ones.
>>>
>>> Great. This looks fine to me, committed. I also committed your
>>> getObjectIdentity patch, which caused some regression test output
>>> changes. I applied the necessary correction before committing, I
>>> think, but please check.
>>
>> I think the function-execute code path is still using
>> getObjectDescription rather than identity.
>
> Yeah, I think so. KaiGai, can you provide a follow-on patch to fix that?

Yes, of course. The attached patch replaces the getObjectDescription()
call in sepgsql/proc.c and makes the related changes in the regression
test.

Thanks,
--
KaiGai Kohei

sepgsql-v9.3-replace-get-object-description.v2.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
[some relevant dropped bits of the thread restored]

Tom Lane wrote:
> Kevin Grittner writes:
>> Tom Lane wrote:
>>> Kevin Grittner writes:

Jeff Janes wrote:

I propose to do the following:

(1) Restore the prior behavior of the VACUUM command. This was
only ever intended to be a fix for a serious autovacuum problem
which caused many users serious performance problems.

(2) If autovacuum decides to try to truncate but the lock cannot
be initially acquired, and analyze is requested, skip the
truncation and do the autoanalyze.

>>> I think that the minimum appropriate fix here is to [...] take
>>> out the suppression of stats reporting and analysis.
>>
>> I'm not sure I understand -- are you proposing that is all we do
>> for both the VACUUM command and autovacuum?
>
> No, I said that was the minimum fix.

OK, I suggested that and more, so I wasn't sure what you were getting at.

> OK, I see that now. In the old behavior, if the lock was
> acquired, but then we were shoved off from it, the analyze
> was not done. But, in the old behavior, if the lock was never
> acquired at all, then it would go ahead to do the
> autoanalyze,

Ah, I see now. So the actual worst case for the old code, in terms of
both head-banging and statistics, was if autovacuum was able to acquire
the lock but then many tasks all piled up behind its lock. If the system
was even *more* busy, it would not acquire the lock at all, and would
behave better.

> and I suppose the rationale for suppressing the stats report was
> this same idea of lying to the stats collector in order to
> encourage a new vacuum attempt to happen right away.

I think Jan expressed some such sentiment back during the original
discussion. I was not persuaded by that; but he pointed out that if the
deadlock killer killed an autovacuum process which was doing a truncate,
then it did not get to the statistics phase; so I agreed that any change
in that behavior should be a separate patch. I missed the fact that if
it failed to initially get the lock, it did proceed to the statistics
phase. I explained this earlier in this thread. No need to cast about
for hypothetical explanations.

> Now I'm not sure that that's a good idea at all

I'm pretty sure it isn't; that's why I proposed changing it.

> But if it is reasonable, we need a redesign of the reporting
> messages, not just a hack to not tell the stats collector what we
> did.

The idea was to try to make as small a change in previous behavior as
possible. Jan pointed out that when the deadlock detection code killed
an autovacuum worker which was trying to truncate, the statistics were
not updated, leading to retries. This was an attempt to *not change*
existing behavior. It was wrong, because we both missed the fact that if
it didn't get the lock in the first place it went ahead with statistics
generation. That being the case, I was proposing we always generate
statistics if we were supposed to. That would be a change toward *more*
up-to-date statistics and *fewer* truncation retries than we've had. I'm
OK with that because a table hot enough to hit the issue will likely
need the space again or need another vacuum soon.

> Are you saying you intend to revert that whole concept?

No. I was merely asking what you were suggesting. As I said earlier:

I have seen cases where the old logic head-banged for hours or days
without succeeding at the truncation attempt in autovacuum, absolutely
killing performance until the user ran an explicit VACUUM.
And in the meantime, since the deadlock detection logic was killing autovacuum before it got to the analyze phase, the autoanalyze was never done. > Otherwise we need some thought about how to inform the stats > collector what's really happening. I think we can probably improve that on some future release. I don't think a new scheme for that makes sense for back-patching or 9.3. For now what I'm suggesting is generating statistics in all the cases it did before, plus the case where it starts truncation but does not complete it. The fact that before this patch there were cases where the autovacuum worker was killed, resulting in not generating needed statistics seems like a bug, not a behavior we need to preserve. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> I suppose it would still be faster than a COPY transfer, but I'm not >> sure it'd be enough faster to justify the work and the additional >> portability hits you'd be taking. > The big win here over a binary COPY is pulling through the indexes as-is > as well- without having to rebuild them. Meh. That raises the ante another substantial multiple with respect to the amount of portability risk (eg, you're now absolutely dependent on locale sort orders to be identical in both databases). And I think you'd have to freeze all updates to the table while you were copying the table+indexes, if you wanted them to be consistent. I can't imagine that we'd accept a patch that says to the recipient database, "here are some large binary blobs, please believe that they represent a valid table and associated indexes. Oh, and don't you dare try to actually check them, because that would be slow." Some other interesting things to think about here would be toast-table OIDs embedded in toast pointers, data type OIDs embedded in arrays (and maybe records too, I forget), enum value OIDs, btree vacuum cycle IDs, GiST NSNs ... not sure what else, but I bet that's not a complete list. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. If you're doing that in a streaming method, it strikes me that it'd be plenty fast. > I suppose it would still be faster than a COPY transfer, but I'm not > sure it'd be enough faster to justify the work and the additional > portability hits you'd be taking. The big win here over a binary COPY is pulling through the indexes as-is as well- without having to rebuild them. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Tom Lane escribió: > Are you saying you intend to revert that whole concept? That'd be > okay with me, I think. Otherwise we need some thought about how to > inform the stats collector what's really happening. Maybe what we need is to consider table truncation as a separate activity from vacuuming. Then autovacuum could invoke it without having to do a full-blown vacuum. For this to work, I guess we would like to separately store the status of the back-scan in pgstat somehow (I think a boolean flag suffices: "were we able to truncate all pages that appeared to be empty?") -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Kevin Grittner writes: > Tom Lane wrote: >> I think that the minimum appropriate fix here is to revert the hunk >> I quoted, ie take out the suppression of stats reporting and analysis. > I'm not sure I understand -- are you proposing that is all we do > for both the VACUUM command and autovacuum? No, I said that was the minimum fix. Looking again at the patch, I note this comment: /* +* We failed to establish the lock in the specified number of +* retries. This means we give up truncating. Suppress the +* ANALYZE step. Doing an ANALYZE at this point will reset the +* dead_tuple_count in the stats collector, so we will not get +* called by the autovacuum launcher again to do the truncate. +*/ and I suppose the rationale for suppressing the stats report was this same idea of lying to the stats collector in order to encourage a new vacuum attempt to happen right away. Now I'm not sure that that's a good idea at all --- what's the reasoning for thinking the table will be any less hot in thirty seconds? But if it is reasonable, we need a redesign of the reporting messages, not just a hack to not tell the stats collector what we did. Are you saying you intend to revert that whole concept? That'd be okay with me, I think. Otherwise we need some thought about how to inform the stats collector what's really happening. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 11:44 AM, Bruce Momjian wrote: > On Tue, Feb 19, 2013 at 04:50:45PM -0500, Gurjeet Singh wrote: > > Please find attached the patch for some cleanup and fix bit rot in > pgindent > > script. > > > > There were a few problems with the script. > > > > .) It failed to use the $ENV{PGENTAB} even if it was set. > > .) The file it tries to download from Postgres' ftp site is no longer > present. > > ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz > > .) The tar command extracted the above-mentioned file to a child > directory, but > > did not descend into it before running make on it. > > I think it expected a tarbomb, but clearly the current .tgz file on > ftp > > site is not a tarbomb. > > > > I don't like the fact that it dies with a message "fetching xyz" rather > than > > saying "Could not fetch xyz", but this patch does not address that since > it > > doesn't really affect the output when script does work. > > > > With this patch in place I could very easily run it on any arbitrary > file, > > which seemed like a black-magic before the patch. > > > > src/tools/pgindent/pgindent --build > > I have applied this patch. However, I modified the tarball name to > reference $INDENT_VERSION, rather than hard-coding "1.2". Thanks! Can you also improve the output when it dies upon failure to fetch something? Currently the only error message it emits is "fetching xyz", and leaves the user confused as to what really the problem was. The only indication of a problem might be the exit code, but I'm not sure of that, and that doesn't help the interactive user running it on terminal. -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
Looks like psql> vacuum (verbose, analyze) is not reflected in
pg_stat_user_tables, either, in some cases. In this scenario I run the
command and it outputs all the deleted pages etc. (unlike the
vacuumdb -avz analyze that seemed to be skipped in the log), but it does
not update pg_stat_user_tables. That's probably expected based on the
description previously reported, but I wanted to confirm what I was
seeing.

On Fri, Apr 12, 2013 at 10:36 AM, Kevin Grittner wrote:
> Scott Marlowe wrote:
>
> > Does this behavior only affect the 9.2 branch? Or was it ported
> > to 9.1 or 9.0 or 8.4 as well?
>
> After leaving it on master for a while to see if anyone reported
> problems in development, I back-patched as far as 9.0 in time for
> the 9.2.3 (and related) patches. Prior to that the code was too
> different for it to be the same patch, and (perhaps not entirely
> coincidentally) I had not seen the problems before 9.0. From 9.0
> on I have seen multiple sites (all using queuing from Slony or a
> JMS implementation) with recurring problems when the queue
> temporarily got large, shrank again, and then wrapped around to the
> beginning of the table's file space. In some cases performance was
> so impaired that when such an event was triggered they would shut
> down their application until a manual VACUUM could be run.
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [HACKERS] Enabling Checksums
On Wed, Apr 10, 2013 at 11:19:56AM -0700, Jeff Davis wrote: > On Wed, 2013-04-10 at 11:01 +0300, Ants Aasma wrote: > > I think we should first deal with using it for page checksums and if > > future versions want to reuse some of the code for WAL checksums then > > we can rearrange the code. > > Sounds good to me, although I expect we at least want any assembly to be > in a separate file (if the specialization makes it in 9.3). Sounds good. Simon has done a good job shepherding this to completion. My only question is whether the 16-bit page checksums stored in WAL reduce our ability to detect failed/corrupt writes to WAL? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Andres Freund writes: > On 2013-04-12 13:09:02 -0400, Tom Lane wrote: >> However, we're still thinking too small. I've been wondering whether we >> couldn't entirely remove the dirty, awful kluges that were installed in >> the lock manager to kill autovacuum when somebody blocked behind it. >> This mechanism should ensure that AV never takes an exclusive lock >> for long enough to be a serious problem, so do we need that anymore? > Wouldn't that make DROP TABLE stop working while autovac is processing > the table? Meh ... I guess you're right. I wasn't thinking about exclusive locks being taken elsewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Tom Lane wrote:
> Kevin Grittner writes:
>> OK, will review that to confirm; but assuming that's right, and
>> nobody else is already working on a fix, I propose to do the
>> following:
>>
>> (1) Restore the prior behavior of the VACUUM command. This was
>> only ever intended to be a fix for a serious autovacuum problem
>> which caused many users serious performance problems -- in some
>> cases including unscheduled down time. I also saw sites where,
>> having been bitten by this, they disabled autovacuum and later ran
>> into problems with bloat and/or xid wraparound.
>>
>> (2) If autovacuum decides to try to truncate but the lock cannot
>> be initially acquired, and analyze is requested, skip the
>> truncation and do the autoanalyze. If the table is so hot that we
>> cannot get the lock, the space may get re-used soon, and if not
>> there is a good chance another autovacuum will trigger soon. If
>> the user really wants the space released to the OS immediately,
>> they can run a manual vacuum to force the issue.
>
> I think that the minimum appropriate fix here is to revert the hunk
> I quoted, ie take out the suppression of stats reporting and analysis.

I'm not sure I understand -- are you proposing that is all we do for
both the VACUUM command and autovacuum? (i.e., we don't try to fully
restore the old VACUUM command behavior, just the troublesome failure to
generate statistics?)

> However, we're still thinking too small. I've been wondering whether we
> couldn't entirely remove the dirty, awful kluges that were installed in
> the lock manager to kill autovacuum when somebody blocked behind it.
> This mechanism should ensure that AV never takes an exclusive lock
> for long enough to be a serious problem, so do we need that anymore?

Are you suggesting that just in master/HEAD, or back to 9.0? If the
latter, what existing problem does it cure (besides ugly code)?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Object oriented programming language native to EAV
This is a programming language environment whose parser and interpreter are written in PL/pgSQL; a proof-of-concept prototype has been tested. It is an object-oriented programming language implemented with a self-describing Entity-Attribute-Value model that stores objects and object metadata descriptions. The environment has features of a column-oriented database, and high-level language commands are mapped to sets of key-value pairs that comprise the low-level language code. http://sproutpl.wordpress.com
Re: [HACKERS] (auto)vacuum truncate exclusive lock
On 2013-04-12 13:09:02 -0400, Tom Lane wrote: > Kevin Grittner writes: > > OK, will review that to confirm; but assuming that's right, and > > nobody else is already working on a fix, I propose to do the > > following: > > > (1) Restore the prior behavior of the VACUUM command. This was > > only ever intended to be a fix for a serious autovacuum problem > > which caused many users serious performance problems -- in some > > cases including unscheduled down time. I also saw sites where, > > having been bitten by this, they disabled autovacuum and later ran > > into problems with bloat and/or xid wraparound. > > > (2) If autovacuum decides to try to truncate but the lock cannot > > be initially acquired, and analyze is requested, skip the > > truncation and do the autoanalyze. If the table is so hot that we > > cannot get the lock, the space may get re-used soon, and if not > > there is a good chance another autovacuum will trigger soon. If > > the user really wants the space released to the OS immediately, > > they can run a manual vacuum to force the issue. > > I think that the minimum appropriate fix here is to revert the hunk > I quoted, ie take out the suppression of stats reporting and analysis. > > However, we're still thinking too small. I've been wondering whether we > couldn't entirely remove the dirty, awful kluges that were installed in > the lock manager to kill autovacuum when somebody blocked behind it. > This mechanism should ensure that AV never takes an exclusive lock > for long enough to be a serious problem, so do we need that anymore? Wouldn't that make DROP TABLE stop working while autovac is processing the table? Considering how long e.g. a full table vacuum on a huge table can take, that doesn't seem to be acceptable. Sure, you can manually kill the autovac process, but it will soon be restarted. So you have to know that you need to start the DROP TABLE beforehand so it will get the lock quicker, and such. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Kevin Grittner writes: > OK, will review that to confirm; but assuming that's right, and > nobody else is already working on a fix, I propose to do the > following: > (1) Restore the prior behavior of the VACUUM command. This was > only ever intended to be a fix for a serious autovacuum problem > which caused many users serious performance problems -- in some > cases including unscheduled down time. I also saw sites where, > having been bitten by this, they disabled autovacuum and later ran > into problems with bloat and/or xid wraparound. > (2) If autovacuum decides to try to truncate but the lock cannot > be initially acquired, and analyze is requested, skip the > truncation and do the autoanalyze. If the table is so hot that we > cannot get the lock, the space may get re-used soon, and if not > there is a good chance another autovacuum will trigger soon. If > the user really wants the space released to the OS immediately, > they can run a manual vacuum to force the issue. I think that the minimum appropriate fix here is to revert the hunk I quoted, ie take out the suppression of stats reporting and analysis. However, we're still thinking too small. I've been wondering whether we couldn't entirely remove the dirty, awful kluges that were installed in the lock manager to kill autovacuum when somebody blocked behind it. This mechanism should ensure that AV never takes an exclusive lock for long enough to be a serious problem, so do we need that anymore? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On Fri, Apr 12, 2013 at 10:22:38PM +0530, Pavan Deolasee wrote: > > > > On Fri, Apr 12, 2013 at 9:44 PM, Tom Lane wrote: > > Andrew Dunstan writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases > wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. > > > > Would it be possible to fix the XIDs *after* copying the data files, > potentially on a different server so as to avoid any additional overhead on > the > main server? I guess so, though we will probably need some mechanism to lock > out access to the table (which seems easy), flush all its data pages to the > disk and some way to reliably flush all clog pages as well so that they can be > copied along with the data files. The page LSNs seem to be easy to handle and > can be easily zeroed out outside the server. > > I wonder though if this all looks like material for something like pg_reorg > (pg_repack), though some kind of support from core may be required. Uh, now that you mention it, pg_upgrade in non-link mode does something similar, in that it copies the data files and clog. You could use pg_upgrade in non-link mode, run VACUUM FREEZE on the upgraded cluster, and then copy the data files. The only problem is that pg_upgrade can't upgrade tablespaces with the same system catalog version because the tablespace directory names would conflict. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
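For reference, the freeze step in Bruce's pg_upgrade-based recipe is plain SQL; a minimal sketch, run on the upgraded cluster before the data files are copied (the table name is a placeholder):

    -- Freeze every table in the current database:
    VACUUM FREEZE;
    -- Or freeze just the table(s) being transported:
    VACUUM FREEZE my_table;  -- my_table is a placeholder name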
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
Robert Haas writes: > The hunk that changes the messages might need some thought so that it > doesn't cause a translation regression. But in general I see no > reason not to do this before we release beta1. It seems safe enough, > and changes that reduce the need for packagers to carry private > patches are, I think, generally a good thing. It looks to me like this is asking for pg_regress to adopt a nonstandard interpretation of PGHOST, which doesn't seem like a wise thing at all, especially if it's not documented. FWIW, the equivalent thing in the Red Hat/Fedora packages can be seen in this patch: http://pkgs.fedoraproject.org/cgit/postgresql.git/plain/postgresql-var-run-socket.patch which would not get noticeably shorter if we hacked pg_regress in the suggested way. AFAICT, instead of touching pg_regress.c, Red Hat's patch would need to do something to the regression Makefiles if we wanted to use this implementation. I'm not convinced that'd be better at all. TBH, if this is committed, the Red Hat patches will probably end up reverting it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On Fri, Apr 12, 2013 at 9:44 PM, Tom Lane wrote: > Andrew Dunstan writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. > > Would it be possible to fix the XIDs *after* copying the data files, potentially on a different server so as to avoid any additional overhead on the main server? I guess so, though we will probably need some mechanism to lock out access to the table (which seems easy), flush all its data pages to the disk and some way to reliably flush all clog pages as well so that they can be copied along with the data files. The page LSNs seem to be easy to handle and can be easily zeroed out outside the server. I wonder though if this all looks like material for something like pg_reorg (pg_repack), though some kind of support from core may be required. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On 2013-04-12 12:14:24 -0400, Tom Lane wrote: > Andrew Dunstan writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. I think if one goes over the heap and hint-bits everything (so the item pointers don't have to be immediately rewritten), freezes everything, and such, it should be doable at about disk speed unless you have a really fast disk subsystem. But it still is fairly complicated and I doubt it's really necessary. > I suppose it would still be faster than a COPY transfer, but I'm not > sure it'd be enough faster to justify the work and the additional > portability hits you'd be taking. Using binary COPY might already give quite a speedup. Sameer, did you try that? Also, do you really need parts of a cluster or would a base backup of the whole cluster do the trick? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
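To make the binary COPY suggestion concrete, a minimal sketch with placeholder table and file names (FORMAT binary requires an identical column layout on both ends):

    -- On the source cluster:
    COPY my_table TO '/tmp/my_table.dat' (FORMAT binary);
    -- On the destination cluster, into a pre-created table with the same columns:
    COPY my_table FROM '/tmp/my_table.dat' (FORMAT binary);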
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Andrew Dunstan writes: > On 04/12/2013 10:15 AM, Tom Lane wrote: >> There's 0 chance of making that work, because the two databases wouldn't >> have the same notions of committed XIDs. > Yeah. Trying to think way outside the box, could we invent some sort of > fixup mechanism that could be applied to adopted files? Well, it wouldn't be that hard to replace XIDs with FrozenXID or InvalidXID as appropriate, if you had access to the source database's clog while you did the copying. It just wouldn't be very fast. I suppose it would still be faster than a COPY transfer, but I'm not sure it'd be enough faster to justify the work and the additional portability hits you'd be taking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Andrew Dunstan wrote: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >Sameer Thakur writes: > >>The proposed tool tries to make migration faster for tables and indices > >>only by copying their binary data files. > >There's 0 chance of making that work, because the two databases wouldn't > >have the same notions of committed XIDs. > > Yeah. Trying to think way outside the box, could we invent some sort > of fixup mechanism that could be applied to adopted files? Of > course, that could slow things down so much that it wouldn't be > worth it, but it might be a nice research project. I think the fixup procedure involves freezing Xids (prior to the transporting), which the OP said he didn't want to do. If you don't freeze beforehand, there's not enough info in the new cluster to know which tuples are dead/alive. Another option would be to have a "private" copy of pg_clog/pg_subtrans for the transported table(s), but that seems very difficult to arrange. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Jeff Janes wrote: >> If we're going to have the message, we should make it useful. >> My biggest question here is not whether we should add this info, >> but whether it should be DEBUG instead of LOG > I like it being LOG. If it were DEBUG, I don't think anyone > would be likely to see it when they needed to, as it happens > sporadically on busy servers and I don't think people would run > those with DEBUG on. I figure it is analogous to an autovacuum > cancel message it partially replaces, and those are LOG. OK, sold. >>> Also, I think that permanently boycotting doing autoanalyze >>> because someone is camping out on an access share lock (or >>> because there is a never-ending stream of overlapping locks) >>> and so the truncation cannot be done is a bit drastic, >>> especially for inclusion in a point release. >> That much is not a change in the event that the truncation does >> not complete. > OK, I see that now. In the old behavior, if the lock was > acquired, but then we were shoved off from it, the analyze was > not done. But, in the old behavior if the lock was never > acquired at all, then it would go ahead to do the autoanalyze, > and that has changed. That is the way I was testing it > (camping out on an access shared lock so the access exclusive > could never be granted in the first place; because intercepting > it during the truncate phase was hard to do) and I just assumed > the behavior I saw would apply to both cases, but it does not. Ah, I see now. So the actual worst case for the old code, in terms of both head-banging and statistics, was if autovacuum was able to acquire the lock but then many tasks all piled up behind its lock. If the system was even *more* busy it would not acquire the lock at all, and would behave better. >> I have seen cases where the old logic head-banged for >> hours or days without succeeding at the truncation attempt in >> autovacuum, absolutely killing performance until the user ran an >> explicit VACUUM. And in the meantime, since the deadlock >> detection logic was killing autovacuum before it got to the >> analyze phase, the autoanalyze was never done. > OK, so there are three problems. It would take a second to yield; in > doing so it would abandon all the progress it had made in that > second rather than saving it, and it would tight loop (restricted > by naptime) on this because of the lack of analyze. So it fixed > the first two in a way that seems an absolute improvement for the > auto case, but it made the third one worse in a common case, > where it never acquires the lock in the first place, and so > doesn't analyze when before it did in that one case. Yeah, I see that now. >> Perhaps the new logic should go ahead and get its lock even on a >> busy system (like the old logic), > As far as I can tell, the old logic was always conditional on the > AccessExclusive lock acquisition, whether it was manual or auto. OK, will review that to confirm; but assuming that's right, and nobody else is already working on a fix, I propose to do the following: (1) Restore the prior behavior of the VACUUM command. This was only ever intended to be a fix for a serious autovacuum problem which caused many users serious performance problems -- in some cases including unscheduled down time. I also saw sites where, having been bitten by this, they disabled autovacuum and later ran into problems with bloat and/or xid wraparound. 
(2) If autovacuum decides to try to truncate but the lock cannot be initially acquired, and analyze is requested, skip the truncation and do the autoanalyze. If the table is so hot that we cannot get the lock, the space may get re-used soon, and if not there is a good chance another autovacuum will trigger soon. If the user really wants the space released to the OS immediately, they can run a manual vacuum to force the issue. If I don't hear anything within the next day or two, I'll write that up and post it here before applying (and back-patching to affected branches). -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
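To spell out the escape hatch in point (2): a manual vacuum attempts the truncation and reports what it did, so the space can still be returned to the OS on demand (the table name is a placeholder):

    VACUUM VERBOSE my_table;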
Re: [HACKERS] Analyzing bug 8049
"Dickson S. Guedes" writes: > In my tests, after ANALYZE _bug_header and _bug_line, the query plan > changes and query results returns as expected. Is this a chance that > things isn't too bad? No, it just means that in this particular scenario, the bug only manifests if a nestloop plan is chosen --- without that, there's no possibility of pushing a join clause down to the scan level anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On 04/12/2013 10:15 AM, Tom Lane wrote: > Sameer Thakur writes: >> The proposed tool tries to make migration faster for tables and indices >> only by copying their binary data files. > There's 0 chance of making that work, because the two databases wouldn't > have the same notions of committed XIDs. Yeah. Trying to think way outside the box, could we invent some sort of fixup mechanism that could be applied to adopted files? Of course, that could slow things down so much that it wouldn't be worth it, but it might be a nice research project. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Tue, Feb 19, 2013 at 04:50:45PM -0500, Gurjeet Singh wrote: > Please find attached a patch for some cleanup and to fix bit rot in the pgindent > script. > > There were a few problems with the script. > > .) It failed to use $ENV{PGENTAB} even if it was set. > .) The file it tries to download from Postgres' ftp site is no longer present. > ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz > .) The tar command extracted the above-mentioned file to a child directory, > but > did not descend into it before running make on it. > I think it expected a tarbomb, but clearly the current .tgz file on the ftp > site is not a tarbomb. > > I don't like the fact that it dies with a message "fetching xyz" rather than > saying "Could not fetch xyz", but this patch does not address that since it > doesn't really affect the output when the script does work. > > With this patch in place I could very easily run it on any arbitrary file, > which seemed like black magic before the patch. > > src/tools/pgindent/pgindent --build I have applied this patch. However, I modified the tarball name to reference $INDENT_VERSION, rather than hard-coding "1.2". -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing bug 8049
On Fri, 2013-04-12 at 10:58 -0400, Tom Lane wrote: > Robert Haas writes: > > On Thu, Apr 11, 2013 at 1:25 PM, Tom Lane wrote: > >> The plan I'm considering is to get this written and committed to HEAD > >> in the next week, so that it can go out in 9.3beta1. After the patch > >> has survived a reasonable amount of beta testing, I'd be more comfortable > >> about back-patching into 9.2. > > > I'm not very sanguine about the chances that back-patching this won't > > provoke any screams of agony ... but I don't have a better idea, > > either. Letting queries return wrong answers isn't a superior > > solution, for sure. > > The only alternative I can see is to make a back-patch that just teaches > get_eclass_for_sort_expr() to compute valid nullable_relids for the sort > expression. In my tests, after ANALYZE _bug_header and _bug_line, the query plan changes and the query results return as expected. Is there a chance that things aren't too bad? []s -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br http://www.rnp.br/keyserver/pks/lookup?search=0x8F3E3C06D428D10A
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
Robert Haas wrote: > On Tue, Apr 9, 2013 at 8:08 AM, Christoph Berg wrote: > > Debian has been patching pg_regress for years because our default unix > > socket directory is /var/run/postgresql, but that is not writable by > > the build user at build time. This used to be a pretty ugly "make- > > patch-make check-unpatch-make install" patch dance, but now it is a > > pretty patch that makes pg_regress accept --host=/tmp. > > > > The patch is already part of the .deb files on apt.postgresql.org and > > passes all regression test suites there. > > > > Please consider it for 9.3. (And maybe backpatch it all the way...) > > The hunk that changes the messages might need some thought so that it > doesn't cause a translation regression. FWIW I don't think we translate pg_regress at all, and I have my doubts that it makes much sense. I'd go as far as to suggest we get rid of the _() decorations in the lines we're changing. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
Scott Marlowe wrote: > Does this behavior only affect the 9.2 branch? Or was it ported > to 9.1 or 9.0 or 8.4 as well? After leaving it on master for a while to see if anyone reported problems in development, I back-patched as far as 9.0 in time for the 9.2.3 (and related) patches. Prior to that the code was too different for it to be the same patch, and (perhaps not entirely coincidentally) I had not seen the problems before 9.0. From 9.0 on I have seen multiple sites (all using queuing from Slony or a JMS implementation) with recurring problems when the queue temporarily got large, shrank again, and then wrapped around to the beginning of the table's file space. In some cases performance was so impaired that when such an event was triggered they would shut down their application until a manual VACUUM could be run. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to use EXPLAIN (TIMING)
On 2013-04-12 20:44:02 +0530, Robins Tharakan wrote: > Hi, > > While creating regression tests for EXPLAIN I am (somehow) unable to get > (TIMING) option to work with EXPLAIN! > > I must be doing something stupid but all these options below just didn't > work. Could someone point me to the right direction? EXPLAIN (ANALYZE, TIMING on/off) ...; Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
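Spelled out against the examples below, TIMING is a sub-option that goes inside the same parenthesized list as ANALYZE:

    EXPLAIN (ANALYZE, TIMING on) SELECT 1;   -- per-node timing, the default with ANALYZE
    EXPLAIN (ANALYZE, TIMING off) SELECT 1;  -- row counts without the timer overhead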
[HACKERS] How to use EXPLAIN (TIMING)
Hi, While creating regression tests for EXPLAIN I am (somehow) unable to get (TIMING) option to work with EXPLAIN! I must be doing something stupid but all these options below just didn't work. Could someone point me to the right direction? mpf2=# explain (TIMING) SELECT 1; ERROR: EXPLAIN option TIMING requires ANALYZE mpf2=# EXPLAIN (TIMING FALSE) SELECT 1; QUERY PLAN -- Result (cost=0.00..0.01 rows=1 width=0) (1 row) mpf2=# EXPLAIN ANALYSE (TIMING) SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE (TIMING) SELECT 1; ^ mpf2=# EXPLAIN ANALYSE (TIMING TRUE) SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE (TIMING TRUE) SELECT 1; ^ mpf2=# EXPLAIN ANALYSE TIMING TRUE SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE TIMING TRUE SELECT 1; ^ mpf2=# EXPLAIN ANALYSE TIMING SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE TIMING SELECT 1; ^ mpf2=# EXPLAIN TIMING ANALYSE SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN TIMING ANALYSE SELECT 1; ^ mpf2=# EXPLAIN (TIMING) ANALYSE SELECT 1; ERROR: syntax error at or near "ANALYSE" LINE 1: EXPLAIN (TIMING) ANALYSE SELECT 1; ^ mpf2=# EXPLAIN (TIMING) ANALYZE SELECT 1; ERROR: syntax error at or near "ANALYZE" LINE 1: EXPLAIN (TIMING) ANALYZE SELECT 1; ^ mpf2=# EXPLAIN (TIMING FALSE) ANALYZE SELECT 1; ERROR: syntax error at or near "ANALYZE" LINE 1: EXPLAIN (TIMING FALSE) ANALYZE SELECT 1; ^ mpf2=# SELECT version(); version -- PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120 305 (Red Hat 4.4.6-4), 64-bit (1 row) Thanks -- Robins Tharakan
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
On Tue, Apr 9, 2013 at 8:08 AM, Christoph Berg wrote: > Debian has been patching pg_regress for years because our default unix > socket directory is /var/run/postgresql, but that is not writable by > the build user at build time. This used to be a pretty ugly "make- > patch-make check-unpatch-make install" patch dance, but now it is a > pretty patch that makes pg_regress accept --host=/tmp. > > The patch is already part of the .deb files on apt.postgresql.org and > passes all regression test suites there. > > Please consider it for 9.3. (And maybe backpatch it all the way...) The hunk that changes the messages might need some thought so that it doesn't cause a translation regression. But in general I see no reason not to do this before we release beta1. It seems safe enough, and changes that reduce the need for packagers to carry private patches are, I think, generally a good thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing bug 8049
Robert Haas writes: > On Thu, Apr 11, 2013 at 1:25 PM, Tom Lane wrote: >> The plan I'm considering is to get this written and committed to HEAD >> in the next week, so that it can go out in 9.3beta1. After the patch >> has survived a reasonable amount of beta testing, I'd be more comfortable >> about back-patching into 9.2. > I'm not very sanguine about the chances that back-patching this won't > provoke any screams of agony ... but I don't have a better idea, > either. Letting queries return wrong answers isn't a superior > solution, for sure. The only alternative I can see is to make a back-patch that just teaches get_eclass_for_sort_expr() to compute valid nullable_relids for the sort expression. That's necessary code in any case, and it would fix the immediately complained-of bug. The thing that scares me is that it is now clear that equivclass.c is capable of considering two expressions to be equivalent when they should not be; that is, I'm afraid there are variants of the problem that would not be cured by such a limited back-patch. But maybe I should try to create such an example before proposing the more invasive approach. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
Does this behavior only affect the 9.2 branch? Or was it ported to 9.1 or 9.0 or 8.4 as well? On Thu, Apr 11, 2013 at 7:48 PM, Kevin Grittner wrote: > Tom Lane wrote: > > > However I've got to say that both of those side-effects of > > exclusive-lock abandonment seem absolutely brain dead now that I > > see them. Why would we not bother to tell the stats collector > > what we've done? Why would we think we should not do ANALYZE > > when we were told to? > > > > Would someone care to step forward and defend this behavior? > > Because it's not going to be there very long otherwise. > > I'm pretty sure that nobody involved noticed the impact on VACUUM > ANALYZE command; all discussion was around autovacuum impact; and > Jan argued that this was leaving things in a status quo for that, > so I conceded the point and left it for a follow-on patch if > someone felt the behavior needed to change. Sorry for the miss. > > http://www.postgresql.org/message-id/50bb700e.8060...@yahoo.com > > As far as I'm concerned all effects on the explicit command were > unintended and should be reverted. > > -- > Kevin Grittner > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-admin mailing list (pgsql-ad...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin > -- To understand recursion, one must first understand recursion.
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
On Fri, Apr 12, 2013 at 10:42 AM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote: > >> > Also, the attached function-execute-permission patch is a rebased >> > version. I rethought its event name should be OAT_FUNCTION_EXECUTE, >> > rather than OAT_FUNCTION_EXEC according to the manner without >> > abbreviation. Other portions are the same as the previous ones. >> >> Great. This looks fine to me, committed. I also committed your >> getObjectIdentity patch, which caused some regression test output >> changes. I applied the necessary correction before committing, I >> think, but please check. > > I think the function-execute code path is still using > getObjectDescription rather than identity. Yeah, I think so. KaiGai, can you provide a follow-on patch to fix that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing bug 8049
On Thu, Apr 11, 2013 at 1:25 PM, Tom Lane wrote: > This idea needs more fleshing out, but it's seeming awfully attractive > right now. The big problem with it is that it's going to be a more > invasive patch than I feel terribly comfortable about back-patching. > However, I'm not sure there's much choice, because I don't see any narrow > fix for 9.2 that would not result in very substantial degradation of its > optimization ability. We can't just lobotomize equivalence-class > processing. > > The plan I'm considering is to get this written and committed to HEAD > in the next week, so that it can go out in 9.3beta1. After the patch > has survived a reasonable amount of beta testing, I'd be more comfortable > about back-patching into 9.2. I'm not very sanguine about the chances that back-patching this won't provoke any screams of agony ... but I don't have a better idea, either. Letting queries return wrong answers isn't a superior solution, for sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
Robert Haas wrote: > On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote: > > Also, the attached function-execute-permission patch is a rebased > > version. I rethought its event name should be OAT_FUNCTION_EXECUTE, > > rather than OAT_FUNCTION_EXEC according to the manner without > > abbreviation. Other portions are the same as the previous ones. > > Great. This looks fine to me, committed. I also committed your > getObjectIdentity patch, which caused some regression test output > changes. I applied the necessary correction before committing, I > think, but please check. I think the function-execute code path is still using getObjectDescription rather than identity. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Sameer Thakur writes: > The proposed tool tries to make migration faster for tables and indices > only by copying their binary data files. There's 0 chance of making that work, because the two databases wouldn't have the same notions of committed XIDs. You apparently don't understand what you read in the other discussion --- the steps you are objecting to are not optional, whether copying a whole tablespace or only one table. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
On further review this particular server skipped from 9.2.2 to 9.2.4. This is my busiest and most downtime-sensitive server, and I was waiting on a maintenance window to patch to 9.2.3 when 9.2.4 dropped and bumped up the urgency. However, I have 3 other less busy production servers that were all running 9.2.3 for a while, didn't exhibit the problem, and still don't on 9.2.4. psql> analyze seems to work OK in the meantime; I'll report back if I notice any problems with that. Thanks very much for the response and investigation, it is much appreciated! On Thu, Apr 11, 2013 at 8:48 PM, Kevin Grittner wrote: > Tom Lane wrote: > > > However I've got to say that both of those side-effects of > > exclusive-lock abandonment seem absolutely brain dead now that I > > see them. Why would we not bother to tell the stats collector > > what we've done? Why would we think we should not do ANALYZE > > when we were told to? > > > > Would someone care to step forward and defend this behavior? > > Because it's not going to be there very long otherwise. > > I'm pretty sure that nobody involved noticed the impact on VACUUM > ANALYZE command; all discussion was around autovacuum impact; and > Jan argued that this was leaving things in a status quo for that, > so I conceded the point and left it for a follow-on patch if > someone felt the behavior needed to change. Sorry for the miss. > > http://www.postgresql.org/message-id/50bb700e.8060...@yahoo.com > > As far as I'm concerned all effects on the explicit command were > unintended and should be reverted. > > -- > Kevin Grittner > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote: > Thanks. I could find two obvious wording issues here; please see the smaller > one of the attached patches. I didn't fix up the manner of using "XXX" in source > code comments. Committed. > Also, the attached function-execute-permission patch is a rebased > version. I rethought its event name should be OAT_FUNCTION_EXECUTE, > rather than OAT_FUNCTION_EXEC according to the manner without > abbreviation. Other portions are the same as the previous ones. Great. This looks fine to me, committed. I also committed your getObjectIdentity patch, which caused some regression test output changes. I applied the necessary correction before committing, I think, but please check. >> BTW, if it were possible to set things up so that the test_sepgsql >> script could validate the version of the sepgsql-regtest policy >> installed, that would eliminate a certain category of errors. I >> notice also that the regression tests themselves seem to fail if you >> invoke the script as contrib/sepgsql/test_sepgsql rather than >> ./test_sepgsql, which suggests another possible pre-validation step. >> > Please see the test-script-fixup patch. > I added "cd `dirname $0`" on top of the script. It lets pg_regress > avoid this trouble. Probably pg_regress was unable to read the > sql commands to run. > > A problem regarding validation of the sepgsql-regtest policy module > is that the semodule command takes root privilege to list > installed policy modules. So I avoided using this command > in the test_sepgsql script. > However, I have an idea that does not make the script fail even if "sudo > semodule -l" returns an error, except in the case where it runs > correctly and the policy version is not the expected one. > What is your opinion of this check? Not sure that's too useful. And I don't like the idea of putting sudo commands in a test harness script. That seems too much like the sort of thing bad people do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
Re: To PostgreSQL Hackers 2013-04-09 <20130409120807.gd26...@msgid.df7cb.de> If the patch looks too intrusive at this stage of the release, it would be enough if the last chunk got included, which should really be painless: > diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c > new file mode 100644 > index b632326..d4d4279 > *** a/src/test/regress/pg_regress.c > --- b/src/test/regress/pg_regress.c [...] > *** regression_main(int argc, char *argv[], > *** 2249,2255 >*/ > header(_("starting postmaster")); > snprintf(buf, sizeof(buf), > ! SYSTEMQUOTE "\"%s/postgres\" -D \"%s/data\" > -F%s -c \"listen_addresses=%s\" > \"%s/log/postmaster.log\" 2>&1" SYSTEMQUOTE, >bindir, temp_install, >debug ? " -d 5" : "", >hostname ? hostname : "", > --- 2254,2262 >*/ > header(_("starting postmaster")); > snprintf(buf, sizeof(buf), > ! hostname && *hostname == '/' > ! ? SYSTEMQUOTE "\"%s/postgres\" -D > \"%s/data\" -F%s -k \"%s\" > \"%s/log/postmaster.log\" 2>&1" SYSTEMQUOTE > ! : SYSTEMQUOTE "\"%s/postgres\" -D > \"%s/data\" -F%s -c \"listen_addresses=%s\" > \"%s/log/postmaster.log\" 2>&1" > SYSTEMQUOTE, >bindir, temp_install, >debug ? " -d 5" : "", >hostname ? hostname : "", Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
On 04/11/2013 12:17 AM, Tom Lane wrote: > Alvaro Herrera writes: >> Hannu Krosing wrote: >>> Natural solution to this seems to move most of pg_dump functionality >>> into backend as functions, so we have pg_dump_xxx() for everything we >>> want to dump plus a topological sort function for getting the objects >>> in right order. >> This idea doesn't work because of back-patch considerations (i.e. we >> would not be able to create the functions in back branches, and so this >> new style of pg_dump would only work with future server versions). So >> pg_dump itself would have to retain capability to dump stuff from old >> servers. This seems unlikely to fly at all, because we'd be then >> effectively maintaining pg_dump in two places, both backend and the >> pg_dump source code. > There are other issues too, in particular that most of the backend's > code tends to work on SnapshotNow time whereas pg_dump would really > prefer it was all done according to the transaction snapshot. I was just thinking of moving the queries pg_dump currently uses into UDFs, which do _not_ use the catalog cache, but will use the same SQL to query the catalogs as pg_dump currently does, using whatever snapshot mode is currently set. pg_dump will still need to have the same queries for older versions of PostgreSQL, but for new versions pg_dump can become catalog-agnostic. And I think that we can retire pg_dump support for older PostgreSQL versions the same way we drop support for older versions of PostgreSQL itself. Hannu > We have got bugs of that ilk already in pg_dump, but we shouldn't > introduce a bunch more. Doing this right would therefore mean that we'd > have to write a lot of duplicative code in the backend, ie, it's not > clear that we gain any synergy by pushing the functionality over. It > might simplify cross-backend-version issues (at least for backend > versions released after we'd rewritten all that code) but otherwise I'm > afraid it'd just be pushing the problems somewhere else. > In any case, "push it to the backend" offers no detectable help with the > core design issue here, which is figuring out what functionality needs > to be exposed with what API. The main things I see would be * get_list_of_objects(object_type, pattern or namelist) * get_sql_def_for_object(object_type, object_name) * sort_by_dependency(list of [obj_type, obj_name]) From these you could easily construct most uses, especially if sort_by_dependency(list of [obj_type, obj_name]) would be smart enough to break circular dependencies, like turning two tables with mutual FKs into tabledefs without FKs plus separate constraints. Or we could always have the constraints separately, so that the ones depending on non-exported objects would be easy to leave out. Maybe the dependency API analysis itself is something worth a GSOC effort? Hannu > regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
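As a sketch only, Hannu's three functions might be declared roughly like this; the names and signatures are hypothetical, with stub bodies so the declarations parse:

    -- Hypothetical API, for discussion only; nothing like this exists in the backend.
    CREATE FUNCTION get_list_of_objects(object_type text, pattern text)
    RETURNS SETOF text
    LANGUAGE sql AS $$ SELECT NULL::text WHERE false $$;  -- stub body

    CREATE FUNCTION get_sql_def_for_object(object_type text, object_name text)
    RETURNS text
    LANGUAGE sql AS $$ SELECT NULL::text $$;  -- stub body

    -- Takes [obj_type, obj_name] pairs and returns them in dependency order:
    CREATE FUNCTION sort_by_dependency(objects text[])
    RETURNS SETOF text[]
    LANGUAGE sql AS $$ SELECT NULL::text[] WHERE false $$;  -- stub body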
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 2013-04-12 16:58:44 +0530, Pavan Deolasee wrote: > On Fri, Apr 12, 2013 at 4:29 PM, Andres Freund wrote: > > > > > > > I don't think that holds true at all. If you look at pg_stat_bgwriter in > > any remotely busy cluster with a hot data set over shared_buffers you'll > > notice that a large percentage of writes will have been done by backends > > themselves. > > > > Even if what you are saying is true, which I am sure it is, the pages that the > backend is evicting must not have been recently used by the LRU algorithm, which > means that the WAL pertaining to the last change to the page in all > likelihood is already replicated, unless the replication is really lagging > far behind. Of course, if the standby is not able to keep pace with the > master in a realistic manner then we have a problem with the approach. It frequently takes time in the sub-second to few-second range for usagecounts to drop to zero. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Fri, Apr 12, 2013 at 4:29 PM, Andres Freund wrote: > > > > > I don't think that holds true at all. If you look at pg_stat_bgwriter in > any remotely busy cluster with a hot data set over shared_buffers you'll > notice that a large percentage of writes will have been done by backends > themselves. > > Even if what you are saying is true, which I am sure it is, the pages that the backend is evicting must not have been recently used by the LRU algorithm, which means that the WAL pertaining to the last change to the page in all likelihood is already replicated, unless the replication is really lagging far behind. Of course, if the standby is not able to keep pace with the master in a realistic manner then we have a problem with the approach. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 2013-04-12 11:18:01 +0530, Pavan Deolasee wrote: > On Thu, Apr 11, 2013 at 8:39 PM, Ants Aasma wrote: > > > On Thu, Apr 11, 2013 at 5:33 PM, Hannu Krosing > > wrote: > > > On 04/11/2013 03:52 PM, Ants Aasma wrote: > > >> > > >> On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing > > >> wrote: > > >>> > > >>> The proposed fix - halting all writes of data pages to disk and > > >>> to WAL files while waiting ACK from standby - will tremendously > > >>> slow down all parallel work on master. > > >> > > >> This is not what is being proposed. The proposed fix halts writes of > > >> only data pages that are modified within the window of WAL that is not > > >> yet ACKed by the slave. This means pages that were recently modified > > >> and where the clocksweep or checkpoint has decided to evict them. This > > >> only affects the checkpointer, bgwriter and backends doing allocation. > > >> Furthermore, for the backend clocksweep case it would be reasonable to > > >> just pick another buffer to evict. The slowdown for most actual cases > > >> will be negligible. > > > > > > You also need to hold back all WAL writes, including the ones by > > > parallel async and locally-synced transactions. Which means that > > > you have to make all locally synced transactions wait on the > > > syncrep transactions committed before them. > > > After getting the ACK from slave you then have a backlog of stuff > > > to write locally, which then also needs to be sent to slave. Basically > > > this turns a nice smooth WAL write-and-stream pipeline into a > > > chunky wait-and-write-and-wait-and-stream-and-wait :P > > > This may not be a problem in slight write load cases, which is > > > probably the most widely happening usecase for postgres, but it > > > will harm top performance and also force people to get much > > > better (and more expensive) hardware than would otherwise > > > be needed. > > > > Why would you need to hold back WAL writes? WAL is written on master > > first and then streamed to the slave as it is done now. You would only need > > to hold back dirty page evictions having a recent enough LSN to not yet > > be replicated. This holding back is already done to wait for local WAL > > flushes, see bufmgr.c:1976 and bufmgr.c:669. When a page gets dirtied > > its usage count gets bumped, so it will not be considered for > > eviction for at least one clocksweep cycle. In normal circumstances > > that will be enough time to get an ACK from the slave. When WAL is > > generated at a higher rate than can be replicated this will not be > > true. In that case backends that need to bring in new pages will have > > to wait for WAL to be replicated before they can continue. That will > > hopefully include the backends that are doing the dirtying, throttling > > the WAL generation rate. This would definitely be optional behavior, > > not something turned on by default. > > > > > I agree. I don't think the proposed change would cause much of a performance > bottleneck since the proposal is to hold back writing of dirty pages until > the WAL is replicated successfully to the standby. The heap pages are > mostly written by the background threads often much later than the WAL for > the change is written. So in all likelihood, there will be no wait > involved. Of course, this will not be true for very frequently updated > pages that must be written at a checkpoint. I don't think that holds true at all. 
If you look at pg_stat_bgwriter in any remotely busy cluster with a hot data set over shared_buffers you'll notice that a large percentage of writes will have been done by backends themselves. Yes, we need to improve on this, and we are talking about it right now in another thread, but until that's solved this argument seems to fall flat on its face. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
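The fraction Andres mentions is easy to check on a running cluster; the columns below are as in the 9.2-era pg_stat_bgwriter view:

    SELECT buffers_checkpoint, buffers_clean, buffers_backend,
           round(100.0 * buffers_backend /
                 nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
             AS backend_write_pct
    FROM pg_stat_bgwriter;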
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 2013-04-12 02:29:01 +0900, Fujii Masao wrote: > On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing wrote: > > > > You just shut down the old master and let the standby catch > > up (takes a few microseconds ;) ) before you promote it. > > > > After this you can start up the former master with recovery.conf > > and it will follow nicely. > > No. When you shut down the old master, it might not have been > able to send all the WAL records to the standby. I have observed > this situation several times. So in your approach, new standby > might fail to catch up with the master nicely. It seems most of this thread is focusing on the wrong thing then. If we really are only talking about planned failover then we need to solve *that*, not some ominous "don't flush data too early" which has noticeable performance and implementation complexity problems. I guess you're observing that not everything is replicated because you're doing an immediate shutdown - probably because performing the shutdown checkpoint would take too long. This seems solvable by implementing a recovery connection command which initiates a shutdown that just disables future WAL inserts and returns the last lsn that has been written. Then you can fail over as soon as that lsn has been reached and can make the previous master follow from there on without problems. You could even teach the standby not to increment the timeline in that case since that's safe. The biggest issue seems to be how to implement this without another spinlock acquisition for every XLogInsert(), but that seems possible. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
On 04/11/2013 11:48 PM, Andrew Dunstan wrote: > It could be interesting to have a library that would output database > metadata in some machine readable and manipulatable format such as JSON > or XML. One thing that's annoying about the text output pg_dump produces > is that it's not at all structured, so if you want, for example, to > restore a table but to a table of a different name, or to a different > schema, then you're reduced to having to mangle the SQL by using hand > editing or regular expression matching. Something with the potential to > ease that pain would be worth having. Yes, this is really interesting. The current code in pg_dump supports 4 ArchiveFormats, e.g. archCustom, archTar, archNull and archDirectory. These formats are implementations of the "pg_backup" interface. Maybe I could try to add two implementations, "XML" and "JSON". It is worth mentioning that I wrote a program to parse XML files into csv ones using the "libxercesc" library a month ago. Although that program was just a hello world, maybe I could benefit from it, because both tasks use the XML format, and what I need to do is try another xml library. I had a look at JSON on the wiki. The format is a little like XML; both of them are nested. And there are some libraries that could be used, e.g. libjson (or json.c, or other json libraries written in C) and libxml2 (or something else). BTW, could this be an idea for GSOC? If so, I can have a try: add XML and JSON output formats for pg_dump. Thank you all for your attention. Best regards, Shuai -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
Michael Paquier writes: > I recall discussions about reverse engineering of a parsed query tree in > the event trigger threads but nothing has been committed, I think. Yes. The name used in there was "Normalized Command String". > Also, you need to consider that implementing such a reverse engineering > mechanism in core might not be a good thing for new features and maintenance, as it > would mean that it is necessary to change those APIs consistently with what > is added on the parsing side. The goal is to retain the same API, which is quite simple: function get_command_string(Node *parsetree) returns text At the SQL level, the Node * is of datatype "internal" and you can't forge it; you need to be given it in some way. In the Event Trigger case we thought of a TG_PARSETREE magic variable, or maybe another function get_current_parsetree() that only works when called from an event trigger. The other part of the API of course is how to represent the data, and as we're talking about a Normalized Command String, there's no choice but to issue a valid SQL command string that the server would know how to execute and which would have the same side effects. So of course a 9.3 and a 9.4 server equipped with that hypothetical function would behave differently where the syntax has changed. And that's exactly why I think the best choice here is to have that code embedded and maintained in core. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
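As a sketch of how the proposed API might be used from an event trigger -- get_command_string() and get_current_parsetree() are the hypothetical functions from the proposal above, not anything that exists today:

    -- Hypothetical usage; assumes the proposed functions exist.
    CREATE FUNCTION log_normalized_command() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- get_current_parsetree() is the proposed API, not an existing function
        RAISE NOTICE 'normalized: %',
            get_command_string(get_current_parsetree());
    END;
    $$;

    CREATE EVENT TRIGGER log_ddl ON ddl_command_end
        EXECUTE PROCEDURE log_normalized_command();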
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
Tom Lane writes: > Alvaro Herrera writes: >> This idea doesn't work because of back-patch considerations (i.e. we >> would not be able to create the functions in back branches, and so this >> new style of pg_dump would only work with future server versions). So That is a policy question, not a technical one. We could either add the new functions in the backend binary itself, or provide them as an extension that pg_dump would know to install when needed, if we decided that's OK. My understanding is that we will need to change that policy anyway the day we have an on-disk page format change and pg_upgrade needs to flag the old cluster pages with the old page version number before being able to run, or something. > There are other issues too, in particular that most of the backend's > code tends to work on SnapshotNow time whereas pg_dump would really > prefer it was all done according to the transaction snapshot. We have Would that be solved by having MVCC catalogs, or wouldn't the backend code you're talking about be included in there? (Which would be surprising to me, as much as trumping the benefits of MVCC catalogs, but well.) > In any case, "push it to the backend" offers no detectable help with the > core design issue here, which is figuring out what functionality needs > to be exposed with what API. Andrew did begin to work on those parts with the "Retail DDL" project. We know of several "getddl" implementations, and you can also have a look at the pg_dump -Fs (split format) patch that didn't make it for 9.3, where some API work has been done. The need exists, and some thinking over the API to get here did happen. Some more certainly needs to be done, granted. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add sql_drop event for event triggers
Tom Lane writes: > Yeah, I was just looking at the IfSupported variant. In the structure > I just suggested (separate ProcessSlowUtility function), we could make > that work by having switch cases for some statements in both functions, I've done it the way you propose here, and then in the Slow variant we have two sets of cases again: those with some manual transactional behavior or some other code complexities, and the really simple ones. The attached patch involves a second layer of distinction to simplify the code fully and remove all the Event Trigger related macrology that I didn't much like. Maybe that's going a tad too far; see what you think. Of course the patch passes make check. Also, some switch cases are now duplicated for the sole benefit of keeping some compiler help, but I don't know how much help we're talking about now that we have 3 different switches. It seemed cleaner to do it that way as it allows us to ERROR out in the very last default case. Finally, I've been surprised to find out that those cases are only triggering for "ddl_command_start" (and not "ddl_command_end"); I think that's a bug we should be fixing: case T_AlterDomainStmt: case T_DefineStmt: case T_IndexStmt: /* CREATE INDEX */ The T_IndexStmt should be triggering ddl_command_end when stmt->concurrent is false, and that's not the case in the code in the master branch. The two other cases should just be moved to the later switch so that both start and end triggers happen. Or maybe I'm missing something here. Given that it might as well be the case, I only refactored the code, keeping its current behavior, rather than fixing what I think is a bug while doing so. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support *** a/src/backend/tcop/utility.c --- b/src/backend/tcop/utility.c *** *** 68,73 --- 68,80 /* Hook for plugins to get control in ProcessUtility() */ ProcessUtility_hook_type ProcessUtility_hook = NULL; + /* local function declarations */ + void standard_ProcessSlowUtility(Node *parsetree, + const char *queryString, + ParamListInfo params, + DestReceiver *dest, + char *completionTag, + ProcessUtilityContext context); /* * Verify user has ownership of specified relation, else ereport. *** *** 342,411 ProcessUtility(Node *parsetree, dest, completionTag, context); } - #define InvokeDDLCommandEventTriggers(parsetree, fncall) \ - do { \ - if (isCompleteQuery) \ - { \ - EventTriggerDDLCommandStart(parsetree); \ - } \ - fncall; \ - if (isCompleteQuery) \ - { \ - EventTriggerSQLDrop(parsetree); \ - EventTriggerDDLCommandEnd(parsetree); \ - } \ - } while (0) - - #define InvokeDDLCommandEventTriggersIfSupported(parsetree, fncall, objtype) \ - do { \ - bool _supported = EventTriggerSupportsObjectType(objtype); \ - \ - if (_supported) \ - { \ - EventTriggerDDLCommandStart(parsetree); \ - } \ - fncall; \ - if (_supported) \ - { \ - EventTriggerSQLDrop(parsetree); \ - EventTriggerDDLCommandEnd(parsetree); \ - } \ - } while (0) /* ! * UTILITY_BEGIN_QUERY and UTILITY_END_QUERY are a pair of macros to enclose ! * execution of a single DDL command, to ensure the event trigger environment ! * is appropriately set up before starting, and tore down after completion or ! * error. 
*/ - #define UTILITY_BEGIN_QUERY(isComplete) \ - do { \ - bool _needCleanup; \ - \ - _needCleanup = (isComplete) && EventTriggerBeginCompleteQuery(); \ - \ - PG_TRY(); \ - { \ - /* avoid empty statement when followed by a semicolon */ \ - (void) 0 - - #define UTILITY_END_QUERY() \ - } \ - PG_CATCH(); \ - { \ - if (_needCleanup) \ - { \ - EventTriggerEndCompleteQuery(); \ - } \ - PG_RE_THROW(); \ - } \ - PG_END_TRY(); \ - if (_needCleanup) \ - { \ - EventTriggerEndCompleteQuery(); \ - } \ - } while (0) - void standard_ProcessUtility(Node *parsetree, const char *queryString, --- 349,364 dest, completionTag, context); } /* ! * Processing a Utility statement is now divided into two functions. A fast ! * path is taken for statements for which we have not Event Trigger support, so ! * that there's no need to do any Event Trigger specific setup. ! * ! * It's not just a performance requirement, mind you, because refreshing the ! * Event Triggers cache (see src/backend/utils/cache/evtcache.c) is prone to ! * taking a snapshot, which can not occur with in a START TRANSACTION statement ! * for example. */ void standard_ProcessUtility(Node *parsetree, const char *queryString, *** *** 415,429 standard_ProcessUtility(Node *parsetree, ProcessUtilityContext context) { bool isTopLevel = (context == PROCESS_UTILITY_TOPLEVEL);
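[Editor's note: the patch text above is cut off. To make the shape of the proposed split concrete, here is a heavily abbreviated sketch of the two functions it introduces. This is not the patch itself: only the name standard_ProcessSlowUtility comes from the patch, and the case lists below are invented stand-ins.]

    #include "postgres.h"
    #include "tcop/utility.h"

    /*
     * Sketch of the fast/slow split described above (abbreviated).
     */
    void
    standard_ProcessUtility(Node *parsetree,
                            const char *queryString,
                            ParamListInfo params,
                            DestReceiver *dest,
                            char *completionTag,
                            ProcessUtilityContext context)
    {
        switch (nodeTag(parsetree))
        {
                /*
                 * Fast path: statements with no Event Trigger support are
                 * handled here directly, with no trigger setup and, more
                 * importantly, no snapshot -- which is what makes e.g.
                 * START TRANSACTION safe to process here.
                 */
            case T_TransactionStmt:
            case T_ClosePortalStmt:
            case T_FetchStmt:
                /* ... execute the statement ... */
                break;

            default:
                /*
                 * Slow path: everything else goes through the variant that
                 * sets up and tears down the Event Trigger environment.
                 */
                standard_ProcessSlowUtility(parsetree, queryString, params,
                                            dest, completionTag, context);
                break;
        }
    }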
[HACKERS] Detach/attach table and index data files from one cluster to another
Hello,

The current process of transferring data files from one cluster to another using pg_dump and pg_restore is time consuming. The proposed tool tries to make migration faster for tables and indexes only, by copying their binary data files. This is like pg_upgrade, but used for migrating tables and indexes.

The discussion at http://www.postgresql.org/message-id/caa-alv5cqf09zvfrcb1xxuqlsp-oux0s_hq6ryscd6ctami...@mail.gmail.com speaks of the possibility of detaching/attaching whole databases as an alternative to dump/restore. But the process of freezing XIDs and zeroing out LSNs makes that solution equally time consuming, if not more so. If we instead consider just tables and indexes to be detached/reattached, would this be a viable alternative to dump and restore of tables? The same discussion indicates it could be done, but that it is more complicated, as one has to deal with the system catalogs for the newly mounted table and map old OIDs to new ones. This is required to ensure consistency in roles and in the objects owned by those roles. We would also need to ensure the LSN values of the reattached pages are less than the current WAL endpoint in the receiving cluster. Are there any more issues we need to be aware of?

regards
Sameer
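[Editor's note: for a sense of what the freeze/zero-LSN fixup mentioned in that discussion would involve at the page level, here is a rough sketch using 9.3-era backend page macros. It is an assumption-laden illustration, not a working tool: a real pass would consult the source cluster's clog to decide each tuple's fate (blindly clearing xmax, as done here, would resurrect deleted tuples), and indexes, TOAST, multixacts, and checksums are all ignored.]

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "access/transam.h"
    #include "storage/bufpage.h"

    /*
     * HYPOTHETICAL fixup for one heap page of a copied data file: freeze
     * every tuple and clear the page LSN so the receiving cluster's XID
     * and WAL horizons cannot invalidate the page.
     */
    static void
    freeze_heap_page(Page page)
    {
        OffsetNumber off;
        OffsetNumber maxoff = PageGetMaxOffsetNumber(page);

        for (off = FirstOffsetNumber; off <= maxoff; off++)
        {
            ItemId          itemid = PageGetItemId(page, off);
            HeapTupleHeader htup;

            if (!ItemIdIsNormal(itemid))
                continue;       /* skip unused/redirect/dead line pointers */

            htup = (HeapTupleHeader) PageGetItem(page, itemid);

            /* pretend the tuple was inserted long ago and never deleted */
            HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
            HeapTupleHeaderSetXmax(htup, InvalidTransactionId);

            /* set hint bits so nobody goes looking for the source clog */
            htup->t_infomask |= HEAP_XMIN_COMMITTED | HEAP_XMAX_INVALID;
            htup->t_infomask &= ~HEAP_XMAX_COMMITTED;
        }

        /* ensure the page LSN precedes the receiver's current WAL endpoint */
        PageSetLSN(page, 0);
    }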
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 04/11/2013 07:29 PM, Fujii Masao wrote:
> On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing wrote:
>> You just shut down the old master and let the standby catch up (takes
>> a few microseconds ;) ) before you promote it. After this you can start
>> up the former master with recovery.conf and it will follow nicely.
>
> No. When you shut down the old master, it might not have been able to
> send all the WAL records to the standby.

In what cases (other than a standby lagging too much or not listening at all) have you observed this?

> I have observed this situation several times. So in your approach, the
> new standby might fail to catch up with the master nicely.

The page http://wiki.postgresql.org/wiki/Streaming_Replication claims this:

* Graceful shutdown
  When smart/fast shutdown is requested, the primary waits to exit until
  XLOG records have been sent to the standby, up to the shutdown
  checkpoint record.

Maybe you were requesting an immediate shutdown?

Regards,
Hannu Krosing
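[Editor's note: Hannu's precondition -- that the standby has received everything before promotion -- can be checked mechanically. A minimal libpq sketch follows, under assumptions: the connection string is a placeholder, and the function names are the 9.2/9.3-era ones (later renamed pg_last_wal_receive_lsn / pg_current_wal_lsn in PostgreSQL 10).]

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /*
     * After stopping the old master, report how far the standby has
     * received WAL; promote only once this matches the old master's
     * pg_current_xlog_location() as reported just before its clean
     * shutdown.
     */
    int
    main(void)
    {
        PGconn     *standby = PQconnectdb("host=standby.example.com dbname=postgres");
        PGresult   *res;

        if (PQstatus(standby) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(standby));
            return EXIT_FAILURE;
        }

        res = PQexec(standby, "SELECT pg_last_xlog_receive_location()");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "query failed: %s", PQerrorMessage(standby));
            PQclear(res);
            PQfinish(standby);
            return EXIT_FAILURE;
        }

        printf("standby received WAL up to %s\n", PQgetvalue(res, 0, 0));

        PQclear(res);
        PQfinish(standby);
        return EXIT_SUCCESS;
    }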