Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Friday, November 23, 2012 10:10 PM Fujii Masao wrote:
> On Fri, Nov 23, 2012 at 6:56 PM, Amit Kapila wrote:
>> When I remove postgresql.auto.conf, SET PERSISTENT failed.
>>
>>> =# SET PERSISTENT synchronous_commit = 'local';
>>> ERROR: failed to open "postgresql.auto.conf" file
>>
>> There can be 2 ways to handle this: either we recreate the
>> "postgresql.auto.conf" file or give an error.
>> I am not sure, if a user tries to delete internal files, what the exact
>> behavior should be.  Any suggestion?
>
> I prefer to recreate it.
> $PGDATA/config_dir is specified in include_dir by default.  Users might
> create and remove the configuration files in that directory many times.
> So I'm not surprised even if a user accidentally removes
> postgresql.auto.conf in that directory.  Also users might purposely
> remove that file to reset all the settings done by SET PERSISTENT.

One of the suggestions in this mail chain was that if the above happens,
then at restart the server should display/log a message.  However, I
think that if such a situation happens, we should provide some mechanism
so that settings made through the command still work.

> So I think that SET PERSISTENT should handle the case where
> postgresql.auto.conf doesn't exist.

If we directly create a file, the user might not be aware that the
settings made by his previous commands will not work.  So how about also
displaying a NOTICE when SET PERSISTENT internally needs to create a new
file?  The notice should suggest that the settings made by previous
commands were lost due to manual deletion of the internal file.

> We might be able to expect that postgresql.auto.conf is not deleted by
> a user if it's in $PGDATA/global or base directory.

So do you mean to say that if we don't find the file in config_dir, then
we should search in the $PGDATA/global or base directory?  Can't we
mandate to the user that, even if it is deleted, the file is only ever
expected in config_dir?

>>> We should implement "RESET PERSISTENT"?
>>> Otherwise, there is no way
>>> to get rid of the parameter setting from postgresql.auto.conf via SQL.
>>> Also, we should implement "SET PERSISTENT name TO DEFAULT"?
>
>> Till now, I have not implemented this in the patch, thinking that it
>> can be done as a 2nd part once the basic stuff is ready.
>> However, I think you are right: without one of "RESET PERSISTENT" or
>> "SET PERSISTENT name TO DEFAULT", it is difficult for the user
>> to get rid of a parameter.
>> Will "SET PERSISTENT name TO DEFAULT" be sufficient, or do you think
>> both are necessary, because RESET PERSISTENT might internally need
>> to behave similarly?
>
>> For implementation of "SET PERSISTENT name TO DEFAULT", there can be
>> 2 ways:
>> 1) Delete the entry from postgresql.auto.conf
>> 2) Update the entry value in postgresql.auto.conf to the default value
>
> Both seem to be useful.  I think that "SET ... TO DEFAULT" is suitable
> for 2), and "RESET PERSISTENT ..." is suitable for 1).

For other commands, the behavior of "SET ... TO DEFAULT" and "RESET ..."
is the same.  In the docs it is mentioned that "RESET" is an alternative
name for "SET ... TO DEFAULT".  Still, it can be done the way you are
suggesting.  Others, any objection to it?  Otherwise I will go ahead
with it.

> Another comment is:
> What happens if the server crashes while SET PERSISTENT is writing the
> setting to the file?  A partial write occurs, and restart of the server
> would fail because of corrupted postgresql.auto.conf?

This situation will not happen, as the SET PERSISTENT command will first
write to a ".lock" file and then, at commit time, rename it to
".auto.conf".

With Regards,
Amit Kapila.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
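[Editor's note: the crash-safety answer above relies on the classic write-temp-then-rename pattern: rename() is atomic on POSIX filesystems, so a crash mid-write leaves either the complete old postgresql.auto.conf or the complete new one, never a torn mix. A minimal sketch of the idea — not the patch's actual code; the helper name and file layout here are illustrative:]

```c
#include <stdio.h>

/*
 * Sketch of the write-temp-then-rename pattern: write the new contents
 * to a temporary ".lock" file, flush it, then atomically rename() it
 * over the real file.  A crash mid-write leaves the old file untouched.
 */
static int
write_auto_conf(const char *path, const char *contents)
{
    char    tmppath[4096];
    FILE   *fp;

    snprintf(tmppath, sizeof(tmppath), "%s.lock", path);

    if ((fp = fopen(tmppath, "w")) == NULL)
        return -1;
    if (fputs(contents, fp) == EOF || fflush(fp) == EOF)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);                 /* a real implementation would fsync() too */

    return rename(tmppath, path);   /* atomic replacement */
}
```

Until the rename, readers still see the previous configuration, which is exactly the commit-time semantics described above.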
[HACKERS] Re: In pg_upgrade, copy fsm, vm, and extent files by checking for fi
On Wed, Nov 14, 2012 at 06:28:41PM -0500, Bruce Momjian wrote:
> On Wed, Nov 14, 2012 at 06:15:30PM -0500, Tom Lane wrote:
> > Bruce Momjian writes:
> > > On Wed, Nov 14, 2012 at 05:39:29PM -0500, Tom Lane wrote:
> > >> You would have been better off keeping the array and sorting it so
> > >> you could use binary search, instead of passing the problem off to
> > >> the filesystem.
> >
> > > Well, testing showed using open() was a big win.
> >
> > ... on the filesystem you tested on.  I'm concerned that it might not
> > look so good on other platforms.
>
> True.  I am on ext3.  So I need to generate a proof-of-concept patch
> and have others test it?

OK, test patch attached.  The patch will only work if your database uses
a single tablespace.  Doing multiple tablespaces seemed too complex for
the test patch.  Here are my results:

	# tables	git		bsearch patch
	1		11.16		10.99
	1000		19.13		19.26
	2000		26.78		27.11
	4000		43.81		42.15
	8000		79.96		77.38
	16000		165.26		162.24
	32000		378.18		368.49
	64000		1083.35		1086.77

As you can see, the bsearch code doesn't seem to make much difference.
Sometimes it is faster, other times slower --- it seems to be just
measurement noise.  This code uses the same method of file lookup as git
head, meaning it looks for specific files rather than patterns --- this
simplified the bsearch code.

Can anyone see a consistent improvement with the bsearch patch?
Attached is my test shell script.  There is no reason we can't use
bsearch(), except that not using it allows us to remove the pg_upgrade
directory listing function.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true.
diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
new file mode 100644
index d8cd8f5..a5d92c6
*** a/contrib/pg_upgrade/file.c
--- b/contrib/pg_upgrade/file.c
*** copy_file(const char *srcfile, const cha
*** 221,226 ****
--- 221,281 ----
  #endif

+ /*
+  * load_directory()
+  *
+  * Read all the file names in the specified directory, and return them as
+  * an array of "char *" pointers.  The array address is returned in
+  * *namelist, and the function result is the count of file names.
+  *
+  * To free the result data, free each (char *) array member, then free the
+  * namelist array itself.
+  */
+ int
+ load_directory(const char *dirname, char ***namelist)
+ {
+ 	DIR		   *dirdesc;
+ 	struct dirent *direntry;
+ 	int			count = 0;
+ 	int			allocsize = 64;		/* initial array size */
+
+ 	*namelist = (char **) pg_malloc(allocsize * sizeof(char *));
+
+ 	if ((dirdesc = opendir(dirname)) == NULL)
+ 		pg_log(PG_FATAL, "could not open directory \"%s\": %s\n",
+ 			   dirname, getErrorText(errno));
+
+ 	while (errno = 0, (direntry = readdir(dirdesc)) != NULL)
+ 	{
+ 		if (count >= allocsize)
+ 		{
+ 			allocsize *= 2;
+ 			*namelist = (char **)
+ 				pg_realloc(*namelist, allocsize * sizeof(char *));
+ 		}
+
+ 		(*namelist)[count++] = pg_strdup(direntry->d_name);
+ 	}
+
+ #ifdef WIN32
+ 	/*
+ 	 * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not
+ 	 * in released version
+ 	 */
+ 	if (GetLastError() == ERROR_NO_MORE_FILES)
+ 		errno = 0;
+ #endif
+
+ 	if (errno)
+ 		pg_log(PG_FATAL, "could not read directory \"%s\": %s\n",
+ 			   dirname, getErrorText(errno));
+
+ 	closedir(dirdesc);
+
+ 	return count;
+ }
+
  void
  check_hard_link(void)
  {
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index ace56e5..0248d40
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***************
*** 7,12 ****
--- 7,13 ----
  #include
  #include
+ #include
  #include
  #include
*** const char *setupPageConverter(pageCnvCt
*** 365,370 ****
--- 366,372 ----
  typedef void *pageCnvCtx;
  #endif

+ int
  load_directory(const char *dirname, char ***namelist);
  const char *copyAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
  							  const char *dst, bool force);
  const char *linkAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 7dbaac9..b5b7380
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
***************
*** 18,24 ****
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  								   FileNameMap *maps, int size);
  static void transfer_relfile(pageCnvCtx *pageConverter, FileNameMap *map,
! 							 const char *suffix);

  /*
--- 18,24 ----
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  								   FileNameMap *maps, int size);
  static void transfer_relfile(pageCnvCtx *pageConverter, FileNameMap *map,
! 							 const char *suffix,
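[Editor's note: Tom's alternative — keep the array that load_directory() returns, sort it once, and probe with bsearch() instead of one open() per candidate file — could look roughly like this. This is a hypothetical sketch, not the attached patch; the function names are illustrative:]

```c
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

/* comparator over an array of "char *", usable by both qsort and bsearch */
static int
namelist_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* binary-search a sorted namelist for an exact file name */
static bool
namelist_contains(char **namelist, int count, const char *name)
{
    return bsearch(&name, namelist, count,
                   sizeof(char *), namelist_cmp) != NULL;
}
```

After load_directory() fills the array, a single qsort(namelist, count, sizeof(char *), namelist_cmp) makes every subsequent lookup O(log n), versus one open()/close() syscall per probe — which is exactly the trade-off Bruce's benchmark was trying to measure.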
Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:

[Sorry for the delay in replying.]

> On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian wrote:
> > On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
> >>
> >> Is turning off synchronous_commit enough?  What about turning off
> >> fsync?
> >
> > I did some testing with the attached patch on a magnetic disk with no
> > BBU that turns off fsync;
>
> With which file system?  I wouldn't expect you to see a benefit with
> ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits
> "group fsync" of new file creations but rather does each one serially.
> Whether it is worth applying a fix that is only needed for that one
> file system, I don't know.  The trade-offs are not all that clear to
> me yet.

That only ext4 shows the difference seems possible.

> > I got these results:
> >
> > # tables	sync_com=off	fsync=off
> > 1		15.90		13.51
> > 1000		26.09		24.56
> > 2000		33.41		31.20
> > 4000		57.39		57.74
> > 8000		102.84		116.28
> > 16000		189.43		207.84
> >
> > It shows fsync faster for < 4k, and slower for > 4k.  Not sure why
> > this is the case, but perhaps the buffering of the fsync is actually
> > faster than doing a no-op fsync.
>
> synchronous-commit=off turns off not only the fsync at each commit,
> but also the write-to-kernel at each commit; so it is not surprising
> that it is faster at large scale.  I would specify both
> synchronous-commit=off and fsync=off.

I would like to see actual numbers showing synchronous-commit=off is
also useful if we use fsync=off.

> >> When I'm doing a pg_upgrade with thousands of tables, the shutdown
> >> checkpoint after restoring the dump to the new cluster takes a very
> >> long time, as the writer drains its operation table by opening and
> >> individually fsync-ing thousands of files.  This takes about 40 ms
> >> per file, which I assume is a combination of slow lap-top disk
> >> drive, and a strange deal with ext4 which makes fsyncing a recently
> >> created file very slow.
> >> But even with faster hdd, this would still be a problem
> >> if it works the same way, with every file needing 4 rotations to be
> >> fsynced, and this happens in serial.
> >
> > Is this with the current code that does synchronous_commit=off?  If
> > not, can you test to see if this is still a problem?
>
> Yes, it is with synchronous_commit=off.  (Or if it wasn't originally,
> it is now, with the same result.)
>
> Applying your fsync patch does solve the problem for me on ext4.
> Having the new cluster be on ext3 rather than ext4 also solves the
> problem, without the need for a patch; but it would be nice to be more
> friendly to ext4, which is popular even though not recommended.

Do you have numbers with synchronous-commit=off, fsync=off, and both, on
ext4?

> >> Anyway, the reason I think turning fsync off might be reasonable is
> >> that as soon as the new cluster is shut down, pg_upgrade starts
> >> overwriting most of those just-fsynced files with other files from
> >> the old cluster, and AFAICT makes no effort to fsync them.  So until
> >> there is a system-wide sync after the pg_upgrade finishes, your new
> >> cluster is already in mortal danger anyway.
> >
> > pg_upgrade does a cluster shutdown before overwriting those files.
>
> Right.  So as far as the cluster is concerned, those files have been
> fsynced.  But then the next step is to go behind the cluster's back
> and replace those fsynced files with different files, which may or may
> not have been fsynced.  This is what makes me think the new cluster is
> in mortal danger.  Not only have the new files perhaps not been
> fsynced, but the cluster is not even aware of this fact, so you can
> start it up, and then shut it down, and it still won't bother to fsync
> them, because as far as it is concerned they already have been.
>
> Given that, how much extra danger would be added by having the new
> cluster schema restore run with fsync=off?
> In any event, I think the documentation should caution that the
> upgrade should not be deemed to be a success until after a system-wide
> sync has been done.  Even if we use the link rather than copy method,
> are we sure that that is safe if the directories recording those links
> have not been fsynced?

OK, the above is something I have been thinking about, and obviously you
have too.  If you change fsync from off to on in a cluster, and restart
it, there is no guarantee that the dirty pages you read from the kernel
are actually on disk, because Postgres doesn't know they are dirty.
They probably will be pushed to disk by the kernel in less than one
minute, but still, it doesn't seem reliable.  Should this be documented
in the fsync section?

Again, another reason not to use fsync=off, though your example of the
file copy is a good one.  As you stated, this is a problem with the file
copy/link, independen
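[Editor's note: Jeff's point about link/copy durability is worth spelling out: persisting a newly written or hard-linked file requires fsync'ing both the file and the directory holding its entry, since the directory entry lives in a separate block. A sketch of such a helper — an assumed illustration, not pg_upgrade code:]

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Sketch: make a freshly written (or linked) file durable.  fsync the
 * file's contents, then fsync its parent directory so the directory
 * entry itself reaches disk.  Error handling is minimal for brevity.
 */
static int
fsync_file_and_dir(const char *filepath, const char *dirpath)
{
    int     fd;

    if ((fd = open(filepath, O_RDONLY)) < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);

    if ((fd = open(dirpath, O_RDONLY)) < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

A system-wide sync(1) after pg_upgrade achieves the same end more bluntly, which is why the documentation caution discussed above matters.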
Re: [HACKERS] splitting *_desc routines
Alvaro Herrera writes:
> So incorporating these ideas, the layout now looks like this:
> ./commands/seq_desc.c
> ./commands/dbase_desc.c
> ./commands/tablespace_desc.c
> ./catalog/storage_desc.c
> ./utils/cache/relmap_desc.c
> ./access/rmgrdesc/mxact_desc.c
> ./access/rmgrdesc/spgdesc.c
> ./access/rmgrdesc/xact_desc.c
> ./access/rmgrdesc/heapdesc.c
> ./access/rmgrdesc/tupdesc.c
> ./access/rmgrdesc/xlog_desc.c
> ./access/rmgrdesc/gistdesc.c
> ./access/rmgrdesc/clog_desc.c
> ./access/rmgrdesc/hashdesc.c
> ./access/rmgrdesc/gindesc.c
> ./access/rmgrdesc/nbtdesc.c
> ./storage/ipc/standby_desc.c

FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory
(which may as well be under access/, since that's where the xlog code
is), regardless of where the corresponding replay code is in the source
tree.  I don't think splitting them up into half a dozen directories
adds anything except confusion.  If you want, the header comment for
each file could mention where the corresponding replay code lives.

			regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] splitting *_desc routines
On 23 November 2012 22:52, Alvaro Herrera wrote: > ./access/rmgrdesc/xact_desc.c > ./access/rmgrdesc/heapdesc.c Could we have either all underscores _ or none at all for the naming? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further pg_upgrade analysis for many tables
On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes wrote:
> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane wrote:
>> Jeff Janes writes:
>>> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane wrote:
>>>> There are at least three ways we could whack that mole: ...
>>>> * Keep a separate list (or data structure of your choice) so that
>>>> relcache entries created in the current xact could be found directly
>>>> rather than having to scan the whole relcache.  That'd add
>>>> complexity though, and could perhaps be a net loss for cases where
>>>> the relcache isn't so bloated.
>>
>>> Maybe a static list that can overflow, like the ResourceOwner/Lock
>>> table one recently added.  The overhead of that should be very low.
>>
>>> Are the three places where "need_eoxact_work = true;" the only places
>>> where things need to be added to the new structure?
>>
>> Yeah.  The problem is not so much the number of places that do that,
>> as that places that flush entries from the relcache would need to know
>> to remove them from the separate list, else you'd have dangling
>> pointers.
>
> If the list is of hash-tags rather than pointers, all we would have to
> do is ignore entries that are not still in the hash table, right?

I've attached a proof-of-concept patch to implement this.

I got rid of need_eoxact_work entirely and replaced it with a short list
that fulfills the functions of indicating that work is needed, and
suggesting which rels might need that work.

There is no attempt to prevent duplicates, nor to remove invalidated
entries from the list.  Invalid entries are skipped when the hash entry
is not found, and processing is idempotent, so duplicates are not a
problem.

Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
overflowed the first time it was accessed, then it would be identical to
the current behavior of having only a flag.  So formally all I did was
increase the max from 0 to 10.

I wasn't so sure about the idempotent nature of subtransaction
processing, so I chickened out and left that part alone.
I know of no workflow for which that was a bottleneck.

AtEOXact_release is oddly indented because that makes the patch smaller
and easier to read.

This makes the non "-1" restore of large dumps very much faster (and
makes them faster than "-1" restores, as well).

I added a "create temp table foo (x integer) on commit drop;" line to
the default pgbench transaction and tested that.  I was hoping to see a
performance improvement there as well (the transaction has ~110 entries
in the RelationIdCache at eoxact each time), but the performance was too
variable (probably due to the intense IO it causes) to detect any
changes.  At least it is not noticeably slower.  If I hack pgbench to
bloat the RelationIdCache by touching 20,000 useless tables as part of
the connection start-up process, then this patch does show a win.

It is not obvious what value to set the MAX list size to.  Since this
array is only allocated once per back-end, and since it is not groveled
through to invalidate relations at each invalidation, there is no
particular reason it must be small.  But if the same table is assigned
new filenodes (or forced index lists, whatever those are) repeatedly
within a transaction, the list could become bloated with duplicate
entries, potentially becoming even larger than the hash table whose scan
it is intended to short-cut.

In any event, 10 seems to be large enough to overcome the currently
known bottleneck.  Maybe 100 would be a more principled number, as that
is about where the list could start to become as big as the basal size
of the RelationIdCache table.

I don't think this patch replaces having some mechanism for restricting
how large RelationIdCache can get or how LRU entries in it can get, as
Robert suggested.  But this approach seems like it is easier to
implement and agree upon, and doesn't preclude doing other optimizations
later.
Cheers, Jeff relcache_list_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
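[Editor's note: the bounded remember-list Jeff describes — duplicate-tolerant, invalidation-tolerant, overflowing back to the old whole-relcache scan — can be sketched in a few lines. MAX_EOXACT_LIST is the name from his mail; the rest of the names are illustrative, not taken from the attached patch:]

```c
#include <stdbool.h>

#define MAX_EOXACT_LIST 10

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/*
 * Sketch of the bounded "remember list": record rels that may need
 * end-of-xact work.  Duplicates and since-invalidated entries are fine
 * because the cleanup is idempotent and simply skips OIDs no longer in
 * the relcache hash table.  On overflow, fall back to scanning the
 * whole relcache, which is exactly the pre-patch behavior.
 */
static Oid  eoxact_list[MAX_EOXACT_LIST];
static int  eoxact_list_len = 0;
static bool eoxact_list_overflowed = false;

static void
EOXactListAdd(Oid relid)
{
    if (eoxact_list_len < MAX_EOXACT_LIST)
        eoxact_list[eoxact_list_len++] = relid;
    else
        eoxact_list_overflowed = true;
}
```

With MAX_EOXACT_LIST set to 0 this degenerates to today's single need_eoxact_work flag, which is why Jeff can say the patch "formally" only raised the max from 0 to 10.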
Re: [HACKERS] splitting *_desc routines
Robert Haas escribió:
> On Thu, Oct 25, 2012 at 4:25 AM, Simon Riggs wrote:
> > I'd put these in a separate directory to avoid annoyance.  Transam is
> > already too large.
>
> +1.  A further advantage of that is that it means that everything
> in that new directory will be intended to end up as
> all-separately-compilable, which should make it easier to remember
> what the rules are, and easier to design an automated framework that
> continuously verifies that it still works that way.

So incorporating these ideas, the layout now looks like this:

./commands/seq_desc.c
./commands/dbase_desc.c
./commands/tablespace_desc.c
./catalog/storage_desc.c
./utils/cache/relmap_desc.c
./access/rmgrdesc/mxact_desc.c
./access/rmgrdesc/spgdesc.c
./access/rmgrdesc/xact_desc.c
./access/rmgrdesc/heapdesc.c
./access/rmgrdesc/tupdesc.c
./access/rmgrdesc/xlog_desc.c
./access/rmgrdesc/gistdesc.c
./access/rmgrdesc/clog_desc.c
./access/rmgrdesc/hashdesc.c
./access/rmgrdesc/gindesc.c
./access/rmgrdesc/nbtdesc.c
./storage/ipc/standby_desc.c

There are two files that are two subdirs down from the backend
directory:

./storage/ipc/standby_desc.c
./utils/cache/relmap_desc.c

We could keep them in there, or we could alternatively create more
"rmgrdesc" subdirs for them, so

./storage/rmgrdesc/standby_desc.c
./utils/rmgrdesc/relmap_desc.c

This approach appears more future-proof if we ever want to create desc
routines in other subdirs in storage/ and utils/, though it's a bit
annoying to have subdirs that will only hold a single file each (and a
very tiny file to boot).

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum truncate exclusive lock round two
Alvaro Herrera wrote:
> Are you posting an updated patch?

Well, there wasn't exactly a consensus on what should change, so I'll
throw some initial review comments out even though I still need to check
some things in the code and do more testing.

The patch applied cleanly, compiled without warnings, and passed all
tests in `make check-world`.

The previous discussion seemed to me to achieve consensus on the need
for the feature, but a general dislike for adding so many new GUC
settings to configure it.  I concur on that.  I agree with the feelings
of others that just using deadlock_timeout / 10 (probably clamped to a
minimum of 10ms) would be good instead of
autovacuum_truncate_lock_check.  I agree that the values of the other
two settings probably aren't too critical as long as they are fairly
reasonable, which I would define as being 20ms to 100ms, with retries
lasting no more than 2 seconds.  I'm inclined to suggest a total time of
deadlock_timeout * 2 clamped to 2 seconds, but maybe that is
over-engineering it.

There was also a mention of possibly inserting a vacuum_delay_point()
call outside of the AccessExclusiveLock.  I don't feel strongly about
it, but if the page scanning cost can be tracked easily, I guess it is
better to do that.

Other than simplifying the code based on eliminating the new GUCs, the
coding issues I found were:

TRUE and FALSE were used as literals.  Since true and false are used
about 10 times as often, and current code in the modified files is
mixed, I would recommend the lowercase form.  We decided it wasn't worth
the trouble to convert the minority usage over, but I don't see any
reason to add new instances of the minority case.

In LockHasWaiters() the partition lock is acquired using LW_EXCLUSIVE.
I don't see why that can't be LW_SHARED.  Was there a reason for this,
or was it just not changed after copy/paste?
I still need to review the timing calls, since I'm not familiar with them so it wasn't immediately obvious to me whether they were being used correctly. I have no reason to believe that they aren't, but feel I should check. Also, I want to do another pass looking just for off-by-one errors on blkno. Again, I have no reason to believe that there is a problem; it just seems like it would be a particularly easy type of mistake to make and miss when a key structure has this field: BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */ And I want to test more. But if a new version could be created based on the above, that would be great. Chances seem relatively slim at this point that there will be much else to change, if anything. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
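[Editor's note: the GUC-free tuning suggested in this review — a check interval of deadlock_timeout / 10 clamped to at least 10ms, and a total retry budget of deadlock_timeout * 2 clamped to at most 2 seconds — is simple enough to state as code. This is a sketch of the suggestion only, not the patch; the function names are illustrative:]

```c
/*
 * Sketch of deriving the truncate-lock retry parameters from
 * deadlock_timeout instead of adding new GUCs.  All values are in
 * milliseconds; the clamps follow the review's suggested bounds.
 */
static int
truncate_lock_check_interval(int deadlock_timeout_ms)
{
    int     interval = deadlock_timeout_ms / 10;

    return interval < 10 ? 10 : interval;   /* clamp to >= 10ms */
}

static int
truncate_lock_total_wait(int deadlock_timeout_ms)
{
    int     total = deadlock_timeout_ms * 2;

    return total > 2000 ? 2000 : total;     /* clamp to <= 2s */
}
```

With the default deadlock_timeout of 1000ms this yields a 100ms check interval and the full 2-second budget, which falls inside the "fairly reasonable" ranges the review names.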
Re: [HACKERS] add -Wlogical-op to standard compiler options?
On Wed, 2012-11-21 at 11:46 -0800, Jeff Janes wrote: > Using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), I get dozens of > warnings, all apparently coming from somewhere in the MemSet macro. OK, reverted. It looks like they dialed it down to a more useful volume in GCC 4.5, but that doesn't really help us. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: Deparsing DDL command strings
Pavel Stehule writes:
> * some wrong in deparsing - doesn't support constraints
>
> postgres=# alter table a add column bbbsss int check (bbbsss > 0);
> NOTICE: event: ddl_command_start, context: TOPLEVEL, tag: ALTER TABLE,
> operation: ALTER, type: TABLE, schema: , name:
> NOTICE: command:
> NOTICE: event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE,
> operation: ALTER, type: TABLE, schema: public, name: a
> NOTICE: command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ;
> ALTER TABLE

So apparently, to be able to decipher what ALTER TABLE actually did, you
need more than just hooking yourself in after transformAlterTableStmt()
has been done, because the interesting stuff happens in the alter table
"work queue"; see ATController in src/backend/commands/tablecmds.c for
details.  Exporting that data, I'm now able to implement constraint
rewriting this way:

alter table a add column bbbsss int check (bbbsss > 0);
NOTICE: AT_AddConstraint: a_bbbsss_check
NOTICE: event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE,
operation: ALTER, type: TABLE, schema: public, name: a
NOTICE: command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4,
ADD CONSTRAINT a_bbbsss_check CHECK ((bbbsss > 0));
ALTER TABLE

I did publish that work on the github repository (which I rebase every
time I'm pulling from master, beware):

https://github.com/dimitri/postgres/compare/evt_add_info

This implementation also shows me that it's not possible to get the
command string from the parse tree directly, nor after the DDL transform
step, if you want such things as the constraint name that's been
produced automatically by the backend.  And I don't see any way to
implement that from an extension without first patching the backend.  As
that's the kind of code we want to be able to break at will in between
releases (or to fix an important bug in a minor update), I think we need
to have the facility to provide users with the normalized command string
in core.
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestion for --truncate-tables to pg_restore
On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc wrote: >> On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc wrote: >> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > OT: > After looking at the code I found a number of "conflicting" > option combinations are not tested for or rejected. I can't > recall what they are now. The right way to fix these would be > to send in a separate patch for each "conflict" all attached > to one email/commitfest item? Or one patch that just gets > adjusted until it's right? Typically I think it's easiest for the reviewer+committer to consider a bunch of such similar changes altogether in one patch, rather than in a handful of separate patches, though that's just my own preference. >> > >> > >> > More serious objections were raised regarding semantics. >> > >> > What if, instead, the initial structure looked like: >> > >> > CREATE TABLE schemaA.foo >> > (id PRIMARY KEY, data INT); >> > >> > CREATE TABLE schemaB.bar >> > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo >> > , moredata INT); >> > >> > With a case like this, in most real-world situations, you'd >> > have to use pg_restore with --disable-triggers if you wanted >> > to use --data-only and --truncate-tables. The possibility of >> > foreign key referential integrity corruption is obvious. >> >> Why would --disable-triggers be used in this example? I don't think >> you could use --truncate-tables to restore only table "foo", because >> you would get: >> >> ERROR: cannot truncate a table referenced in a foreign key >> constraint >> >> (At least, I think so, not having tried with the actual patch.) You >> could use --disable-triggers when restoring "bar". > > I tried it and you're quite right. (I thought I'd tried this > before, but clearly I did not -- proving the utility of the review > process. :-) My assumption was that since triggers > were turned off that constraints, being triggers, would be off as > well and therefore tables with foreign keys could be truncated. 
> Obviously not, since I get the above error.
>
> I just tried it.  --disable-triggers does not disable constraints.

Just to be clear, I believe the problem in this example is that
--disable-triggers does not disable any foreign key constraints of other
tables when you are restoring a single table.  So with:

pg_restore -1 --truncate-tables --disable-triggers --table=foo \
    --data-only my.dump ...

you would get the complaint

ERROR: cannot truncate a table referenced in a foreign key constraint

which is talking about bar's referencing foo, and there was no ALTER
TABLE bar DISABLE TRIGGER ALL done, since "bar" isn't being restored.
I don't have a quibble with this existing behavior -- you are
instructing pg_restore to only mess with "foo", after all.  See below,
though, for a comparison of --clean and --truncate-tables when restoring
"foo" and "bar" together.

>> For your first example, --truncate-tables seems to have some use, so
>> that the admin isn't forced to recreate various views which may be
>> dependent on the table.  (Though it's not too difficult to work around
>> this case today.)
>
> As an aside: I never have an issue with this, having planned ahead.
> I'm curious what the not-too-difficult work-around is that you have
> in mind.  I'm not coming up with a tidy way to, e.g., pull a lot
> of views out of a dump.

Well, for the first example, there was one table and only one view which
depended on the table, so manually editing the list file like so:

pg_restore --list my.dump > output.list
# manually edit file "output.list", select only entries pertaining
# to the table and dependent view
pg_restore -1 --clean --use-list=output.list ...

isn't too arduous, though it would become so if you had more dependent
views to worry about.  I'm willing to count this use-case as a usability
win for --truncate-tables, since with that option the restore can be
boiled down to a short and sweet:

pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ...
Though note this may not prove practical for large tables, since --data-only leaves constraints and indexes intact during the restore, and can be massively slower compared to adding the constraints only after the data has been COPYed in, as pg_restore otherwise does. >> For the second example involving FKs, I'm a little bit more hesitant >> about endorsing the use of --truncate-tables combined with >> --disable-triggers to potentially allow bogus FKs. I know this is >> possible to some extent today using the --disable-triggers option, >> but >> it makes me nervous to be adding a mode to pg_restore if it's >> primarily designed to work together with --disable-triggers as a >> larger foot-gun. > > This is the crux of the issue, and where I was thinking of > taking this patch. I very often am of the mindset that > foreign keys are no more or less special than other > more complex data integrity rules enforced with triggers. > (I suppose others
Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management
On Friday, November 23, 2012 11:15 AM Pavan Deolasee wrote:
> On Thu, Nov 22, 2012 at 2:05 PM, Amit Kapila wrote:
>>> Sorry, I haven't followed this thread at all, but the numbers (43171
>>> and 57920) in the last two runs of @mv-free-list for 32 clients look
>>> like aberrations, no ?  I wonder if that's skewing the average.
>>
>> Yes, that is one of the main reasons, but it is consistent in all runs
>> that for 32 clients or above this kind of numbers are observed.
>> Even Jeff has pointed out a similar thing in one of his mails and
>> suggested to run the tests such that the first test should run "with
>> patch" and then "without patch".
>> After doing what he suggested, the observations are still similar.
>
> Are we convinced that the jump that we are seeing is a real one then ?

Still not convinced, as the data has been collected in only my setup.

> I'm a bit surprised because it happens only with the patch and only for
> 32 clients.  How would you explain that ?

The reason this patch can improve performance is reduced contention for
BufFreeListLock and PartitionLock (which it takes in BufferAlloc a. to
remove an old page from a buffer or b. to see if the block is already in
the buffer pool) in backends.  As the number of backends increases, the
chances of improved performance are much better.  In particular, for 32
clients, when tests run for a longer time the results are not that
skewed.  For 32 clients, as mentioned in the previous mail, when the
test ran for 1 hr the difference is not very skewed:

32 client / 32 thread for 1 hour

            @mv-free-lst    @9.3devl
Single-run: 9842.019229     8050.357981

>>> I also looked at the Results.htm file down thread.  There seems to be
>>> a steep degradation when the shared buffers are increased from 5GB to
>>> 10GB, both with and without the patch.  Is that expected ?  If so,
>>> isn't that worth investigating and possibly even fixing before we do
>>> anything else ?
>> The reason for the decrease in performance is that when shared buffers are increased from 5GB to 10GB, I/O starts, as after the increase the OS buffers can no longer hold all the data. > Shouldn't that data be in the shared buffers if not the OS cache, and hence approximately the same IO will be required? I don't think so, as the data in the OS cache and PG shared buffers don't have any direct relation; the OS can flush its buffers based on its scheduler algorithm. Let us try to see by example:

Total RAM - 22G
Database size - 16G

Case 1 (Shared Buffers - 5G)
a. Load all the files into OS buffers. Chances are good that all 16G of data will be in OS buffers, as the OS still has 17G of memory available.
b. Try to load it all into shared buffers. The last 5G will be in shared buffers.
c. Chances are high that access to the remaining 11G of buffers will not lead to IO, as they will be in OS buffers.

Case 2 (Shared Buffers - 10G)
a. Load all the files into OS buffers. In the best case the OS buffers can contain 10-12G of data, as the OS has 12G of memory available.
b. Try to load it all into shared buffers. The last 10G will be in shared buffers.
c. Now, as there is no direct correlation of data between shared buffers and OS buffers, whenever PG has to access any data which is not in shared buffers, good chances are that it can lead to IO.

> Again, the drop in the performance is so severe that it seems worth investigating that further, especially because you can reproduce it reliably. Yes, I agree that it is worth investigating, but IMO this is a different problem which might not be addressed by the patch in discussion. The 2 reasons I can think of for the dip in performance when shared buffers increase beyond a certain threshold percentage of RAM are:
a. either the algorithm of buffer management has some bottleneck
b. or it is due to the way data is managed in shared buffers and the OS buffer cache

Any Suggestion/Comments?

With Regards, Amit Kapila. 
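The double-buffering arithmetic in the example above can be captured in a back-of-envelope model; this sketch is purely illustrative (the function names and the model itself are mine, not anything in PostgreSQL):

```c
#include <assert.h>

/* Back-of-envelope model of the example above: data that no longer fits
 * in the OS cache (RAM minus shared_buffers) is what can force real
 * reads, since PG's shared buffers and the OS cache are not coordinated.
 * Purely illustrative; not a claim about actual kernel behavior. */
double min2(double a, double b) { return a < b ? a : b; }

double gb_not_in_os_cache(double ram_gb, double db_gb, double shared_gb)
{
    double os_cache_gb = ram_gb - shared_gb;    /* memory left to the OS */
    return db_gb - min2(db_gb, os_cache_gb);    /* data in neither cache */
}
```

With the numbers from the example (22G RAM, 16G database), shared_buffers = 5G leaves 0G uncached, while 10G leaves roughly 4G exposed to real IO, matching the shape of the observed dip.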
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state
On 23.11.2012 19:55, Tom Lane wrote: Heikki Linnakangas writes: Committed, thanks. Doesn't seem to have been pushed to master? On purpose. Per commit message: 9.3 doesn't have this problem because XLogRecPtr is now a single 64-bit integer, so XLogRecPtrIsInvalid() does the right thing. Apply to 9.2, and 9.1 where pg_stat_replication view was introduced. I considered applying it to master anyway, just to keep the branches in sync. But XLogRecPtrIsInvalid(var) seems slightly more readable than XLByteEQ(var, InvalidXLogRecPtr), so I decided not to. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state
Heikki Linnakangas writes: > Committed, thanks. Doesn't seem to have been pushed to master? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: Deparsing DDL command strings
Hello 2012/11/22 Dimitri Fontaine : > Hi, > > Thank you Pavel for reviewing my patch! :) > > Pavel Stehule writes: >> * issues >> >> ** missing doc > > Yes. I wanted to reach an agreement on what we need the deparsing for. > You can read the last comment we made about it at the following link; I > didn't have any answer to my proposal. > > http://archives.postgresql.org/message-id/m2391fu79m@2ndquadrant.fr > > In that email, I said: >> Also, I'm thinking that publishing the normalized command string is >> something that must be maintained in the core code, whereas the external >> stable format might be done as a contrib extension, coded in C, working >> with the Node *parsetree. Because it needs to output a compatible format >> when applied to different major versions of PostgreSQL, I think it suits >> quite well the model of C coded extensions. > > I'd like to hear what people think about that position. I could be > working on the docs meanwhile I guess, but a non-trivial amount of what > I'm going to document is to be thrown away if we end up deciding that we > don't want the normalized command string… > I agree with you, so for PL languages plain text is the best format - it is enough for all the tasks that, as far as I can imagine, can be solved in any supported PL. And for complex tasks, the exported parse tree is perfect. My starting point is strategy: everything should be possible from PL/pgSQL, which is our most used PL - and there any complex format is bad; most of the work is doable via plain text and regular expressions. There is a precedent - EXPLAIN - where we support more formats, but in PLs we usually use just the plain format. >> ** statements are not deparsed for the ddl_command_start event > > Yes, that's by design, and that's why we have the ddl_command_trace > event alias too. Tom is right in saying that until the catalogs are > fully populated we cannot rewrite most of the command string if we want > normalized output (types, constraints, namespaces, all the jazz). 
> ok >> ** CREATE FUNCTION is not supported > > Not yet. The goal of this patch in this CF is to get a green light on > providing a full implementation of what information to add to event > triggers and in particular about rewriting command strings. I don't have an objection. Any other comments? Regards Pavel > > That said, if you think it would help you as a reviewer, I may attack > the CREATE FUNCTION support right now. > >> * something wrong in deparsing - doesn't support constraints >> >> postgres=# alter table a add column bbbsss int check (bbbsss > 0); >> NOTICE: event: ddl_command_start, context: TOPLEVEL, tag: ALTER >> TABLE, operation: ALTER, type: TABLE, schema: , name: >> NOTICE: command: >> NOTICE: event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE, >> operation: ALTER, type: TABLE, schema: public, name: a >> NOTICE: command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ; >> ALTER TABLE > > Took me some time to figure out how the constraint processing works in > CREATE TABLE, and apparently I didn't finish ALTER TABLE support yet. > Working on that, thanks. > > Regards, > -- > Dimitri Fontaine > http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state
On 09.11.2012 15:28, Fujii Masao wrote: On Fri, Nov 9, 2012 at 4:06 AM, Alvaro Herrera wrote: Fujii Masao escribió: On Fri, Oct 19, 2012 at 10:29 PM, Fujii Masao wrote: However, I had forgotten to treat three other places in walsender.c and syncrep.c that also do XLogRecPtrIsInvalid(). This new patch includes the changes for them. Good catch. Could any committer pick this up? If not, please add it to the next commitfest so that we don't forget. Yep, I added this to the next CF. This is just a bug fix, so please feel free to pick it up even before the CF. Committed, thanks. - Heikki
Re: [HACKERS] Further pg_upgrade analysis for many tables
On Thu, Nov 15, 2012 at 07:05:00PM -0800, Jeff Janes wrote: > On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane wrote: > > Jeff Janes writes: > >> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane wrote: > >>> There are at least three ways we could whack that mole: ... > >>> > >>> * Keep a separate list (or data structure of your choice) so that > >>> relcache entries created in the current xact could be found directly > >>> rather than having to scan the whole relcache. That'd add complexity > >>> though, and could perhaps be a net loss for cases where the relcache > >>> isn't so bloated. > > > >> Maybe a static list that can overflow, like the ResourceOwner/Lock > >> table one recently added. The overhead of that should be very low. > > > >> Are the three places where "need_eoxact_work = true;" the only places > >> where things need to be added to the new structure? > > > > Yeah. The problem is not so much the number of places that do that, > > as that places that flush entries from the relcache would need to know > > to remove them from the separate list, else you'd have dangling > > pointers. > > If the list is of hash-tags rather than pointers, all we would have to > do is ignore entries that are not still in the hash table, right? > > > On a related thought, it is a shame that "create temp table on commit > drop" sets "need_eoxact_work", because by the time we get to > AtEOXact_RelationCache upon commit, the entry is already gone and so > there is no actual work to do (unless a non-temp table was also > created). But on abort, the entry is still there. I don't know if > there is an opportunity for optimization there for people who use temp > tables a lot. If we go with a caching list, that would render it moot > unless they use so many as to routinely overflow the cache. I added the attached C comment last year to mention why temp tables are not as isolated as we think, and can't be optimized as much as you would think. 
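Jeff's static-list-with-overflow idea can be sketched in isolation; all names here are hypothetical, and the real bookkeeping would live in relcache.c:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;           /* stand-in for PostgreSQL's Oid */

#define MAX_EOXACT_LIST 32          /* hypothetical cap, like the ResourceOwner lock table */

static Oid  eoxact_list[MAX_EOXACT_LIST];
static int  eoxact_list_len = 0;
static bool eoxact_list_overflowed = false;

/* Remember a relcache entry that needs end-of-xact work. */
void EOXactListAdd(Oid relid)
{
    if (eoxact_list_len < MAX_EOXACT_LIST)
        eoxact_list[eoxact_list_len++] = relid;
    else
        eoxact_list_overflowed = true;
}

/*
 * At end of transaction: returns how many entries were recorded, or -1 if
 * the list overflowed and the caller must fall back to scanning the whole
 * relcache.  Because the list would hold hash-tags rather than pointers,
 * entries flushed from the cache in the meantime simply fail the lookup
 * and are skipped, so nothing ever needs to be un-registered.
 */
int AtEOXact_ProcessList(void)
{
    int result = eoxact_list_overflowed ? -1 : eoxact_list_len;

    eoxact_list_len = 0;
    eoxact_list_overflowed = false;
    return result;
}
```

The overflow flag is what keeps this a pure optimization: the worst case degrades to today's full scan rather than to incorrect behavior.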
-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + commit f458c90bff45ecae91fb55ef2b938af37d977af3 Author: Bruce Momjian Date: Mon Sep 5 22:08:14 2011 -0400 Add C comment about why we send cache invalidation messages for session-local objects. diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c new file mode 100644 index 337fe64..98dc3ad *** a/src/backend/utils/cache/inval.c --- b/src/backend/utils/cache/inval.c *** ProcessCommittedInvalidationMessages(Sha *** 812,817 --- 812,821 * about CurrentCmdInvalidMsgs too, since those changes haven't touched * the caches yet. * + * We still send invalidation messages for session-local objects to other + * backends because, while other backends cannot see any tuples, they can + * drop tables that are session-local to another session. + * * In any case, reset the various lists to empty. We need not physically * free memory here, since TopTransactionContext is about to be emptied * anyway. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Fri, Nov 23, 2012 at 6:56 PM, Amit Kapila wrote: >> When I remove postgresql.auto.conf, SET PERSISTENT failed. >> >> =# SET PERSISTENT synchronous_commit = 'local'; >> ERROR: failed to open "postgresql.auto.conf" file > > There can be 2 ways to handle this: either we recreate the "postgresql.auto.conf" file or give an error. > I am not sure, if a user tries to delete internal files, what the exact behavior should be. > Any suggestion? I prefer to recreate it. $PGDATA/config_dir is specified in include_dir by default. Users might create and remove the configuration files in that directory many times. So I'm not surprised even if a user accidentally removes postgresql.auto.conf in that directory. Also, users might purposely remove that file to reset all the settings made by SET PERSISTENT. So I think that SET PERSISTENT should handle the case where postgresql.auto.conf doesn't exist. We might be able to expect that postgresql.auto.conf is not deleted by a user if it's in $PGDATA/global or the base directory. >> Should we implement "RESET PERSISTENT"? Otherwise, there is no way to get rid of the parameter setting from postgresql.auto.conf via SQL. Also, should we implement "SET PERSISTENT name TO DEFAULT"? > > Till now, I have not implemented this in the patch, thinking that it can be done as a 2nd part once the basic stuff is ready. > However, I think you are right: without one of "RESET PERSISTENT" or "SET PERSISTENT name TO DEFAULT", it is difficult for the user to get rid of a parameter. > Will "SET PERSISTENT name TO DEFAULT" be sufficient, or do you think both are necessary, because RESET PERSISTENT might also internally need to behave similarly? > > For the implementation of "SET PERSISTENT name TO DEFAULT", there can be 2 ways: > 1) Delete the entry from postgresql.auto.conf > 2) Update the entry value in postgresql.auto.conf to the default value Both seem to be useful. I think that "SET ... TO DEFAULT" is suitable for 2), and "RESET PERSISTENT ..." is suitable for 1). 
Another comment is: What happens if the server crashes while SET PERSISTENT is writing the setting to the file? A partial write occurs and restart of the server would fail because of corrupted postgresql.auto.conf? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
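The partial-write hazard Fujii raises is commonly avoided with a write-temp-then-fsync-then-rename pattern, since rename() atomically replaces the target on POSIX filesystems; a minimal standalone sketch (the helper name is hypothetical, not the actual patch code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: durably replace "path" with "contents" so that a
 * crash mid-write leaves either the complete old file or the complete
 * new one, never a torn mixture.
 */
int write_auto_conf(const char *path, const char *contents)
{
    char tmppath[1024];
    int  fd;

    snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);

    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, contents, strlen(contents)) != (ssize_t) strlen(contents) ||
        fsync(fd) != 0)             /* make sure the bytes reach disk first */
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    close(fd);

    return rename(tmppath, path);   /* atomic swap into place */
}
```

With this shape, a crash can at worst leave a stale *.tmp file behind, which startup code could simply discard.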
Re: [Re] [Re] Re: [HACKERS] PANIC: could not write to log file
"Cyril VELTER" writes: > I follow up on my previous message. Just got two more crashes today, very > similar to the first ones: > PANIC: could not write to log file 118, segment 237 at offset 2686976, > length 5578752: Invalid argument > STATEMENT: COMMIT > PANIC: could not write to log file 118, segment 227 at offset 9764864, > length 57344: Invalid argument > STATEMENT: COMMIT > For reference, the first ones were: > PANIC: could not write to log file 117, segment 117 at offset 5660672, > length 4096000: Invalid argument > STATEMENT: COMMIT > PANIC: could not write to log file 118, segment 74 at offset 12189696, > length 475136: Invalid argument > STATEMENT: COMMIT Hm, those write lengths seem rather large. What have you got wal_buffers set to? Does it help if you back it off to whatever you were using in the 8.2 installation? (The default back in 8.2 days seems to have been 64kB, if that helps.) regards, tom lane
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
Alvaro Herrera writes: > If the bgworker developer gets really tense about this stuff (or > anything at all, really), they can create a completely new sigmask and > do sigaddset() etc. Since this is all C code, we cannot keep them from > doing anything, really; I think what we need to provide here is just a > framework to ease development of simple cases. An important point here is that if a bgworker does need to do its own signal manipulation --- for example, installing custom signal handlers --- it would be absolutely catastrophic for us to unblock signals before reaching worker-specific code; signals might arrive before the process had a chance to fix their handling. So I'm against Heikki's auto-unblock proposal. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum truncate exclusive lock round two
Jan, Are you posting an updated patch? -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Re: [pgsql-cluster-hackers] Question: Can i cut NON-HOT chain Pointers if there are no concurrent updates?
Henning, On 11/23/2012 03:17 PM, "Henning Mälzer" wrote: > Can somebody help me? Sure, but you might get better answers on the -hackers mailing list. I'm redirecting there. The cluster-hackers one is pretty low volume, with few subscribers, I think. > Question: > What would be the loss if I cut non-HOT chain pointers, meaning I set > t_ctid=t_self in the case where it points to a tuple on another page? READ COMMITTED would stop working correctly in the face of concurrent transactions, I guess. See the fine manual: http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-READ-COMMITTED The problem essentially boils down to: READ COMMITTED transactions need to learn about tuples *newer* than what their snapshot would see. > I am working on a project based on "postgres (PostgreSQL) 8.5devel" with the > code from several master's theses before me. Care to elaborate a bit? Can (part of) that code be released under an open license? Regards Markus Wanner
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
Merlin Moncure wrote: > Kevin Grittner wrote: >> Merlin Moncure wrote: >> >>> Hm, I bet it's possible (although probably not easy) to deduce >>> volatility from the function body...maybe through the validator. >>> If you could do that (perhaps warning in cases where you can't), >>> then the performance regression-inducing-argument (which I agree >>> with) becomes greatly ameliorated. >> >> For about the last 10 years the Wisconsin Courts have been parsing >> SQL code to generate Java query classes, including "stored >> procedures", and determining information like this. For example, >> we set a readOnly property for the query class by examining the >> statements in the procedure and the readOnly status of each called >> procedure. It wasn't that hard for us, and I'm not sure what would >> make it much harder in PostgreSQL, if we can do it where a parse >> tree for the function is handy. > > hm, what do you do about 'after the fact' changes to things the > procedure body is pointing to? what would the server have to do? We did a regeneration of the whole set near the end of each release cycle, and did that or smaller changes as needed during the release cycle. Of course, we didn't have any equivalent of pg_depend. -Kevin
Re: [HACKERS] [WIP] pg_ping utility
On Sat, Nov 17, 2012 at 2:48 AM, Phil Sorber wrote: > On Thu, Nov 15, 2012 at 10:55 PM, Michael Paquier > wrote: > > On Fri, Nov 16, 2012 at 12:34 PM, Phil Sorber wrote: > >> On Thu, Nov 15, 2012 at 9:23 PM, Michael Paquier > >> wrote: > >> > 3) Having an output close to what ping actually does would also be > nice, > >> > the > >> > current output like Accepting/Rejecting Connections are not that > >> > >> Could you be more specific? Are you saying you don't want to see > >> accepting/rejecting info output? > > > > OK sorry. > > > > I meant something like that for an accessible server: > > $ pg_ping -c 3 -h server.com > > PING server.com (192.168.1.3) > > accept from 192.168.1.3: icmp_seq=0 time=0.241 ms > > accept from 192.168.1.3: icmp_seq=0 time=0.240 ms > > accept from 192.168.1.3: icmp_seq=0 time=0.242 ms > > > > Like that for a rejected connection: > > reject from 192.168.1.3: icmp_seq=0 time=0.241 ms > > > > Like that for a timeout: > > timeout from 192.168.1.3: icmp_seq=0 > > Then print 1 line for each ping taken to stdout. > > How does icmp_seq fit into this? Or was that an oversight? > > Also, in standard ping if you don't pass -c it will continue to loop > until interrupted. Would you suggest that pg_ping mimic that, or that > we add an additional flag for that behavior? > > FWIW, I would use 'watch' with the existing output for cases that I > would need something like that. > We waited a couple of days for feedback for this feature. So based on all the comments provided by everybody on this thread, perhaps we should move on and implement the options that would make pg_ping a better wrapper for PQPing. Comments? -- Michael Paquier http://michael.otacoo.com
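For reference, the ping-style output Michael describes could be driven by a loop like the following. This is a self-contained sketch where probe() stands in for libpq's real PQping(), and a plain seq counter replaces icmp_seq (which, as Phil notes, doesn't really apply here):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Stand-ins for libpq's PGPing results; the real tool would call PQping(). */
typedef enum { PING_OK, PING_REJECT, PING_NO_RESPONSE } PingStatus;

static PingStatus probe(const char *conninfo)
{
    (void) conninfo;
    return PING_OK;                 /* stub: pretend the server accepts */
}

/* Map a probe result to the word shown at the start of each output line. */
const char *status_word(PingStatus s)
{
    switch (s)
    {
        case PING_OK:     return "accept";
        case PING_REJECT: return "reject";
        default:          return "timeout";
    }
}

/* Emit one ping-style line per probe, roughly as proposed up-thread. */
void ping_loop(const char *conninfo, const char *host, int count)
{
    int seq;

    for (seq = 0; seq < count; seq++)
    {
        clock_t    start = clock();
        PingStatus s = probe(conninfo);
        double     ms = 1000.0 * (double) (clock() - start) / CLOCKS_PER_SEC;

        if (s == PING_NO_RESPONSE)
            printf("timeout from %s: seq=%d\n", host, seq);
        else
            printf("%s from %s: seq=%d time=%.3f ms\n",
                   status_word(s), host, seq, ms);
    }
}
```

A count of 0 (no -c flag) could loop until interrupted, mirroring standard ping.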
Re: [HACKERS] fix ecpg core dump when there's a very long struct variable name in .pgc file
On Thu, Nov 22, 2012 at 06:09:20PM +0800, Chen Huajun wrote: > When using a struct variable whose name is very long, such as 12KB, in a > .pgc source file, ecpg will core dump because of a buffer overflow when > precompiling the .pgc file. How on earth did you run into this? :) I absolutely agree that this had better be fixed, and I just committed the second version of your patch. Thanks. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)
On 15.11.2012 17:16, Heikki Linnakangas wrote: On 15.11.2012 16:55, Tom Lane wrote: Heikki Linnakangas writes: This is a fairly general issue, actually. Looking around, I can see at least two similar cases in existing code, with BasicOpenFile, where we will leak file descriptors on error: Um, don't we automatically clean those up during transaction abort? Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files allocated with AllocateFile() and OpenTemporaryFile() are cleaned up at abort. If we don't, we ought to think about that, not about cluttering calling code with certain-to-be-inadequate cleanup in error cases. Agreed. Cleaning up at end-of-xact won't help walsender or other non-backend processes, though, because they don't do transactions. But a top-level ResourceOwner that's reset in the sigsetjmp() cleanup routine would work. This is what I came up with. It adds a new function, OpenFile, that returns a raw file descriptor like BasicOpenFile, but the file descriptor is associated with the current subtransaction and automatically closed at end-of-xact, in the same way that AllocateFile and AllocateDir work. In other words, OpenFile is to open() what AllocateFile is to fopen(). BasicOpenFile is unchanged: it returns a raw fd and it's solely the caller's responsibility to close it, but many of the places that used to call BasicOpenFile now use the safer OpenFile function instead. This patch plugs three existing fd (or virtual fd) leaks: 1. copy_file() - fixed by using OpenFile instead of BasicOpenFile 2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't use OpenFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenFile instead of PathNameOpenFile. In addition, this replaces many BasicOpenFile() calls with OpenFile() that were not leaking, because the code meticulously closed the file on error. 
That wasn't strictly necessary, but IMHO it's good for robustness. One thing I'm not too fond of is the naming. I'm calling the new functions OpenFile and CloseFile. There's some danger of confusion there, as the function to close a virtual file opened with PathNameOpenFile is called FileClose. OpenFile is really the same kind of operation as AllocateFile and AllocateDir, but returns an unbuffered fd. So it would be nice if it was called AllocateSomething, too. But AllocateFile is already taken. And I don't much like the Allocate* naming for these anyway, you really would expect the name to contain "open". Do we want to backpatch this? We've had zero complaints, but this seems fairly safe to backpatch, and at least the leak in copy_file() can be quite annoying. If you run out of disk space in CREATE DATABASE, the target file is kept open even though it's deleted, so the space isn't reclaimed until you disconnect. - Heikki diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c index dd69c23..cd60dd8 100644 --- a/src/backend/access/transam/slru.c +++ b/src/backend/access/transam/slru.c @@ -531,7 +531,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruFlush fdata) int i; for (i = 0; i < fdata->num_files; i++) - close(fdata->fd[i]); + CloseFile(fdata->fd[i]); } /* Re-acquire control lock and update page state */ @@ -593,7 +593,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) * SlruPhysicalWritePage). Hence, if we are InRecovery, allow the case * where the file doesn't exist, and return zeroes instead. 
*/ - fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR); + fd = OpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR); if (fd < 0) { if (errno != ENOENT || !InRecovery) @@ -614,7 +614,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) { slru_errcause = SLRU_SEEK_FAILED; slru_errno = errno; - close(fd); + CloseFile(fd); return false; } @@ -623,11 +623,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) { slru_errcause = SLRU_READ_FAILED; slru_errno = errno; - close(fd); + CloseFile(fd); return false; } - if (close(fd)) + if (CloseFile(fd)) { slru_errcause = SLRU_CLOSE_FAILED; slru_errno = errno; @@ -740,7 +740,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata) * don't use O_EXCL or O_TRUNC or anything like that. */ SlruFileName(ctl, path, segno); - fd = BasicOpenFile(path, O_RDWR | O_CREAT | PG_BINARY, + fd = OpenFile(path, O_RDWR | O_CREAT | PG_BINARY, S_IRUSR | S_IWUSR); if (fd < 0) { @@ -773,7 +773,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata) slru_errcause = SLRU_SEEK_FAILED; slru_errno = errno; if (!fdata) - close(fd); + CloseFile(fd); return false; } @@ -786,7 +786,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata) slru_errcause = SLRU_WRITE_FAILED; slru_e
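The core idea behind OpenFile()/CloseFile() — remembering every fd handed out so end-of-xact cleanup can close whatever an error path forgot — can be sketched standalone (names and the fixed array are illustrative; the real code ties fds to (sub)transactions and resource owners):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Minimal sketch of fd tracking: every descriptor handed out is
 * remembered, so a cleanup hook can close whatever an error path forgot.
 * A fixed array stands in for the real per-(sub)transaction bookkeeping.
 */
#define MAX_TRACKED_FDS 64

static int tracked_fds[MAX_TRACKED_FDS];
static int n_tracked = 0;

/* open(2) wrapper that registers the fd for end-of-xact cleanup. */
int TrackedOpenFile(const char *path, int flags, int mode)
{
    int fd = open(path, flags, mode);

    if (fd >= 0 && n_tracked < MAX_TRACKED_FDS)
        tracked_fds[n_tracked++] = fd;
    return fd;
}

/* close(2) wrapper that also de-registers the fd. */
int TrackedCloseFile(int fd)
{
    int i;

    for (i = 0; i < n_tracked; i++)
    {
        if (tracked_fds[i] == fd)
        {
            tracked_fds[i] = tracked_fds[--n_tracked];
            break;
        }
    }
    return close(fd);
}

/* Run from the sigsetjmp() cleanup path; returns how many fds had leaked. */
int AtEOXact_CloseFiles(void)
{
    int leaked = n_tracked;

    while (n_tracked > 0)
        close(tracked_fds[--n_tracked]);
    return leaked;
}
```

This is what makes the error paths in callers like copy_file() safe to simplify: they no longer need their own close() calls before every ereport().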
Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management
>> Shouldn't that data be in the shared buffers if not the OS cache, and hence approximately the same IO will be required? > I don't think so, as the data in the OS cache and PG shared buffers don't have any direct relation; the OS can flush its buffers based on its scheduler algorithm. Let us try to see by example: Total RAM - 22G, Database size - 16G. Case 1 (Shared Buffers - 5G): a. Load all the files into OS buffers; chances are good that all 16G of data will be in OS buffers, as the OS still has 17G of memory available. b. Try to load it all into shared buffers; the last 5G will be in shared buffers. c. Chances are high that access to the remaining 11G of buffers will not lead to IO, as they will be in OS buffers. Case 2 (Shared Buffers - 10G): a. Load all the files into OS buffers; in the best case the OS buffers can contain 10-12G of data, as the OS has 12G of memory available. b. Try to load it all into shared buffers; the last 10G will be in shared buffers. c. Now, as there is no direct correlation of data between shared buffers and OS buffers, whenever PG has to access any data which is not in shared buffers, good chances are that it can lead to IO. >> Again, the drop in the performance is so severe that it seems worth investigating that further, especially because you can reproduce it reliably. > Yes, I agree that it is worth investigating, but IMO this is a different problem which might not be addressed by the patch in discussion. > The 2 reasons I can think of for the dip in performance when shared buffers increase beyond a certain threshold percentage of RAM are: > a. either the algorithm of buffer management has some bottleneck > b. or it is due to the way data is managed in shared buffers and the OS buffer cache The point I am trying to make is also explained at the link below. http://blog.kimiensoftware.com/2011/05/postgresql-vs-oracle-differences-4-shared-memory-usage-257 So if the above is true, I think the performance will be regained if in the test shared buffers are set to 16G. 
I shall try that setting in a test run. With Regards, Amit Kapila.
Re: [HACKERS] PQconninfo function for libpq
On Fri, Nov 23, 2012 at 6:30 AM, Fujii Masao wrote: > On Thu, Nov 22, 2012 at 10:05 PM, Boszormenyi Zoltan wrote: >> 2012-11-22 12:44 keltezéssel, Magnus Hagander írta: Also, a question was buried in the other review which is - are we OK to remove the requiressl parameter. Both these patches do so, because the code becomes much simpler if we can do that. It has been deprecated since 7.2. Is it OK to remove it, or do we need to put back in the more complex code to deal with both? >> >> Just going to highlight that we're looking for at least one third >> party to comment on this :) > > > Yes, me too. A +1 for removing wouldn't count from me. ;-) > > +1 Actually, with the cleaner code that resulted from the rewrite, reintroducing it turned out to be pretty darn simple - if I'm not missing something. PFA a patch that comes *with* requiressl= support, without making the code that much more complex. It also fixes the fact that pg_service.conf was broken by the previous one. It also includes the small fixes from Zoltans latest round (the one that was for libpq, not the one for pg_receivexlog that turned out to be wrong). And a pgindent run :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ PQconninfo-mha-2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: plpgsql return a row-expression
Hello 2012/11/23 Asif Rehman : > Hi, > > I forgot to add documentation changes in the earlier patch. In the attached > patch, I have added documentation as well as fixed other issues you raised. > ok, so: * applied and compiles cleanly * All 133 tests passed. * there are docs * there are the necessary regression tests * automatic conversion works as the plpgsql user expects * there are no performance issues now * the code respects our coding standards + the code removes redundant lines I have no objections; this patch is ready to commit! Regards Pavel > Regards, > --Asif > > > On Thu, Nov 22, 2012 at 10:47 PM, Asif Rehman > wrote: >> >> Thanks for the review Pavel. I have taken care of the points you raised. >> Please see the updated patch. >> >> Regards, >> --Asif >> >> >> On Wed, Nov 21, 2012 at 1:47 AM, Pavel Stehule >> wrote: >>> >>> related to >>> http://archives.postgresql.org/message-id/caauglxwpedfwae6dajmf7sxewfusa0f68p07retbbpf_fsa...@mail.gmail.com >>> >>> * patched and compiled without warnings >>> >>> * All 133 tests passed. 
>>>
>>> but
>>>
>>> I don't like
>>>
>>> * calling an invalid function from an anonymous block - it is messy (regress
>>> tests) - there is no reason to do it
>>>
>>> +create or replace function foo() returns footype as $$
>>> +declare
>>> +  v record;
>>> +  v2 record;
>>> +begin
>>> +  v := (1, 'hello');
>>> +  v2 := (1, 'hello');
>>> +  return (v || v2);
>>> +end;
>>> +$$ language plpgsql;
>>> +DO $$
>>> +declare
>>> +  v footype;
>>> +begin
>>> +  v := foo();
>>> +  raise info 'x = %', v.x;
>>> +  raise info 'y = %', v.y;
>>> +end; $$;
>>> +ERROR:  operator does not exist: record || record
>>> +LINE 1: SELECT (v || v2)
>>> +                  ^
>>>
>>> * there is a performance issue
>>>
>>> create or replace function fx2(a int)
>>> returns footype as $$
>>> declare x footype;
>>> begin
>>>   x = (10,20);
>>>   return x;
>>> end;
>>> $$ language plpgsql;
>>>
>>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>>> lateral fx2(i);
>>>  sum
>>> -----
>>>  100
>>> (1 row)
>>>
>>> Time: 497.129 ms
>>>
>>> create or replace function fx2(a int)
>>> returns footype as $$
>>> begin
>>>   return (10,20);
>>> end;
>>> $$ language plpgsql;
>>>
>>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>>> lateral fx2(i);
>>>  sum
>>> -----
>>>  100
>>> (1 row)
>>>
>>> Time: 941.192 ms
>>>
>>> the following code has the same functionality and is faster
>>>
>>> if (stmt->expr != NULL)
>>> {
>>>     if (estate->retistuple)
>>>     {
>>>         TupleDesc       tupdesc;
>>>         Datum           retval;
>>>         Oid             rettype;
>>>         bool            isnull;
>>>         int32           tupTypmod;
>>>         Oid             tupType;
>>>         HeapTupleHeader td;
>>>         HeapTupleData   tmptup;
>>>
>>>         retval = exec_eval_expr(estate,
>>>                                 stmt->expr,
>>>                                 &isnull,
>>>                                 &rettype);
>>>
>>>         /* Source must be of RECORD or composite type */
>>>         if (!type_is_rowtype(rettype))
>>>             ereport(ERROR,
>>>                     (errcode(ERRCODE_DATATYPE_MISMATCH),
>>>                      errmsg("cannot return non-composite value from composite type returning function")));
>>>
>>>         if (!isnull)
>>>         {
>>>             /* Source is a tuple Datum, so safe to do this: */
>>>             td = DatumGetHeapTupleHeader(retval);
>>>
>>>             /* Extract rowtype info and find a tupdesc */
>>>             tupType = HeapTupleHeaderGetTypeId(td);
>>>             tupTypmod = HeapTupleHeaderGetTypMod(td);
>>>             tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
>>>
>>>             /* Build a temporary HeapTuple control structure */
>>>             tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
>>>             ItemPointerSetInvalid(&(tmptup.t_self));
>>>             tmptup.t_tableOid = InvalidOid;
>>>             tmptup.t_data = td;
>>>
>>>             estate->retval = PointerGetDatum(heap_copytuple(&tmptup));
>>>             estate->rettupdesc = CreateTupleDescCopy(tupdesc);
>>>             ReleaseTupleDesc(tupdesc);
>>>         }
>>>
>>>         estate->retisnull = isnull;
>>>     }
>>>
>>>
>>> * it is too restrictive (maybe) - almost all plpgsql statements do
>>> automatic conversions (IO conversions when necessary)
>>>
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Thursday, November 22, 2012 10:09 PM Fujii Masao wrote:
> On Thu, Nov 22, 2012 at 9:38 PM, Amit kapila wrote:
> > Patch to implement SET PERSISTENT command is attached with this mail.
> > Now it can be reviewed.
>
> I got the following compile warnings:
> xact.c:1847: warning: implicit declaration of function
> 'AtEOXact_UpdateAutoConfFiles'
> guc.c:5134: warning: enumeration value 'PGC_ENUM' not handled in switch
>
> "SET PERSISTENT name TO value" failed, though "SET PERSISTENT name = value"
> succeeded.
>
> =# SET PERSISTENT synchronous_commit TO 'local';
> ERROR:  syntax error at or near "TO"
> LINE 1: SET PERSISTENT synchronous_commit TO 'local';
> =# SET PERSISTENT synchronous_commit = 'local';
> SET
>
> SET PERSISTENT succeeded even if an invalid setting value was set.
>
> =# SET PERSISTENT synchronous_commit = 'hoge';
> SET
>
> SET PERSISTENT syntax should be explained by the "\help SET" psql command.

Thank you. I shall take care of the above in the next patch and do more testing.

> When I remove postgresql.auto.conf, SET PERSISTENT failed.
>
> =# SET PERSISTENT synchronous_commit = 'local';
> ERROR:  failed to open "postgresql.auto.conf" file

There can be two ways to handle this: either we recreate the "postgresql.auto.conf" file, or we give an error. I am not sure, if a user tries to delete internal files, what the exact behavior should be. Any suggestion?

> We should implement "RESET PERSISTENT"? Otherwise, there is no way
> to get rid of the parameter setting from postgresql.auto.conf via SQL.
> Also, we should implement "SET PERSISTENT name TO DEFAULT"?

Till now, I have not implemented this in the patch, thinking that it can be done as a second part once the basic stuff is ready. However, I think you are right: without one of "RESET PERSISTENT" or "SET PERSISTENT name TO DEFAULT", it is difficult for the user to get rid of a parameter. Will "SET PERSISTENT name TO DEFAULT" be sufficient, or do you think both are necessary, given that RESET PERSISTENT would internally need to behave similarly?
For the implementation of "SET PERSISTENT name TO DEFAULT", there can be two ways:
1) Delete the entry from postgresql.auto.conf
2) Update the entry value in postgresql.auto.conf to the default value

In case we just update the value, the user would not be able to override it by modifying postgresql.conf manually once the command has been executed, because the entry in postgresql.auto.conf takes precedence. So I think Delete is the better option. Let me know if you think otherwise or have some other idea to achieve it.

> Is it helpful to output the notice message like 'Run pg_reload_conf() or
> restart the server if you want new settings to take effect' always after
> SET PERSISTENT command?

Not sure, because if someone uses the SET PERSISTENT command, he should know its effect and behavior. I think it's good to put this in the user manual along with the explanation of the "PERSISTENT" option.

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
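The delete-the-entry approach can be sketched as follows. This is a minimal illustration in Python, not the server's C implementation; the function name and the simple `name = value` line format are assumptions for the sketch.

```python
def reset_persistent(conf_text, name):
    """Drop any 'name = value' line for the given GUC from the
    postgresql.auto.conf text, so that a manual setting of the same
    parameter in postgresql.conf (read earlier) takes effect again."""
    kept = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Keep comments and blank lines untouched.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        if key != name:  # delete only the matching entry
            kept.append(line)
    return "\n".join(kept) + "\n"

conf = ("# Do not edit this file manually!\n"
        "synchronous_commit = 'local'\n"
        "work_mem = '64MB'\n")
print(reset_persistent(conf, "synchronous_commit"))
```

Deleting rather than rewriting the value to its default keeps postgresql.auto.conf from shadowing the parameter at all, which is exactly why Delete is the better option above.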
Re: [HACKERS] PQconninfo function for libpq
On 2012-11-23 06:30, Fujii Masao wrote:
> On Thu, Nov 22, 2012 at 10:05 PM, Boszormenyi Zoltan wrote:
>> On 2012-11-22 12:44, Magnus Hagander wrote:
>>> Also, a question was buried in the other review, which is: are we OK to
>>> remove the requiressl parameter? Both these patches do so, because the
>>> code becomes much simpler if we can do that. It has been deprecated
>>> since 7.2. Is it OK to remove it, or do we need to put back in the more
>>> complex code to deal with both?
>>>
>>> Just going to highlight that we're looking for at least one third party
>>> to comment on this :)
>>
>> Yes, me too. A +1 for removing wouldn't count from me. ;-)
>
> +1

Thanks. :-)

>> The second one is the product of what caught my attention while I was
>> looking at pg_receivexlog. The current coding may write beyond the end
>> of the allocated arrays and the password may overwrite a previously set
>> keyword/value pair.
>
> ISTM that such a problem doesn't happen at all, because argcount is
> incremented as follows.
>
>     if (dbhost)
>         argcount++;
>     if (dbuser)
>         argcount++;
>     if (dbport)
>         argcount++;

Right, forget about the second patch.

Regards,

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
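Fujii's argument is the classic count-then-fill pattern: because every optional parameter that will be written is first counted, the allocation of argcount + 1 slots (the extra one being the NULL terminator) can never be overrun. A small sketch of that invariant, in Python for compactness (`build_keywords` and its parameters are illustrative stand-ins, not libpq's actual API):

```python
def build_keywords(dbhost=None, dbuser=None, dbport=None):
    """Mimic the count-then-fill pattern: first count the optional
    parameters, then allocate exactly argcount + 1 slots (the last
    one stays None, like the NULL terminator in C), then fill."""
    argcount = 0
    if dbhost:
        argcount += 1
    if dbuser:
        argcount += 1
    if dbport:
        argcount += 1

    # Fixed-size "array", as malloc((argcount + 1) * sizeof(...)) would give.
    slots = [None] * (argcount + 1)

    i = 0
    for keyword, value in (("host", dbhost), ("user", dbuser), ("port", dbport)):
        if value:
            slots[i] = (keyword, value)  # would overflow if the count were wrong
            i += 1
    assert i == argcount  # exactly the counted slots were filled
    return slots

print(build_keywords(dbhost="localhost", dbport="5432"))
```

Each `if value:` write is matched one-for-one by an earlier `if value: argcount += 1`, so the index can never reach the terminator slot, which is the reason the suspected overflow cannot occur.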
Re: [HACKERS] review: plpgsql return a row-expression
Hi,

I forgot to add documentation changes in the earlier patch. In the attached patch, I have added documentation as well as fixed other issues you raised.

Regards,
--Asif

On Thu, Nov 22, 2012 at 10:47 PM, Asif Rehman wrote:
> Thanks for the review Pavel. I have taken care of the points you raised.
> Please see the updated patch.
>
> Regards,
> --Asif
>
>
> On Wed, Nov 21, 2012 at 1:47 AM, Pavel Stehule wrote:
>
>> related to
>> http://archives.postgresql.org/message-id/caauglxwpedfwae6dajmf7sxewfusa0f68p07retbbpf_fsa...@mail.gmail.com
>>
>> * patched and compiled without warnings
>>
>> * All 133 tests passed.
>>
>>
>> but
>>
>> I don't like
>>
>> * calling an invalid function from an anonymous block - it is messy (regress
>> tests) - there is no reason to do it
>>
>> +create or replace function foo() returns footype as $$
>> +declare
>> +  v record;
>> +  v2 record;
>> +begin
>> +  v := (1, 'hello');
>> +  v2 := (1, 'hello');
>> +  return (v || v2);
>> +end;
>> +$$ language plpgsql;
>> +DO $$
>> +declare
>> +  v footype;
>> +begin
>> +  v := foo();
>> +  raise info 'x = %', v.x;
>> +  raise info 'y = %', v.y;
>> +end; $$;
>> +ERROR:  operator does not exist: record || record
>> +LINE 1: SELECT (v || v2)
>> +                  ^
>>
>> * there is a performance issue
>>
>> create or replace function fx2(a int)
>> returns footype as $$
>> declare x footype;
>> begin
>>   x = (10,20);
>>   return x;
>> end;
>> $$ language plpgsql;
>>
>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>> lateral fx2(i);
>>  sum
>> -----
>>  100
>> (1 row)
>>
>> Time: 497.129 ms
>>
>> create or replace function fx2(a int)
>> returns footype as $$
>> begin
>>   return (10,20);
>> end;
>> $$ language plpgsql;
>>
>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>> lateral fx2(i);
>>  sum
>> -----
>>  100
>> (1 row)
>>
>> Time: 941.192 ms
>>
>> the following code has the same functionality and is faster
>>
>> if (stmt->expr != NULL)
>> {
>>     if (estate->retistuple)
>>     {
>>         TupleDesc       tupdesc;
>>         Datum           retval;
>>         Oid             rettype;
>>         bool            isnull;
>>         int32           tupTypmod;
>>         Oid             tupType;
>>         HeapTupleHeader td;
>>         HeapTupleData   tmptup;
>>
>>         retval = exec_eval_expr(estate,
>>                                 stmt->expr,
>>                                 &isnull,
>>                                 &rettype);
>>
>>         /* Source must be of RECORD or composite type */
>>         if (!type_is_rowtype(rettype))
>>             ereport(ERROR,
>>                     (errcode(ERRCODE_DATATYPE_MISMATCH),
>>                      errmsg("cannot return non-composite value from composite type returning function")));
>>
>>         if (!isnull)
>>         {
>>             /* Source is a tuple Datum, so safe to do this: */
>>             td = DatumGetHeapTupleHeader(retval);
>>
>>             /* Extract rowtype info and find a tupdesc */
>>             tupType = HeapTupleHeaderGetTypeId(td);
>>             tupTypmod = HeapTupleHeaderGetTypMod(td);
>>             tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
>>
>>             /* Build a temporary HeapTuple control structure */
>>             tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
>>             ItemPointerSetInvalid(&(tmptup.t_self));
>>             tmptup.t_tableOid = InvalidOid;
>>             tmptup.t_data = td;
>>
>>             estate->retval = PointerGetDatum(heap_copytuple(&tmptup));
>>             estate->rettupdesc = CreateTupleDescCopy(tupdesc);
>>             ReleaseTupleDesc(tupdesc);
>>         }
>>
>>         estate->retisnull = isnull;
>>     }
>>
>>
>> * it is too restrictive (maybe) - almost all plpgsql statements do
>> automatic conversions (IO conversions when necessary)
>>
>> create type footype2 as (a numeric, b varchar)
>>
>> postgres=# create or replace function fx3(a int)
>> returns footype2 as
>> $$
>> begin
>>   return (1000.22234213412342143,'ewqerwqre');
>> end;
>> $$ language plpgsql;
>> CREATE FUNCTION
>> postgres=# select fx3(10);
>> ERROR:  returned record type does not match expected record type
>> DETAIL:  Returned type unknown does not match expected type character
>> varying in column 2.
>> CONTEXT:  PL/pgSQL function fx3(integer) while c