Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation
On Thursday, September 27, 2012 10:19 AM, Tom Lane wrote:
> Noah Misch writes:
> > You cannot assume executor-unmodified columns are also unmodified from
> > heap_update()'s perspective.  Expansion in one column may instigate TOAST
> > compression of a logically-unmodified column, and that counts as a change
> > for xlog delta purposes.
>
> Um ... what about BEFORE triggers?

This optimization will not apply in case BEFORE triggers update the tuple.

> Frankly, I think that expecting the executor to tell you which columns
> have been modified is a non-starter.  We have a solution for HOT and
> it's silly to do the same thing differently just a few lines away.

My apprehension is that it can hurt the performance advantage if we compare
all attributes to check which have been modified, and that too under a
Buffer Exclusive Lock.  In the case of HOT, only the index attributes get
compared.

I agree that doing things differently at two nearby places is not good.
So I will do it the same way as for HOT and then take the performance data
again; if there is no big impact then we can do it that way.

With Regards,
Amit Kapila.
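For concreteness, the HOT-style comparison being referred to boils down to
fetching each attribute from both tuple versions and comparing the datums.
The sketch below is illustrative only, not the patch's or heapam.c's actual
code: the function name is hypothetical, and the real HOT code additionally
detoasts varlena values before comparing.

    #include "postgres.h"
    #include "access/heapam.h"
    #include "utils/datum.h"
    #include "utils/rel.h"

    /*
     * Illustrative sketch: does attribute 'attrnum' have the same value
     * in both tuple versions?  (Detoasting of varlena datums, which the
     * real HOT code performs, is omitted here for brevity.)
     */
    static bool
    attr_is_unchanged(Relation rel, int attrnum,
                      HeapTuple oldtup, HeapTuple newtup)
    {
        TupleDesc   tupdesc = RelationGetDescr(rel);
        Form_pg_attribute att = tupdesc->attrs[attrnum - 1];
        bool        isnull1, isnull2;
        Datum       value1, value2;

        value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
        value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);

        if (isnull1 != isnull2)
            return false;       /* NULL vs. non-NULL: changed */
        if (isnull1)
            return true;        /* both NULL: unchanged */

        /* byte-wise comparison honoring by-value vs. by-reference storage */
        return datumIsEqual(value1, value2, att->attbyval, att->attlen);
    }

Doing this for every column (rather than only index columns, as HOT does)
is exactly the extra cost under the buffer lock that the mail above worries
about.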
Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Noah Misch writes:
> You cannot assume executor-unmodified columns are also unmodified from
> heap_update()'s perspective.  Expansion in one column may instigate TOAST
> compression of a logically-unmodified column, and that counts as a change
> for xlog delta purposes.

Um ... what about BEFORE triggers?

Frankly, I think that expecting the executor to tell you which columns
have been modified is a non-starter.  We have a solution for HOT and
it's silly to do the same thing differently just a few lines away.

			regards, tom lane
Re: [HACKERS] Switching timeline over streaming replication
On Thursday, September 27, 2012 6:30 AM Josh Berkus wrote:
>> Yes that is correct.  I thought timeline change happens only when
>> somebody does PITR.
>> Can you please tell me why we change timeline after promotion, because
>> the original timeline concept was for PITR and I am not able to trace
>> from code the reason why on promotion it is required?
>
> The idea behind the timeline switch is to prevent a server from
> subscribing to a master which is actually behind it.  For example,
> consider this sequence:
>
> 1. M1->async->S1
> 2. M1 is at xid 2001 and fails.
> 3. S1 did not receive transaction 2001 and is at xid 2000
> 4. S1 is promoted.
> 5. S1 processed a new, different transaction 2001
> 6. M1 is repaired and brought back up
> 7. M1 is subscribed to S1
> 8. M1 is now corrupt.
>
> That's why we need the timeline switch.

Thanks.  I understood this point, but this use case is currently not
covered in the documentation of timelines (Section 24.3.5).

With Regards,
Amit Kapila.
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
Hi Alvaro,

Let me volunteer for reviewing, of course, but now pgsql_fdw is in my
queue...  If some other folks can also volunteer it soon, it is welcome.

2012/9/26 Alvaro Herrera:
> Excerpts from Alvaro Herrera's message of mié sep 26 13:04:34 -0300 2012:
>> Excerpts from Kohei KaiGai's message of mié abr 25 06:40:23 -0300 2012:
>>
>>> I tried to implement a patch according to the idea.  It allows extensions
>>> to register an entry point of the self-managed daemon processes,
>>> then postmaster start and stop them according to the normal manner.
>>
>> Here's my attempt at this.  This is loosely based on your code, as well
>> as parts of what Simon sent me privately.
>
> Actually please consider this version instead, in which the support to
> connect to a database and run transactions actually works.  I have also
> added a new sample module (worker_spi) which talks to the server using
> the SPI interface.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
KaiGai Kohei
Re: [HACKERS] 64-bit API for large object
I checked this patch.  It looks good, but here are still some points to be
discussed.

* I have a question.  What is the meaning of INT64_IS_BUSTED?  It seems to
  me a marker to indicate a platform without 64-bit support.  However, the
  commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce says as follows:

  | Remove all the special-case code for INT64_IS_BUSTED, per decision that
  | we're not going to support that anymore.

* At inv_seek(), it seems to me it checks offset correctness in a wrong
  way, as follows:

  | case SEEK_SET:
  |     if (offset < 0)
  |         elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
  |     obj_desc->offset = offset;
  |     break;

  That would be a right assumption if the large object size were restricted
  to 2GB.  But the largest positive int64 is larger than the expected
  limitation.  So it seems to me it should be compared with
  (INT_MAX * PAGE_SIZE) instead.

* At inv_write(), it definitely needs a check to prevent writes beyond the
  4TB boundary.  In case obj_desc->offset is a bit below 4TB, an additional
  1GB write would break the head of the large object because of "pageno"
  overflow.

* Please also add checks in inv_read() to prevent LargeObjectDesc->offset
  unexpectedly overflowing the 4TB boundary.

* At inv_truncate(), the variable "off" is re-defined to int64.  Is this
  change really needed?  All its usage is to store the result of
  "len % LOBLKSIZE".

Thanks,

2012/9/24 Nozomi Anzai:
> Here is 64-bit API for large object version 2 patch.
>
>> I checked this patch.  It can be applied onto the latest master branch
>> without any problems.  My comments are below.
>>
>> 2012/9/11 Tatsuo Ishii:
>>> Ok, here is the patch to implement 64-bit API for large object, to
>>> allow to use up to 4TB large objects (or 16TB if BLCKSZ changed to
>>> 32KB).  The patch is based on Jeremy Drake's patch posted on September
>>> 23, 2005
>>> (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
>>> and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
>>> for the backend part and Yugo Nagata for the rest (including
>>> documentation patch).
>>>
>>> Here are changes made in the patch:
>>>
>>> 1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)
>>>
>>> lo_initialize() gathers backend 64-bit large object handling
>>> function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.
>>>
>>> If client calls lo_*64 functions and backend does not support them,
>>> lo_*64 functions return error to caller.  There might be an argument
>>> since calls to lo_*64 functions can automatically be redirected to
>>> 32-bit older API.  I don't know this is worth the trouble though.
>>
>> I think it should definitely return an error code when user tries to
>> use lo_*64 functions towards the backend v9.2 or older, because
>> fallback to the 32-bit API can raise unexpected errors if the
>> application intends to seek the area beyond 2GB.
>>
>>> Currently lo_initialize() throws an error if one of oids are not
>>> available.  I doubt we do the same way for 64-bit functions since this
>>> will make 9.3 libpq unable to access large objects stored in pre-9.2
>>> PostgreSQL servers.
>>
>> It seems to me the situation to split the case of pre-9.2 and post-9.3
>> using a condition of "conn->sversion >= 90300".
>
> Fixed so, and tested it by deleting the lo_tell64's row from pg_proc.
>
>>> To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
>>> is a pointer to 64-bit integer and actual data is placed somewhere
>>> else.  There might be other way: add new member to union u to store
>>> 64-bit integer:
>>>
>>> typedef struct
>>> {
>>>     int     len;
>>>     int     isint;
>>>     union
>>>     {
>>>         int    *ptr;    /* can't use void (dec compiler barfs) */
>>>         int     integer;
>>>         int64   bigint; /* 64-bit integer */
>>>     } u;
>>> } PQArgBlock;
>>>
>>> I'm a little bit worried about this way because PQArgBlock is a public
>>> interface.
>>
>> I'm inclined to add a new field for the union; that seems to me a
>> straightforward approach.
>> For example, the manner in lo_seek64() seems to me confusable.
>> It sets 1 in the "isint" field even though a pointer is actually
>> delivered.
>>
>> +   argv[1].isint = 1;
>> +   argv[1].len = 8;
>> +   argv[1].u.ptr = (int *) &len;
>
> Your proposal was not adopted per discussion.
>
>>> Also we add new type "pg_int64":
>>>
>>> #ifndef NO_PG_INT64
>>> #define HAVE_PG_INT64 1
>>> typedef long long int pg_int64;
>>> #endif
>>>
>>> in postgres_ext.h per suggestion from Tom Lane:
>>> http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
>>
>> I'm uncertain about the context of this discussion.
>>
>> Does it matter if we include <stdint.h> and use int64_t instead
>> of the self-defined type?
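For illustration, the bounds check being requested for inv_seek() could
look like the minimal sketch below.  The macro name is an assumption (not
from the patch), and LOBLKSIZE (BLCKSZ / 4) is the large-object page size
that the review above calls PAGE_SIZE.

    #include <limits.h>

    #include "postgres.h"
    #include "storage/large_object.h"   /* LOBLKSIZE = BLCKSZ / 4 */

    /* assumed name: largest supported large-object size (~4TB by default) */
    #define MAX_LARGE_OBJECT_SIZE   ((int64) INT_MAX * LOBLKSIZE)

    /* sketch of the SEEK_SET validation suggested in the review */
    static void
    check_lo_offset(int64 offset)
    {
        /* reject negative targets and targets past the last page */
        if (offset < 0 || offset > MAX_LARGE_OBJECT_SIZE)
            elog(ERROR, "invalid large object seek target: " INT64_FORMAT,
                 offset);
    }

The same upper bound would also cover the inv_write()/inv_read() concerns:
any offset at or below it cannot make the int32 "pageno" overflow.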
[HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation
On Mon, Sep 24, 2012 at 10:57:02AM +0000, Amit kapila wrote:
> Rebased version of patch based on latest code.

I like the direction you're taking with this patch; the gains are striking,
especially considering the isolation of the changes.

You cannot assume executor-unmodified columns are also unmodified from
heap_update()'s perspective.  Expansion in one column may instigate TOAST
compression of a logically-unmodified column, and that counts as a change
for xlog delta purposes.  You do currently skip the optimization for
relations having a TOAST table, but TOAST compression can still apply.
Observe this with text columns of storage mode PLAIN.  I see two ways out:
skip the new behavior when need_toast=true, or compare all inline column
data, not just what the executor modified.  One can probably construct a
benchmark favoring either choice.  I'd lean toward the latter; wide tuples
are the kind this change can most help.  If the marginal advantage of
ignoring known-unmodified columns proves important, we can always bring it
back after designing a way to track which columns changed in the toaster.

Given that, why not treat the tuple as an opaque series of bytes and not
worry about datum boundaries?  When several narrow columns change together,
say a sequence of sixteen smallint columns, you will use fewer binary delta
commands by representing the change with a single 32-byte substitution.  If
an UPDATE changes just part of a long datum, the delta encoding algorithm
will still be able to save considerable space.  That case arises in many
forms: changing one word in a long string, changing one element in a long
array, changing one field of a composite-typed column.  Granted, this makes
the choice of delta encoding algorithm more important.

Like Heikki, I'm left wondering why your custom delta encoding is
preferable to an encoding from the literature.  Your encoding has much in
common with VCDIFF, even sharing two exact command names.  If a custom
encoding is the right thing, code comments or a README section should at
least discuss the advantages over an established alternative.

Idle thought: it might pay off to use 1-byte sizes and offsets most of the
time.  Tuples shorter than 256 bytes are common; for longer tuples, we can
afford wider offsets.

The benchmarks you posted upthread were helpful.  I think benchmarking with
fsync=off is best if you don't have a battery-backed write controller or
SSD.  Otherwise, fsync time dominates a pgbench run.  Please benchmark
recovery.  To do so, set up WAL archiving and take a base backup from a
fresh cluster.  Run pgbench for awhile.  Finally, observe the elapsed time
to recover your base backup to the end of archived WAL.

> *** a/src/backend/access/common/heaptuple.c
> --- b/src/backend/access/common/heaptuple.c

> + /*
> +  * encode_xlog_update
> +  *    Forms a diff tuple from old and new tuple with the modified columns.
> +  *
> +  *    att - attribute list.
> +  *    oldtup - pointer to the old tuple.
> +  *    heaptup - pointer to the modified tuple.
> +  *    wal_tup - pointer to the wal record which needs to be formed from
> +  *              old and new tuples by using the modified columns list.
> +  *    modifiedCols - modified columns list by the update command.
> +  */
> + void
> + encode_xlog_update(Form_pg_attribute *att, HeapTuple oldtup,
> +                    HeapTuple heaptup, HeapTuple wal_tup,
> +                    Bitmapset *modifiedCols)

This name is too generic for an extern function.  Maybe
"heap_delta_encode"?

> + void
> + decode_xlog_update(HeapTupleHeader htup, uint32 old_tup_len, char *data,
> +                    uint32 *new_tup_len, char *waldata, uint32 wal_len)

Likewise, maybe "heap_delta_decode" here.

> *** a/src/backend/access/heap/heapam.c
> --- b/src/backend/access/heap/heapam.c
> ***************
> *** 71,77 ****
>   #include "utils/syscache.h"
>   #include "utils/tqual.h"
>
> -
>   /* GUC variable */
>   bool    synchronize_seqscans = true;

Spurious whitespace change.

> *** 3195,3204 **** l2:
>       /* XLOG stuff */
>       if (RelationNeedsWAL(relation))
>       {
> !         XLogRecPtr  recptr = log_heap_update(relation, buffer, oldtup.t_self,
> !                                              newbuf, heaptup,
> !                                              all_visible_cleared,
> !                                              all_visible_cleared_new);
>
>           if (newbuf != buffer)
>           {
> --- 3203,3233 ----
>       /* XLOG stuff */
>       if (RelationNeedsWAL(relation))
>       {
> !         XLogRecPtr  recptr;
> !
> !         /*
> !          * Apply the xlog diff update algorithm only for hot updates.
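To make the "opaque series of bytes" point concrete, a binary delta of the
kind under discussion reduces to a stream of copy/add commands over the raw
tuple bytes.  The sketch below is illustrative only: the command names,
layout, and 1-byte fields are assumptions (following the idle thought about
1-byte sizes), not the patch's actual opcode format.

    #include <string.h>

    #include "postgres.h"

    /* hypothetical command set; the patch's real opcodes differ */
    typedef enum DeltaCmd
    {
        DELTA_COPY,     /* copy 'len' bytes from the old tuple at 'off' */
        DELTA_ADD       /* append 'len' literal bytes that follow */
    } DeltaCmd;

    /*
     * Append one command to a delta stream.  With 1-byte offsets and
     * lengths, a run of changed bytes in a tuple shorter than 256 bytes
     * costs just 3 bytes of command overhead -- e.g. one DELTA_ADD
     * covering sixteen adjacent modified smallint columns, instead of
     * sixteen per-column commands.
     */
    static char *
    delta_emit(char *dst, DeltaCmd cmd, uint8 off, uint8 len,
               const char *payload)
    {
        *dst++ = (char) cmd;
        *dst++ = (char) off;    /* ignored for DELTA_ADD */
        *dst++ = (char) len;
        if (cmd == DELTA_ADD)
        {
            memcpy(dst, payload, len);
            dst += len;
        }
        return dst;
    }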
Re: [HACKERS] pg_upgrade does not completely honor --new-port
On Tue, Sep 25, 2012 at 05:36:54PM +0300, Devrim Gunduz wrote:
> Hi,
>
> I just performed a test upgrade from 9.1 to 9.2, and used --new-port
> variable.  However, the analyze_new_cluster.sh does not include the new
> port, thus when I run it, it fails.  Any chance to add the port number to
> the script?

Well, the reason people normally use the port number is to do a live
check, but obviously when the script is created it isn't doing a check.
I am worried that if I do embed the port number in there, then if they
change the port after the upgrade, they now can't use the script.  I
assume users would have PGPORT set before running the script, no?

> Also, is it worth to add the value specified in --new-bindir as a prefix
> to vacuumdb command in the same script?

Wow, I never thought of adding a path to those scripts, but it certainly
makes sense.  I have created the attached patch which does this.

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
index bed10f8..2785eb7 100644
--- a/contrib/pg_upgrade/check.c
+++ b/contrib/pg_upgrade/check.c
@@ -477,7 +477,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %svacuumdb --all %s%s\n", ECHO_QUOTE,
+	fprintf(script, "echo %s\"%s/vacuumdb\" --all %s%s\n", ECHO_QUOTE,
+			new_cluster.bindir,
 	/* Did we copy the free space files? */
 			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
 			"--analyze-only" : "--analyze", ECHO_QUOTE);
@@ -498,7 +498,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %s--%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "vacuumdb --all --analyze-only\n");
+	fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
 	fprintf(script, "echo%s\n", ECHO_BLANK);
 	fprintf(script, "echo %sThe server is now available with minimal optimizer statistics.%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
@@ -519,7 +519,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %s---%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "vacuumdb --all --analyze-only\n");
+	fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);

 #ifndef WIN32
@@ -532,7 +532,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %s-%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "vacuumdb --all %s\n",
+	fprintf(script, "\"%s/vacuumdb\" --all %s\n",
+			new_cluster.bindir,
 	/* Did we copy the free space files? */
 			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
 			"--analyze-only" : "--analyze");
Re: [HACKERS] Modest proposal: run check_keywords.pl on every build
On 09/26/2012 09:16 PM, Tom Lane wrote:
> I've had it with mopping up after oversights like this one:
> http://archives.postgresql.org/pgsql-hackers/2012-09/msg01057.php
>
> We made a script (src/tools/check_keywords.pl) to check for this type of
> error years ago, and even added it to the top-level "maintainer-check"
> target awhile back, but nonetheless people continue to screw it up on a
> regular basis (twice already this year alone).
>
> I think we should move the script into src/backend/parser and run it as
> part of the make gram.c target.  (Doing it that way will not cause perl
> to become required for building from tarballs, since gram.c is supplied
> prebuilt in tarballs.  And we already require perl for builds from raw
> git pulls.)
>
> Any objections?

works for me.

cheers

andrew
Re: [HACKERS] Switching timeline over streaming replication
Josh:

The good part is that you are the first person to ask for a copy, and I
will send you the hook code that I have.  You can be a good sport and put
it on GitHub; that is great.  You can give us both credit for a joint
effort: I do the code, you put it on GitHub.

The not so good part is that the community has a bunch of other trigger
work and other stuff going on, so there was not much interest in non-WAL
replication hook code.  I do not have time to debate implementation nor
wait for a release of 9.3 with my needs not met, so I will just keep
patching the hook code into whatever release code base comes along.

The bad news is that I have not implemented the logic of the external
replication daemon.  The other good and bad news is that you are free to
receive the messages from the hook code through the unix socket and
implement replication any way you want -- and the bad news is that you are
free to IMPLEMENT replication any way you want.

I am going to implement master-master-master-master SELF HEALING
replication, but that is just my preference.  Should take about a week to
get it operational and another week to see how it works in my
geographically dispersed servers in the cloud.

Send me a note if it is ok to send you a zip file with the source code
files that I touched in the 9.2 code base so you can shove it up on GitHub.

Cheers,
marco

On 9/26/2012 6:48 PM, Josh Berkus wrote:
>> I was able to patch the 9.2.0 code base in 1 day and change my entire
>> architecture strategy for replication into self healing async
>> master-master-master and the tiniest bit of sharding code imaginable
>
> Sounds cool.  Do you have a fork available on Github?  I'll try it out.
[HACKERS] Modest proposal: run check_keywords.pl on every build
I've had it with mopping up after oversights like this one:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg01057.php

We made a script (src/tools/check_keywords.pl) to check for this type of
error years ago, and even added it to the top-level "maintainer-check"
target awhile back, but nonetheless people continue to screw it up on a
regular basis (twice already this year alone).

I think we should move the script into src/backend/parser and run it as
part of the make gram.c target.  (Doing it that way will not cause perl
to become required for building from tarballs, since gram.c is supplied
prebuilt in tarballs.  And we already require perl for builds from raw
git pulls.)

Any objections?

			regards, tom lane
Re: [HACKERS] Switching timeline over streaming replication
> Yes that is correct.  I thought timeline change happens only when
> somebody does PITR.
> Can you please tell me why we change timeline after promotion, because
> the original timeline concept was for PITR and I am not able to trace
> from code the reason why on promotion it is required?

The idea behind the timeline switch is to prevent a server from
subscribing to a master which is actually behind it.  For example,
consider this sequence:

1. M1->async->S1
2. M1 is at xid 2001 and fails.
3. S1 did not receive transaction 2001 and is at xid 2000
4. S1 is promoted.
5. S1 processed a new, different transaction 2001
6. M1 is repaired and brought back up
7. M1 is subscribed to S1
8. M1 is now corrupt.

That's why we need the timeline switch.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Does the SQL standard actually define LATERAL anywhere?
Peter Eisentraut writes:
> On Sat, 2012-09-01 at 00:29 -0400, Tom Lane wrote:
>> this is a spec-defined syntax so surely the SQL standard ought to tell
>> us what to do.  But I'm darned if I see anything in the standard that
>> defines the actual *behavior* of a LATERAL query.

> I gave it another read and couldn't find anything either.  As written,
> LATERAL is effectively a noise word, AFAICT.

It's not a noise word --- its effects on scope of name visibility are
spelled out clearly enough.  What is not clear is what the query semantics
are supposed to be in the presence of references that would be disallowed
in the traditional understanding of a FROM clause.

			regards, tom lane
Re: [HACKERS] Switching timeline over streaming replication
> I was able to patch the 9.2.0 code base in 1 day and change my entire
> architecture strategy for replication into self healing async
> master-master-master and the tiniest bit of sharding code imaginable

Sounds cool.  Do you have a fork available on Github?  I'll try it out.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] EVENT Keyword and CREATE TABLE
Brian Weaver writes:
> In some of our old tables going back several years we have a column named
> 'event' as in:
>
>     event character varying(1000)

> I was working off trunk and the database refuses to create this table
> any longer.  Is this by design or is it a regression bug?

It's a bug.  The event-trigger patch added EVENT as a new keyword, but
forgot to list it in the unreserved_keywords production, which is
necessary to make it actually act unreserved.

I've committed a fix, and manually verified there are no other such errors
at present, but this isn't the first time this has happened.  We probably
need to put in some automated cross-check, or it won't be the last time
either.

			regards, tom lane
Re: [HACKERS] Does the SQL standard actually define LATERAL anywhere?
On Sat, 2012-09-01 at 00:29 -0400, Tom Lane wrote:
> this is a spec-defined syntax so surely the SQL standard ought to tell
> us what to do.  But I'm darned if I see anything in the standard that
> defines the actual *behavior* of a LATERAL query.

I gave it another read and couldn't find anything either.  As written,
LATERAL is effectively a noise word, AFAICT.
Re: [HACKERS] autovacuum stress-testing our system
On 26.9.2012 18:14, Jeff Janes wrote:
> On Wed, Sep 26, 2012 at 8:25 AM, Tomas Vondra wrote:
>> Dne 26.09.2012 16:51, Jeff Janes napsal:
>>
>>> What is generating the endless stream you are seeing is that you have
>>> 1000 databases, so if naptime is one minute you are vacuuming 16 per
>>> second.  Since every database gets a new process, that process needs
>>> to read the file as it doesn't inherit one.
>>
>> Right.  But that makes the 10ms timeout even more strange, because the
>> worker is then using the data for a very long time (even minutes).
>
> On average that can't happen, or else your vacuuming would fall way
> behind.  But I agree, there is no reason to have very fresh statistics
> to start with.  naptime/5 seems like a good cutoff for me for the
> start-up reading.  If a table only becomes eligible for vacuuming in
> the last 20% of the naptime, I see no reason that it can't wait
> another round.  But that just means the statistics collector needs to
> write the file less often; the workers still need to read it once per
> database, since each one only vacuums one database and they don't
> inherit the data from the launcher.

So what happens if there are two workers vacuuming the same database?
Wouldn't that make it more probable that the tables were already vacuumed
by the other worker?  See the comment at the beginning of autovacuum.c,
where it also states that the statfile is reloaded before each table
(probably because of the calls to autovac_refresh_stats, which in turn
calls clear_snapshot).

>>> I think forking it off to another value would be better.  If you
>>> are an autovacuum worker which is just starting up and so getting its
>>> initial stats, you can tolerate a stats file up to
>>> "autovacuum_naptime / 5.0" stale.  If you are already started up and
>>> are just about to vacuum a table, then keep the staleness at
>>> PGSTAT_RETRY_DELAY as it currently is, so as not to redundantly
>>> vacuum a table.
>>
>> I always thought there's a "no more than one worker per database"
>> limit, and that the file is always reloaded when switching to another
>> database.  So I'm not sure how a worker could see such stale table
>> info.  Or are the workers keeping the stats across multiple databases?
>
> If you only have one "active" database, then all the workers will be
> in it.  I don't know how likely it is that they will leap frog each
> other and collide.  But anyway, if you have 1000s of databases, then
> each one will generally require zero vacuums per naptime (as you say,
> it is mostly read only), so it is the reads upon start-up, not the
> reads per table that needs vacuuming, which generate most of the
> traffic.  Once you separate those two parameters out, playing around
> with the PGSTAT_RETRY_DELAY one seems like a needless risk.

OK, right.  My fault.  Yes, our databases are mostly read-only -- more
precisely, whenever we load data we immediately do VACUUM ANALYZE on the
tables, so autovacuum never kicks in on them.  The only thing it works on
are system catalogs and such stuff.

>>>> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly
>>>>    on the time it takes to write the file (e.g. 10x the write time or
>>>>    something).
>>>
>>> This is already in place.
>>
>> Really?  Where?
>
> I had thought that this part was effectively the same thing:
>
>  * We don't recompute min_ts after sleeping, except in the
>  * unlikely case that cur_ts went backwards.
>
> But I think I did not understand your proposal.
>
>> I've checked the current master, and the only thing I see in
>> pgstat_write_statsfile is this (line 3558):
>>
>>     last_statwrite = globalStats.stats_timestamp;
>>
>> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558
>>
>> I don't think that's doing what I meant.  That really doesn't scale the
>> timeout according to the write time.  What happens right now is that
>> when the stats file is written at time 0 (starts at zero, write
>> finishes at 100 ms), and a worker asks for the file at 99 ms (i.e. 1 ms
>> before the write finishes), it will set the time of the inquiry to
>> last_statrequest and then do this
>>
>>     if (last_statwrite < last_statrequest)
>>         pgstat_write_statsfile(false);
>>
>> i.e. comparing it to the start of the write.  So another write will
>> start right after the file is written out.  And over and over.
>
> Ah.  I had wondered about this before too, and wondered if it would be
> a good idea to have it go back to the beginning of the stats file, and
> overwrite the timestamp with the current time (rather than the time it
> started writing it), as the last action it does before the rename.  I
> think that would automatically make it adaptive to the time it takes
> to write out the file, in a fairly simple way.

Yeah, I was thinking about that too.

>> Moreover there's the 'rename' step making the new file invisible for
>> the worker processes, which makes the thing
Re: [HACKERS] autovacuum stress-testing our system
On 26.9.2012 18:29, Tom Lane wrote:
> Alvaro Herrera writes:
>> Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
>>> On 26-09-2012 09:43, Tomas Vondra wrote:
>>>> 5) splitting the single stat file into multiple pieces - e.g. per
>>>>    database, written separately, so that the autovacuum workers don't
>>>>    need to read all the data even for databases that don't need to be
>>>>    vacuumed.  This might be combined with (4).
>
>>> IMHO that's the definitive solution.  It would be one file per
>>> database plus a global one.  That way, the check would only read the
>>> global.stat and process those databases that were modified.  Also, an
>>> in-memory map could store that information to speed up the checks.
>
>> +1
>
> That would help for the case of hundreds of databases, but how much
> does it help for lots of tables in a single database?

Well, it wouldn't, but it wouldn't make it worse either.  Or at least
that's how I understand it.

> I'm a bit suspicious of the idea that we should encourage people to use
> hundreds of databases per installation anyway: the duplicated system
> catalogs are going to be mighty expensive, both in disk space and in
> their cache footprint in shared buffers.  There was some speculation
> at the last PGCon about how we might avoid the duplication, but I think
> we're years away from any such thing actually happening.

You don't need to encourage us to do that ;-)  We know it's not perfect
and are considering a good alternative - e.g. several databases (~10) with
schemas inside, replacing the current database-only approach.  This way
we'd get multiple stat files (thus gaining the benefits) with less
overhead (shared catalogs).

And yes, using tens of thousands of tables (serving as "caches") for a
reporting solution is "interesting" (as in the old Chinese curse) too.

> What seems to me like it could help more is fixing things so that the
> autovac launcher needn't even launch a child process for databases that
> haven't had any updates lately.  I'm not sure how to do that, but it
> probably involves getting the stats collector to produce some kind of
> summary file.

Yes, I've proposed something like this in my original mail - setting a
"dirty" flag on objects (a database in this case) whenever a table in it
becomes eligible for vacuum/analyze.

Tomas
Re: [HACKERS] system_information.triggers & truncate triggers
On 27/09/12 02:59, Christopher Browne wrote:
> On Wed, Sep 26, 2012 at 10:02 AM, Tom Lane wrote:
>> Daniel Farina writes:
>>> On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova wrote:
>>>> The definition of information_schema.triggers contains this:
>>>>
>>>>     -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
>>>>
>>>> so it seems that we are not showing TRUNCATE triggers intentionally,
>>>> but that comment fails to explain why
>>>
>>> Wouldn't it be because TRUNCATE is a PostgreSQL language extension?
>>
>> Yeah.  The SQL standard specifies the allowed values in that column,
>> and TRUNCATE is not among them.  For similar reasons, you won't find
>> exclusion constraints represented in the information_schema views, and
>> there are some other cases that I don't recall this early in the
>> morning.
>>
>> The point of the information_schema (at least IMHO) is to present
>> standard-conforming information about standard-conforming database
>> objects in a standard-conforming way, so that cross-DBMS applications
>> can rely on what they'll see there.  If you are doing anything that's
>> not described by the SQL standard, you will get at best an incomplete
>> view of it from the information_schema.  In that case you're a lot
>> better off looking directly at the underlying catalogs.
>>
>> (Yes, I'm aware that some other DBMS vendors have a more liberal
>> interpretation of what standards compliance means in this area.)
>
> Let me grouse about this a bit...
>
> I appreciate that standards compliance means that information_schema
> needs to be circumspect as to what it includes.
>
> But it is irritating that information_schema provides a representation
> of (for instance) triggers that, at first, looks nice and clean and
> somewhat version-independent, only to fall over because there's a class
> of triggers that it consciously ignores.
>
> If I'm wanting to do schema analytics on this (and I do), I'm left
> debating between painful choices:
>
> a) Use information_schema for what it *does* have, and then add in a
>    surprising-looking hack that's pretty version-dependent to draw in
>    the other triggers that it left out
>
> b) Ignore the seeming-nice information_schema representation, and
>    construct a version-dependent extraction covering everything that
>    more or less duplicates the work being done by
>    information_schema.triggers.
>
> I'd really like to have something like
>
> c) Something like information_schema that "takes the
>    standards-conformance gloves off" and gives a nice representation of
>    all the triggers.
>
> Make no mistake, I'm not casting aspersions at how pg_trigger was
> implemented; I have no complaint there, as it's quite fair that the
> internal representation won't be totally "human-readability-friendly."
> That is a structure that is continuously accessed by backends, and it
> is entirely proper to bias implementation to internal considerations.
> But I'd sure like ways to get at more analytically-friendly
> representations.
>
> A different place where I wound up having to jump through considerable
> hoops when doing schema analytics was vis-a-vis identifying functions.
> I need to be able to compare schemas across databases, so oid-based
> identification of functions is a total non-starter.  It appears that
> the best identification of a function would be based on the combination
> of schema name, function name, and the concatenation of argument data
> types.  It wasn't terribly difficult to construct that third bit, but
> it surely would be nice if there was a view capturing it, and possibly
> even serializing it into a table to enable indexing on it.
>
> Performance-wise, function comparisons turned out to be one of the most
> expensive things I did, specifically because of that mapping
> surrounding arguments.

I agree with your comments, but I couldn't help thinking about grouse
shooting! :-)

http://www.telegraph.co.uk/news/features/7944546/Grouse-shooting-season.html

[...]
Grouse shooting season

Grouse-shooters have been looking forward to mid-August with bridal
excitement since the Game Act of 1831 made it illegal to shoot out of
season.
[...]
[HACKERS] EVENT Keyword and CREATE TABLE
I think I just got bitten hard by a commit in mid July... git sha1 3855968.

In some of our old tables going back several years we have a column named
'event' as in:

CREATE TABLE tblaudittrail
(
    id bigint NOT NULL,
    siteid integer NOT NULL,
    entrytype character varying(25),
    form character varying(50),
    recordid integer,
    field character varying(25),
    changedfrom character varying(500),
    changedto character varying(500),
    changedon timestamp with time zone,
    changedby character varying(25),
    event character varying(1000)
);

I was working off trunk and the database refuses to create this table any
longer.  Is this by design or is it a regression bug?

Thanks

-- Brian

--
/* insert witty comment here */
Re: [HACKERS] data to json enhancements
On 09/26/2012 06:46 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> Drawing together various discussions both here and elsewhere (e.g. the
>> PostgresOpen hallway track) I propose to work on the following:
>>
>> 1. make datum_to_json() honor a type's cast to json if it exists.  The
>>    fallback is to use the type's string representation, as now.
>> 2. add a cast hstore -> json (any others needed for core / contrib
>>    types?)
>> 3. add a to_json(anyelement) function
>> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify
>>    and make more efficient turning a resultset into json.
>>
>> Comments welcome.
>
> ISTM the notion of to_json(anyelement) was already heavily discussed and
> had spec-compliance issues ... in fact, weren't you one of the people
> complaining?  What exactly does #3 mean that is different from the
> previous thread?
>
> Also, on reflection I'm not sure about commandeering cast-to-json for
> this --- aren't we really casting to "json member" or something like
> that?  The distinction between a container and its contents seems
> important here.  With a container type as source, it might be important
> to do something different if we're coercing it to a complete JSON value
> versus something that will be just one member.  I'm handwaving here
> because I don't feel like going back to re-read the RFC, but it seems
> like something that should be considered carefully before we lock down
> an assumption that there can never be a difference.

"json value" is a superset of "json object", so no special handling should
be required here - they nest cleanly.  (You can check http://www.json.org/
for the definition.)

I agree that one of the standards did say that "JSON generators" should
produce only JSON-serialised arrays and dictionaries and not "JSON values"
- that is, one of (literal null, true or false, or a json array,
dictionary, number or double-quoted string).

But if we did that we would really be the _only_ one to place this
restriction on a to_json function.  As I already reported in the previous
discussion, neither python, ruby, nor either of the two javascripts I
tested (mozilla & chrome) place this restriction on their json
serialisation functions.

Maybe the "JSON generator" in the standard means something else, like a
stand-alone program or server.  The only mention/definition of a generator
in the RFC is:

    5. Generators

    A JSON generator produces JSON text.  The resulting text MUST
    strictly conform to the JSON grammar.

To be formally standards compliant, I propose that we officially define
our json type to store "a json value"; as a json object is really just a
subset of json value, we can provide (or just advise the use of) a domain
json_object, which has a CHECK for the first non-whitespace char being
{ or [.

> 			regards, tom lane
Re: [HACKERS] pg_signal_backend() asymmetry
I'm marking this patch Ready for Committer.  I suggest backpatching it to
9.2; the patch corrects an oversight in 9.2 changes.  There's more
compatibility value in backpatching than in retaining distinct behavior
for 9.2 only.

On Thu, Jun 28, 2012 at 09:32:41AM -0700, Josh Kupershmidt wrote:
> !	if (!superuser())
> 	{
> 		/*
> !		 * Since the user is not superuser, check for matching roles.
> 		 */
> !		if (proc->roleId != GetUserId())
> !			return SIGNAL_BACKEND_NOPERMISSION;
> 	}

I would have collapsed the conditionals and deleted the obvious comment:

	if (!(superuser() || proc->roleId == GetUserId()))
		return SIGNAL_BACKEND_NOPERMISSION;

The committer can do that if desired; no need for another version.

Thanks,
nm
Re: [HACKERS] pg_reorg in core?
On Mon, Sep 24, 2012 at 03:55:35PM -0700, Josh Berkus wrote:
> On 9/24/12 3:43 PM, Simon Riggs wrote:
>> On 24 September 2012 17:36, Josh Berkus wrote:
>>>> For me, the Postgres user interface should include
>>>> * REINDEX CONCURRENTLY
>>>
>>> I don't see why we don't have REINDEX CONCURRENTLY now.
>>
>> Same reason for everything on (anyone's) TODO list.
>
> Yes, I'm just pointing out that it would be a very small patch for
> someone, and that AFAIK it didn't make it on the TODO list yet.

I see it on the TODO list, and it has been there for years:

	https://wiki.postgresql.org/wiki/Todo#Indexes

	Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] alter enum add value if not exists
On Sat, Sep 22, 2012 at 06:08:19PM -0400, Tom Lane wrote:
> Hannu Krosing writes:
>> On 09/22/2012 11:49 PM, Andrew Dunstan wrote:
>>> Not really, I guess we should for the sake of consistency, although
>>> TBH I find it just useless noise and rather wish we hadn't started the
>>> trend when we did the first DROP IF NOT EXISTS stuff.
>
>> Time for a GUC
>> existence_notice = none | exists | not_exists | all
>
> Not another one :-( ... isn't client_min_messages good enough?
>
> We sort of had this discussion before w.r.t. the notices about creating
> primary key indexes etc.  I wonder whether we should make a formal
> effort to split NOTICE message level into, say, NOTICE and NOVICE
> levels, where the latter contains all the "training wheels" stuff that
> experienced users would really rather not see.  Or maybe just redefine
> NOTICE as meaning novice-oriented messages, and push anything that
> doesn't seem to fit that categorization into another existing message
> level?

I have always wanted a "novice" level, so we could warn about things like
unjoined tables.

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] data to json enhancements
On 09/26/2012 01:46 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> Drawing together various discussions both here and elsewhere (e.g. the
>> PostgresOpen hallway track) I propose to work on the following:
>>
>> 1. make datum_to_json() honor a type's cast to json if it exists.  The
>>    fallback is to use the type's string representation, as now.
>> 2. add a cast hstore -> json (any others needed for core / contrib
>>    types?)
>> 3. add a to_json(anyelement) function
>> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify
>>    and make more efficient turning a resultset into json.
>>
>> Comments welcome.
>
> ISTM the notion of to_json(anyelement) was already heavily discussed and
> had spec-compliance issues ... in fact, weren't you one of the people
> complaining?  What exactly does #3 mean that is different from the
> previous thread?

I thought I got shouted down on that issue.  The main reason we didn't
include it was that it was getting rather late for those changes, IIRC -
we only just got in any json stuff at all in under the wire.  And in fact
you can have a json value now that's not an array or object:

    andrew=# select json '1' as num, json '"foo"' as txt;
     num |  txt
    -----+-------
     1   | "foo"

> Also, on reflection I'm not sure about commandeering cast-to-json for
> this --- aren't we really casting to "json member" or something like
> that?  The distinction between a container and its contents seems
> important here.  With a container type as source, it might be important
> to do something different if we're coercing it to a complete JSON value
> versus something that will be just one member.  I'm handwaving here
> because I don't feel like going back to re-read the RFC, but it seems
> like something that should be considered carefully before we lock down
> an assumption that there can never be a difference.

I think in view of the above this would be moot, no?

cheers

andrew
Re: [HACKERS] data to json enhancements
Andrew Dunstan writes:
> Drawing together various discussions both here and elsewhere (e.g. the
> PostgresOpen hallway track) I propose to work on the following:

> 1. make datum_to_json() honor a type's cast to json if it exists.  The
>    fallback is to use the type's string representation, as now.
> 2. add a cast hstore -> json (any others needed for core / contrib
>    types?)
> 3. add a to_json(anyelement) function
> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify
>    and make more efficient turning a resultset into json.

> Comments welcome.

ISTM the notion of to_json(anyelement) was already heavily discussed and
had spec-compliance issues ... in fact, weren't you one of the people
complaining?  What exactly does #3 mean that is different from the
previous thread?

Also, on reflection I'm not sure about commandeering cast-to-json for this
--- aren't we really casting to "json member" or something like that?  The
distinction between a container and its contents seems important here.
With a container type as source, it might be important to do something
different if we're coercing it to a complete JSON value versus something
that will be just one member.  I'm handwaving here because I don't feel
like going back to re-read the RFC, but it seems like something that
should be considered carefully before we lock down an assumption that
there can never be a difference.

			regards, tom lane
Re: [HACKERS] autovacuum stress-testing our system
On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
>>> On 26-09-2012 09:43, Tomas Vondra wrote:
>>>> 5) splitting the single stat file into multiple pieces - e.g. per
>>>>    database, written separately, so that the autovacuum workers don't
>>>>    need to read all the data even for databases that don't need to be
>>>>    vacuumed.  This might be combined with (4).
>
>>> IMHO that's the definitive solution.  It would be one file per
>>> database plus a global one.  That way, the check would only read the
>>> global.stat and process those databases that were modified.  Also, an
>>> in-memory map could store that information to speed up the checks.
>
>> +1
>
> That would help for the case of hundreds of databases, but how much
> does it help for lots of tables in a single database?

It doesn't help that case, but that case doesn't need much help.  If you
have N statistics-kept objects in total spread over M databases, of which
T objects need vacuuming per naptime, the stats file traffic is
proportional to N*(M+T).  If T is low, then there is generally no problem
if M is also low.  Or at least, the problem is much smaller than when M is
high for a fixed value of N.  A worked example is sketched below.

> I'm a bit suspicious of the idea that we should encourage people to use
> hundreds of databases per installation anyway:

I agree with that, but we could still do a better job of tolerating it,
without encouraging it.

If someone volunteers to write the code to do this, what trade-offs would
there be?

Cheers,
Jeff
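Plugging hypothetical numbers into that N*(M+T) estimate shows why the
number of databases, not the number of tables, dominates the traffic.  The
figures below are made up purely for illustration:

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical cluster: N tracked objects, T vacuumed per naptime */
        long    N = 50000;
        long    T = 10;

        /* stats file traffic ~ N*(M+T), per the estimate above */
        printf("M=1000 databases: ~%ld units\n", N * (1000 + T)); /* 50,500,000 */
        printf("M=10 databases:   ~%ld units\n", N * (10 + T));   /* 1,000,000 */
        return 0;
    }

With the same 50,000 objects, spreading them over 1000 databases produces
roughly fifty times the stats file traffic of spreading them over 10.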
[HACKERS] data to json enhancements
Drawing together various discussions both here and elsewhere (e.g. the
PostgresOpen hallway track) I propose to work on the following:

1. make datum_to_json() honor a type's cast to json if it exists.  The
   fallback is to use the type's string representation, as now.
2. add a cast hstore -> json (any others needed for core / contrib types?)
3. add a to_json(anyelement) function
4. add a new aggregate function json_agg(anyrecord) -> json to simplify
   and make more efficient turning a resultset into json.

Comments welcome.

I also propose to work on some accessor functions to pull data out of json
(as opposed to producing json from non-json).  I will email separately
about that when I have firmed up a proposed API a bit more.

cheers

andrew
Re: [HACKERS] autovacuum stress-testing our system
Alvaro Herrera writes:
> Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
>> On 26-09-2012 09:43, Tomas Vondra wrote:
>>> 5) splitting the single stat file into multiple pieces - e.g. per
>>>    database, written separately, so that the autovacuum workers don't
>>>    need to read all the data even for databases that don't need to be
>>>    vacuumed.  This might be combined with (4).

>> IMHO that's the definitive solution.  It would be one file per database
>> plus a global one.  That way, the check would only read the global.stat
>> and process those databases that were modified.  Also, an in-memory map
>> could store that information to speed up the checks.

> +1

That would help for the case of hundreds of databases, but how much does
it help for lots of tables in a single database?

I'm a bit suspicious of the idea that we should encourage people to use
hundreds of databases per installation anyway: the duplicated system
catalogs are going to be mighty expensive, both in disk space and in their
cache footprint in shared buffers.  There was some speculation at the last
PGCon about how we might avoid the duplication, but I think we're years
away from any such thing actually happening.

What seems to me like it could help more is fixing things so that the
autovac launcher needn't even launch a child process for databases that
haven't had any updates lately.  I'm not sure how to do that, but it
probably involves getting the stats collector to produce some kind of
summary file.

			regards, tom lane
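The per-database split being proposed could be as simple as keying each
database's stats file by its OID, next to a single global file.  The names
and location below are assumptions for illustration, not an implemented
design:

    #include <stdio.h>

    #include "postgres.h"

    /* assumed location; stats_temp_directory defaults to pg_stat_tmp */
    #define PGSTAT_STAT_DIRECTORY   "pg_stat_tmp"

    /*
     * Build the path of one database's stats file, e.g.
     * pg_stat_tmp/db_16384.stat, living alongside a single global.stat.
     */
    static void
    pgstat_db_statfile_path(char *path, size_t pathlen, Oid databaseid)
    {
        snprintf(path, pathlen, "%s/db_%u.stat",
                 PGSTAT_STAT_DIRECTORY, databaseid);
    }

An autovacuum worker would then read global.stat plus only its own
database's file, instead of the whole cluster's statistics.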
Re: [HACKERS] [9.1] 2 bugs with extensions
Marko Kreen writes:
>> Can we work out a minimal example to reproduce the bug?
>
> Yes, the above text or sql/pgq_coop/sql/pgq_coop_init_ext.sql
>
> I guess you could replace pgq_coop with any extension just
> consisting of just functions.

I did just that, with a single function, and couldn't reproduce the
problem either in 9.1 nor in master, with relocatable = true then with
relocatable = false and schema = 'pg_catalog' as in your repository.

The extension has those files contents:

Makefile:

    EXTENSION = extschema
    DATA = extschema--unpackaged--1.0.sql extschema.sql extschema--1.0.sql

    PG_CONFIG = pg_config
    PGXS := $(shell $(PG_CONFIG) --pgxs)
    include $(PGXS)

extschema.control:

    # extschema extension
    comment = 'debug extension schema handling'
    default_version = '1.0'
    relocatable = false
    schema = 'pg_catalog'

extschema.sql:

    create schema ext;
    create or replace function ext.extschema()
      returns int language sql
    as $$ select 1; $$;

extschema--unpackaged--1.0.sql:

    alter extension extschema add schema ext;
    alter extension extschema add function ext.extschema();

extschema--1.0.sql:

    create schema ext;
    create or replace function ext.extschema()
      returns int language sql
    as $$ select 1; $$;

So I guess that needs some more work to reproduce the bug.

>> (The Makefile in skytools/sql/pgq_coop fails on my OS)
>
> How does it fail?  Are you using gnu make?  What version?

I guess sed is the problem here, it's a BSD variant:

    dim ~/skytools/sql/pgq_coop make pgq_coop--unpackaged--3.1.sql
    sed: 1: "/^\\/{s/\\i //;p}\n": extra characters at the end of p command
    sed: 1: "/^\\/{s/\\i //;p}\n": extra characters at the end of p command
    sed: 1: "/^\\/{s/\\i //;p}\n": extra characters at the end of p command
    cat pgq_coop.upgrade.sql > pgq_coop--unpackaged--3.1.sql

    sed --version
    sed: illegal option -- -
    usage: sed script [-Ealn] [-i extension] [file ...]
           sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] autovacuum stress-testing our system
On Wed, Sep 26, 2012 at 8:25 AM, Tomas Vondra wrote:
> Dne 26.09.2012 16:51, Jeff Janes napsal:
>> What is generating the endless stream you are seeing is that you have 1000 databases so if naptime is one minute you are vacuuming 16 per second. Since every database gets a new process, that process needs to read the file as it doesn't inherit one.
>
> Right. But that makes the 10ms timeout even more strange, because the worker is then using the data for a very long time (even minutes).

On average that can't happen, or else your vacuuming would fall way behind. But I agree, there is no reason to have very fresh statistics to start with. naptime/5 seems like a good cutoff to me for the start-up reading. If a table only becomes eligible for vacuuming in the last 20% of the naptime, I see no reason that it can't wait another round.

But that just means the statistics collector needs to write the file less often; the workers still need to read it once per database, since each one only vacuums one database and doesn't inherit the data from the launcher.

>> I think forking it off to another value would be better. If you are an autovacuum worker which is just starting up and so getting its initial stats, you can tolerate a stats file up to "autovacuum_naptime / 5.0" stale. If you are already started up and are just about to vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it currently is, so as not to redundantly vacuum a table.
>
> I always thought there's a "no more than one worker per database" limit, and that the file is always reloaded when switching to another database. So I'm not sure how a worker could see such stale table info? Or are the workers keeping the stats across multiple databases?

If you only have one "active" database, then all the workers will be in it. I don't know how likely it is that they will leapfrog each other and collide. But anyway, if you have 1000s of databases, then each one will generally require zero vacuums per naptime (as you say, it is mostly read-only), so it is the reads upon start-up, not the reads per table that needs vacuuming, which generate most of the traffic. Once you separate those two parameters out, playing around with the PGSTAT_RETRY_DELAY one seems like a needless risk.

>>> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the time it takes to write the file (e.g. 10x the write time or something).
>>
>> This is already in place.
>
> Really? Where?

I had thought that this part was effectively the same thing:

    * We don't recompute min_ts after sleeping, except in the
    * unlikely case that cur_ts went backwards.

But I think I did not understand your proposal.

> I've checked the current master, and the only thing I see in pgstat_write_statsfile is this (line 3558):
>
>     last_statwrite = globalStats.stats_timestamp;
>
> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558
>
> I don't think that's doing what I meant. That really doesn't scale the timeout according to write time. What happens right now is that when the stats file is written at time 0 (starts at zero, write finishes at 100 ms), and a worker asks for the file at 99 ms (i.e. 1 ms before the write finishes), it will set the time of the inquiry to last_statrequest and then do this
>
>     if (last_statwrite < last_statrequest)
>         pgstat_write_statsfile(false);
>
> i.e. comparing it to the start of the write. So another write will start right after the file is written out. And over and over.

Ah. I had wondered about this before too, and wondered if it would be a good idea to have it go back to the beginning of the stats file, and overwrite the timestamp with the current time (rather than the time it started writing it), as the last action it does before the rename. I think that would automatically make it adaptive to the time it takes to write out the file, in a fairly simple way.

> Moreover there's the 'rename' step making the new file invisible for the worker processes, which makes the thing a bit more complicated.

I think renames are assumed to be atomic. Either it sees the old one, or the new one, but never sees neither.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
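For concreteness, the rename-time stamping Jeff describes could look roughly like the sketch below. This is a simplified stand-in, not the actual pgstat.c write path; write_stats_file() and current_usec() are invented names, and error handling is elided.

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/time.h>

    static int64_t
    current_usec(void)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
    }

    static int
    write_stats_file(const char *tmpname, const char *permname,
                     const void *body, size_t body_len)
    {
        FILE    *fp = fopen(tmpname, "wb");
        int64_t  ts = 0;                 /* placeholder, overwritten below */

        if (fp == NULL)
            return -1;

        fwrite(&ts, sizeof(ts), 1, fp);  /* header slot for the timestamp */
        fwrite(body, 1, body_len, fp);   /* the per-database entries */

        /* last action before the rename: stamp with the finish time */
        ts = current_usec();
        fseek(fp, 0L, SEEK_SET);
        fwrite(&ts, sizeof(ts), 1, fp);
        fclose(fp);

        return rename(tmpname, permname);   /* atomic publish */
    }

Because the header is rewritten immediately before the rename, a reader comparing the header timestamp against its request time automatically tolerates however long the write took.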
Re: [HACKERS] Oid registry
Peter Eisentraut writes:
> Here is a problem: If I write an "hstore-ng" extension, I have two options [...]

In the patch for Finer Extension Dependencies, the offer is that you have the hstore-ng extension provide the 'hstore' feature.

  https://commitfest.postgresql.org/action/patch_view?id=727

Now, that says nothing about the type's OID nor how to trust it. In my proposal we would have a pg_features catalog, containing just keywords used in control files, so that you can change your mind as far as dependencies are concerned from one release of your extension to the next (adding or removing some, splitting the extension in parts or joining them again, etc.).

Those feature entries do not exist yet, and are a very specific set of OIDs, so we could maybe provision a large number of them here and refuse to assign them to untrusted sources.

Again, the complex part of that problem, to me, is not about how to manage those numbers (running a registry, adding syntax support, etc.) as much as how to manage a distributed network of trust. When applying for an OID or a feature identifier, the registry team could maybe GPG sign "something" with the PostgreSQL private key, the backend only having the public key embedded in its code and binary. IIRC that's enough to then validate the feature name/oid (with the signature, a new control file parameter) and allow it to get installed in the registry reserved range.

We then need a way to ask for the list of objects provided by the extension providing that feature (it's been signed, it's now trusted), something that we already know how to do (joining pg_depend etc). The result of that query would be cached on the client side, because we're not trying to change the way pg_type OID assignment ranges work.

We could maybe specialize all this down to pg_type and distribute a list of reserved OIDs there too, but it seems like that ship has already sailed (a nice pg_upgrade chicken-and-egg problem).

Regards,

--
Dimitri Fontaine http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
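For reference, the "joining pg_depend etc" part is already possible today. One way to list the objects a given extension provides (hstore here is just a stand-in name; extension-membership dependencies carry deptype 'e'):

    SELECT pg_describe_object(classid, objid, objsubid) AS object
      FROM pg_depend
     WHERE refclassid = 'pg_extension'::regclass
       AND refobjid = (SELECT oid FROM pg_extension WHERE extname = 'hstore')
       AND deptype = 'e';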
Re: [HACKERS] autovacuum stress-testing our system
Dne 26.09.2012 17:29, Alvaro Herrera napsal:
> Excerpts from Tomas Vondra's message of mié sep 26 12:25:58 -0300 2012:
>> Dne 26.09.2012 16:51, Jeff Janes napsal:
>>> I think forking it off to another value would be better. If you are an autovacuum worker which is just starting up and so getting its initial stats, you can tolerate a stats file up to "autovacuum_naptime / 5.0" stale. If you are already started up and are just about to vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it currently is, so as not to redundantly vacuum a table.
>>
>> I always thought there's a "no more than one worker per database" limit,
>
> There is no such limitation.

OK, thanks. Still, reading/writing the small (per-database) files would be much faster, so it would be easy to read/write them more often on demand.

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum stress-testing our system
Excerpts from Tomas Vondra's message of mié sep 26 12:25:58 -0300 2012:
> Dne 26.09.2012 16:51, Jeff Janes napsal:
>> I think forking it off to another value would be better. If you are an autovacuum worker which is just starting up and so getting its initial stats, you can tolerate a stats file up to "autovacuum_naptime / 5.0" stale. If you are already started up and are just about to vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it currently is, so as not to redundantly vacuum a table.
>
> I always thought there's a "no more than one worker per database" limit,

There is no such limitation.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.1] 2 bugs with extensions
On Wed, Sep 26, 2012 at 5:36 PM, Dimitri Fontaine wrote:
> After much distractions I've at least been able to do something about that bug report.

Thanks.

>> 2) If there is schema with functions, but nothing else,
>> create extension pgq_coop from 'unpackaged';
>> drop extension pgq_coop;
>> create extension pgq_coop;
>> ERROR: schema "pgq_coop" already exists
>
> From reading the scripts, it's not clear to me, but it appears that the schema existed before the create from unpackaged then got added to the extension by way of
>
>     ALTER EXTENSION pgq_coop ADD SCHEMA pgq_coop;
>
> Is that true?

Yes.

> Can we work out a minimal example to reproduce the bug?

Yes, the above text or sql/pgq_coop/sql/pgq_coop_init_ext.sql. I guess you could replace pgq_coop with any extension consisting of just functions.

> (The Makefile in skytools/sql/pgq_coop fails on my OS)

How does it fail? Are you using gnu make? What version?

--
marko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
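A guess at the minimal case being discussed — an extension (funcs_only is a hypothetical name) whose unpackaged script only ADDs a pre-existing schema and its functions. This is an untested sketch inferred from the exchange above, not a confirmed reproduction:

    -- Pre-existing objects, created without the extension machinery:
    CREATE SCHEMA funcs_only;
    CREATE FUNCTION funcs_only.f() RETURNS int LANGUAGE sql AS 'SELECT 1';

    -- funcs_only--unpackaged--1.0.sql would contain only:
    --     ALTER EXTENSION funcs_only ADD SCHEMA funcs_only;
    --     ALTER EXTENSION funcs_only ADD FUNCTION funcs_only.f();

    CREATE EXTENSION funcs_only FROM unpackaged;
    DROP EXTENSION funcs_only;
    CREATE EXTENSION funcs_only;  -- reported to fail: schema already exists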
Re: [HACKERS] autovacuum stress-testing our system
Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
> On 26-09-2012 09:43, Tomas Vondra wrote:
>> 5) splitting the single stat file into multiple pieces - e.g. per database, written separately, so that the autovacuum workers don't need to read all the data even for databases that don't need to be vacuumed. This might be combined with (4).
>
> IMHO that's the definitive solution. It would be one file per database plus a global one. That way, the check would only read the global.stat and process those databases that were modified. Also, an in-memory map could store that information to speed up the checks.

+1

> The only downside I can see is that you will increase the number of opened file descriptors.

Note that most users of pgstat will only have two files open (instead of one as currently) -- one for shared, one for their own database. Only pgstat itself and the autovac launcher would need to open pgstat files for all databases; but both do not have a need to open other files (arbitrary tables), so this shouldn't be a major problem.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum stress-testing our system
Dne 26.09.2012 16:51, Jeff Janes napsal:
> On Wed, Sep 26, 2012 at 5:43 AM, Tomas Vondra wrote:
>> First - our system really is not a "common" one - we do have ~1000 databases of various sizes, each containing up to several thousands of tables (several user-defined tables, the rest serve as caches for a reporting application - yes, it's a bit weird design but that's life). This all leads to a pgstat.stat significantly larger than 60 MB.
>> ...
>> Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY value. With our current pgstat.stat filesize/num of relations, this is quite common. Actually the common write time in our case is ~100 ms, even if we move the file into tmpfs. That means that almost all the calls to backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry calls) result in a continuous stream of inquiries from the autovacuum workers, writing/reading of the file.
>
> I don't think it actually does. What you are missing is the same thing I was missing a few weeks ago when I also looked into something like this.
>
> 3962:    * We don't recompute min_ts after sleeping, except in the
>          * unlikely case that cur_ts went backwards.
>
> That means the file must have been written within 10 ms of when we *first* asked for it.

Yeah, right - I've missed the first "if (pgStatDBHash)" check right at the beginning.

> What is generating the endless stream you are seeing is that you have 1000 databases so if naptime is one minute you are vacuuming 16 per second. Since every database gets a new process, that process needs to read the file as it doesn't inherit one.

Right. But that makes the 10ms timeout even more strange, because the worker is then using the data for a very long time (even minutes).

>> ...
>> First, I'm interested in feedback - did I get all the details right, or am I missing something important?
>>
>> Next, I'm thinking about ways to solve this:
>>
>> 1) turning off autovacuum, doing regular VACUUM ANALYZE from cron
>
> Increasing autovacuum_naptime seems like a far better way to do effectively the same thing.

Agreed. One of my colleagues turned autovacuum off a few years back, and that was a nice lesson in how not to solve this kind of issue.

>> 2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and so on to consider several seconds to be fresh enough - would be nice to have these as GUC variables, although we can do another private patch on our own. But more knobs is not always better.
>
> I think forking it off to another value would be better. If you are an autovacuum worker which is just starting up and so getting its initial stats, you can tolerate a stats file up to "autovacuum_naptime / 5.0" stale. If you are already started up and are just about to vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it currently is, so as not to redundantly vacuum a table.

I always thought there's a "no more than one worker per database" limit, and that the file is always reloaded when switching to another database. So I'm not sure how a worker could see such stale table info? Or are the workers keeping the stats across multiple databases?

>> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the time it takes to write the file (e.g. 10x the write time or something).
>
> This is already in place.

Really? Where?

I've checked the current master, and the only thing I see in pgstat_write_statsfile is this (line 3558):

    last_statwrite = globalStats.stats_timestamp;

https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558

I don't think that's doing what I meant. That really doesn't scale the timeout according to write time. What happens right now is that when the stats file is written at time 0 (starts at zero, write finishes at 100 ms), and a worker asks for the file at 99 ms (i.e. 1 ms before the write finishes), it will set the time of the inquiry to last_statrequest and then do this

    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

i.e. comparing it to the start of the write. So another write will start right after the file is written out. And over and over.

Moreover there's the 'rename' step making the new file invisible for the worker processes, which makes the thing a bit more complicated.

What I'm suggesting is that there should be some sort of tracking of the write time, and then deciding whether the file is fresh enough using 10x that value. So when a file is written in 100 ms, it'd be considered OK for the next 900 ms, i.e. 1 sec in total. Sure, we could use 5x or another coefficient; doesn't really matter.

>> 5) splitting the single stat file into multiple pieces - e.g. per database, written separately, so that the autovacuum workers don't need to read all the data even for databases that don't need to be vacuumed. This might be combined with (4).
>
> I think this needs to happen eventually.
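A deliberately simplified sketch of the 10x-write-time heuristic Tomas proposes; none of these names exist in pgstat.c, and a real patch would need to publish the measured write duration somewhere readers can see it (e.g. in the stats file header):

    #include <stdbool.h>
    #include <stdint.h>

    static int64_t last_write_duration = 0;   /* usec, recorded by the writer */

    #define MIN_FRESH_USEC (10 * 1000)        /* old fixed 10 ms threshold */

    static bool
    statsfile_is_fresh(int64_t file_ts, int64_t cur_ts)
    {
        int64_t max_age = 10 * last_write_duration;

        if (max_age < MIN_FRESH_USEC)         /* never stricter than before */
            max_age = MIN_FRESH_USEC;

        return (cur_ts - file_ts) <= max_age;
    }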
Re: [HACKERS] system_information.triggers & truncate triggers
On 26-09-2012 11:08, Simon Riggs wrote:
> I suggest we implement that with some kind of switch/case in the view definition.

-- parameter can be set in a session and defaults to on
SET compliance_information_schema TO off;

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system_information.triggers & truncate triggers
On Wed, Sep 26, 2012 at 10:02 AM, Tom Lane wrote: > Daniel Farina writes: >> On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova >> wrote: >>> The definition of information_schema.triggers contains this: >>> -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE >>> so it seems that we are not showing TRUNCATE triggers intentionally, >>> but that comment fails to explain why > >> Wouldn't it be because TRUNCATE is a PostgreSQL language extension? > > Yeah. The SQL standard specifies the allowed values in that column, > and TRUNCATE is not among them. > > For similar reasons, you won't find exclusion constraints represented > in the information_schema views, and there are some other cases that > I don't recall this early in the morning. > > The point of the information_schema (at least IMHO) is to present > standard-conforming information about standard-conforming database > objects in a standard-conforming way, so that cross-DBMS applications > can rely on what they'll see there. If you are doing anything that's > not described by the SQL standard, you will get at best an incomplete > view of it from the information_schema. In that case you're a lot > better off looking directly at the underlying catalogs. > > (Yes, I'm aware that some other DBMS vendors have a more liberal > interpretation of what standards compliance means in this area.) Let me grouse about this a bit... I appreciate that standards compliance means that information_schema needs to be circumspect as to what it includes. But it is irritating that information_schema provides a representation of (for instance) triggers that, at first, looks nice and clean and somewhat version-independent, only to fall over because there's a class of triggers that it consciously ignores. If I'm wanting to do schema analytics on this (and I do), I'm left debating between painful choices: a) Use information_schema for what it *does* have, and then add in a surprising-looking hack that's pretty version-dependent to draw in the other triggers that it left out b) Ignore the seeming-nice information_schema representation, and construct a version-dependent extraction covering everything that more or less duplicates the work being done by information_schema.triggers. I'd really like to have something like c) Something like information_schema that "takes the standards-conformance gloves off" and gives a nice representation of all the triggers. Make no mistake, I'm not casting aspersions at how pg_trigger was implemented; I have no complaint there, as it's quite fair that the internal representation won't be totally "human-readability-friendly." That is a structure that is continuously accessed by backends, and it is entirely proper to bias implementation to internal considerations. But I'd sure like ways to get at more analytically-friendly representations. A different place where I wound up having to jump through considerable hoops when doing schema analytics was vis-a-vis identifying functions. I need to be able to compare schemas across databases, so oid-based identification of functions is a total non-starter. It appears that the best identification of a function would be based on the combination of schema name, function name, and the concatenation of argument data types. It wasn't terribly difficult to construct that third bit, but it surely would be nice if there was a view capturing it, and possibly even serializing it into a table to enable indexing on it. 
Performance-wise, function comparisons turned out to be one of the most expensive things I did, specifically because of that mapping surrounding arguments. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?" -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
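For illustration, the schema-name/function-name/argument-types identity described above can be built like this; pg_get_function_identity_arguments() has been available since 8.4, and the filtering shown is just an example:

    SELECT n.nspname || '.' || p.proname ||
           '(' || pg_get_function_identity_arguments(p.oid) || ')' AS identity
      FROM pg_proc p
      JOIN pg_namespace n ON n.oid = p.pronamespace
     WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');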
Re: [HACKERS] autovacuum stress-testing our system
On 26-09-2012 09:43, Tomas Vondra wrote:
> I've been struggling with autovacuum generating a lot of I/O and CPU on some of our systems - after a night spent analyzing this behavior, I believe the current autovacuum accidentally behaves a bit like a stress-test in some corner cases (but I may be seriously wrong, after all it was a long night).

It is known that the statistics collector doesn't scale for a lot of databases. It wouldn't be a problem if we didn't have automatic maintenance (aka autovacuum).

> Next, I'm thinking about ways to solve this:
>
> 1) turning off autovacuum, doing regular VACUUM ANALYZE from cron - certainly an option, but it's rather a workaround than a solution and I'm not very fond of it. Moreover it fixes only one side of the problem - triggering the statfile writes over and over. The file will be written anyway, although not that frequently.

It solves your problem if you combine scheduled VA with pgstat.stat in a tmpfs. I don't see it as a definitive solution if we want to scale auto maintenance for several hundreds or even thousands of databases in a single cluster. (Someone could think it is not that common, but in hosting scenarios this is true. DBAs don't want to run several VMs or pg servers just to minimize the auto maintenance scalability problem.)

> 2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and so on to consider several seconds to be fresh enough - would be nice to have these as GUC variables, although we can do another private patch on our own. But more knobs is not always better.

It doesn't solve the problem. Also it could be a problem for autovacuum (which makes assumptions based on those fixed values).

> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the time it takes to write the file (e.g. 10x the write time or something).

Such adaptive logic would be good only iff it takes a small time fraction to execute. It has to pay attention to the limits. It appears to be a candidate for exploration.

> 4) keeping some sort of "dirty flag" in stat entries - and then writing only info about objects that were modified enough to be eligible for vacuum/analyze (e.g. increasing number of index scans can't trigger autovacuum while inserting rows can). Also, I'm not worried about getting a bit older num of index scans, so 'clean' records might be written less frequently than 'dirty' ones.

It minimizes your problem but harms collector tools (that want fresh statistics about databases).

> 5) splitting the single stat file into multiple pieces - e.g. per database, written separately, so that the autovacuum workers don't need to read all the data even for databases that don't need to be vacuumed. This might be combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a global one. That way, the check would only read the global.stat and process those databases that were modified. Also, an in-memory map could store that information to speed up the checks. The only downside I can see is that you will increase the number of opened file descriptors.

> Ideas? Objections? Preferred options?

I prefer to attack 3, sort of 4 (explained in 5 -- in-memory map) and 5. Out of curiosity, did you run perf (or some other performance analyzer) to verify if some (stats and/or autovac) functions pop up in the report?
-- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system_information.triggers & truncate triggers
On 26 September 2012 15:02, Tom Lane wrote: > Daniel Farina writes: >> On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova >> wrote: >>> The definition of information_schema.triggers contains this: >>> -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE >>> so it seems that we are not showing TRUNCATE triggers intentionally, >>> but that comment fails to explain why > >> Wouldn't it be because TRUNCATE is a PostgreSQL language extension? > > Yeah. The SQL standard specifies the allowed values in that column, > and TRUNCATE is not among them. > > For similar reasons, you won't find exclusion constraints represented > in the information_schema views, and there are some other cases that > I don't recall this early in the morning. > > The point of the information_schema (at least IMHO) is to present > standard-conforming information about standard-conforming database > objects in a standard-conforming way, so that cross-DBMS applications > can rely on what they'll see there. If you are doing anything that's > not described by the SQL standard, you will get at best an incomplete > view of it from the information_schema. In that case you're a lot > better off looking directly at the underlying catalogs. > > (Yes, I'm aware that some other DBMS vendors have a more liberal > interpretation of what standards compliance means in this area.) While I understand and even agree with that, I think we also need another view: information schema as a standard way of representing all data, even that which extends the standard. Especially so, since others take the latter view also. I suggest we implement that with some kind of switch/case in the view definition. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
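A rough sketch of what such a switch might look like, reusing the compliance_information_schema GUC name Euler floated elsewhere in the thread. It is purely illustrative: the GUC does not exist, a bare custom GUC name like this would not actually be settable in a stock server, and the real information_schema.triggers view is far more involved. The bit value 32 is TRIGGER_TYPE_TRUNCATE (1 << 5) from pg_trigger.h, so it is version-dependent:

    CREATE VIEW all_triggers AS
    SELECT t.tgname, t.tgrelid::regclass AS table_name
      FROM pg_trigger t
     WHERE NOT t.tgisinternal
       AND ((t.tgtype & 32) = 0   -- not a TRUNCATE trigger, or...
            OR current_setting('compliance_information_schema') = 'off');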
Re: [HACKERS] autovacuum stress-testing our system
On Wed, Sep 26, 2012 at 5:43 AM, Tomas Vondra wrote:
> First - our system really is not a "common" one - we do have ~1000 databases of various sizes, each containing up to several thousands of tables (several user-defined tables, the rest serve as caches for a reporting application - yes, it's a bit weird design but that's life). This all leads to a pgstat.stat significantly larger than 60 MB.
> ...
> Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY value. With our current pgstat.stat filesize/num of relations, this is quite common. Actually the common write time in our case is ~100 ms, even if we move the file into tmpfs. That means that almost all the calls to backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry calls) result in a continuous stream of inquiries from the autovacuum workers, writing/reading of the file.

I don't think it actually does. What you are missing is the same thing I was missing a few weeks ago when I also looked into something like this.

    3962:    * We don't recompute min_ts after sleeping, except in the
             * unlikely case that cur_ts went backwards.

That means the file must have been written within 10 ms of when we *first* asked for it.

What is generating the endless stream you are seeing is that you have 1000 databases, so if naptime is one minute you are vacuuming 16 per second. Since every database gets a new process, that process needs to read the file as it doesn't inherit one.

...

> First, I'm interested in feedback - did I get all the details right, or am I missing something important?
>
> Next, I'm thinking about ways to solve this:
>
> 1) turning off autovacuum, doing regular VACUUM ANALYZE from cron

Increasing autovacuum_naptime seems like a far better way to do effectively the same thing.

> 2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and so on to consider several seconds to be fresh enough - would be nice to have these as GUC variables, although we can do another private patch on our own. But more knobs is not always better.

I think forking it off to another value would be better. If you are an autovacuum worker which is just starting up and so getting its initial stats, you can tolerate a stats file up to "autovacuum_naptime / 5.0" stale. If you are already started up and are just about to vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it currently is, so as not to redundantly vacuum a table.

> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the time it takes to write the file (e.g. 10x the write time or something).

This is already in place.

> 5) splitting the single stat file into multiple pieces - e.g. per database, written separately, so that the autovacuum workers don't need to read all the data even for databases that don't need to be vacuumed. This might be combined with (4).

I think this needs to happen eventually.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum stress-testing our system
Really, as far as autovacuum is concerned, it would be much more useful to be able to reliably detect that a table has been recently vacuumed, without having to request a 10ms-recent pgstat snapshot. That would greatly reduce the amount of time autovac spends on pgstat requests.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] htup header reorganization breaks many extension modules
On 9/26/12 10:07 AM, Tom Lane wrote: > I can't get excited about this either. This isn't the first, or the > last, change that add-on modules can expect to have to make to track > newer Postgres versions. If we allow Peter's complaint to become the > new project policy, we'll never be able to make any header > rearrangements at all, nor change any internal APIs. I'm not saying we can never change anything about the internal headers, but we can make a small effort not to create useless annoyances. That said, could someone clarify the header comments in the new headers? We currently have * htup.h *POSTGRES heap tuple definitions. * htup_details.h *POSTGRES heap tuple header definitions. The names of the headers don't match their documented purpose very much. Is GETSTRUCT a "detail" of the heap tuple definition, or is it related to "tuple headers"? It's not really either, but I guess it is related to tuple headers because you need to know about the headers to get to the stuff past it. When I see headers stuff.h and stuff_details.h, it makes me think that you should only use stuff.h, and stuff_details.h are internal things. But a lot of external modules use GETSTRUCT, so they might get confused. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 9/25/12 8:48 PM, Andrew Dunstan wrote: > That still leaves the other uses for having well known Oids (or possibly > UUIDs) for non-builtin types (e.g. so Drivers don't have to look them up > in the catalogs, or having issues when types are added to the core.) Here is a problem: If I write an "hstore-ng" extension, I have two options: 1. "Steal" the OID/UUID assigned to hstore so my extension integrates nicely with all the drivers. 2. Don't steal the OID/UUID assigned to hstore and have a very low chance that my extension will ever be supported nicely by a lot of drivers. Neither option is very nice. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.1] 2 bugs with extensions
Hi, After much distractions I've at least been able to do something about that bug report. Marko Kreen writes: > 1) Dumpable sequences are not supported - if sequence is tagged >with pg_catalog.pg_extension_config_dump(), the pg_dump tries >to run COPY on it. I can only reproduce that in 9.1. I first tried in HEAD where pg_dump will just entirely skip the sequence, which is not the right answer either, but at least does not spit out that message. > pg_dump: Error message from server: ERROR: cannot copy from sequence > "batch_id_seq" > pg_dump: The command was: COPY pgq.batch_id_seq TO stdout; Please find attached a patch that fixes it in 9.1, in all classic pg_dump, --data-only and --schema-only. git diff --stat src/bin/pg_dump/pg_dump.c | 68 +++- 1 files changed, 54 insertions(+), 14 deletions(-) Once that's ok for 9.1, I'll get to work on a fix for master, oh and look at what the situation is in 9.2, which I guess is the same as in master actually. > 2) If there is schema with functions, but nothing else, > create extension pgq_coop from 'unpackaged'; > drop extension pgq_coop; > create extension pgq_coop; > ERROR: schema "pgq_coop" already exists >From reading the scripts, it's not clear to me, but it appears that the schema existed before the create from unpackaged then got added to the extension by way of ALTER EXTENSION pgq_coop ADD SCHEMA pgq_coop; Is that true? Can we work out a minimal example to reproduce the bug? (The Makefile in skytools/sql/pgq_coop fails on my OS) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support *** a/src/bin/pg_dump/pg_dump.c --- b/src/bin/pg_dump/pg_dump.c *** *** 181,187 static void dumpTrigger(Archive *fout, TriggerInfo *tginfo); static void dumpTable(Archive *fout, TableInfo *tbinfo); static void dumpTableSchema(Archive *fout, TableInfo *tbinfo); static void dumpAttrDef(Archive *fout, AttrDefInfo *adinfo); ! static void dumpSequence(Archive *fout, TableInfo *tbinfo); static void dumpIndex(Archive *fout, IndxInfo *indxinfo); static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo); --- 181,187 static void dumpTable(Archive *fout, TableInfo *tbinfo); static void dumpTableSchema(Archive *fout, TableInfo *tbinfo); static void dumpAttrDef(Archive *fout, AttrDefInfo *adinfo); ! static void dumpSequence(Archive *fout, TableInfo *tbinfo, bool extMember); static void dumpIndex(Archive *fout, IndxInfo *indxinfo); static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo); *** *** 1536,1541 dumpTableData(Archive *fout, TableDataInfo *tdinfo) --- 1536,1547 DataDumperPtr dumpFn; char *copyStmt; + if (tbinfo->relkind == RELKIND_SEQUENCE) + { + dumpSequence(fout, tbinfo, true); + return; + } + if (!dump_inserts) { /* Dump/restore using COPY */ *** *** 1604,1609 makeTableDataInfo(TableInfo *tbinfo, bool oids) --- 1610,1630 { TableDataInfo *tdinfo; + /* + * Nothing to do if we already decided to dump the table. This will + * happen for "config" tables. 
+ */ + if (tbinfo->dataObj != NULL) + return; + + /* Skip VIEWs (no data to dump) */ + if (tbinfo->relkind == RELKIND_VIEW) + return; + /* Skip FOREIGN TABLEs (no data to dump) */ + if (tbinfo->relkind == RELKIND_FOREIGN_TABLE) + return; + + /* OK, let's dump it */ tdinfo = (TableDataInfo *) malloc(sizeof(TableDataInfo)); tdinfo->dobj.objType = DO_TABLE_DATA; *** *** 11975,11981 dumpTable(Archive *fout, TableInfo *tbinfo) char *namecopy; if (tbinfo->relkind == RELKIND_SEQUENCE) ! dumpSequence(fout, tbinfo); else if (!dataOnly) dumpTableSchema(fout, tbinfo); --- 11996,12002 char *namecopy; if (tbinfo->relkind == RELKIND_SEQUENCE) ! dumpSequence(fout, tbinfo, false); else if (!dataOnly) dumpTableSchema(fout, tbinfo); *** *** 13118,13124 findLastBuiltinOid_V70(void) } static void ! dumpSequence(Archive *fout, TableInfo *tbinfo) { PGresult *res; char *startv, --- 13139,13145 } static void ! dumpSequence(Archive *fout, TableInfo *tbinfo, bool extMember) { PGresult *res; char *startv, *** *** 13219,13225 dumpSequence(Archive *fout, TableInfo *tbinfo) * * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump. */ ! if (!dataOnly) { /* * DROP must be fully qualified in case same name appears in --- 13240,13246 * * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump. */ ! if (!extMember && !dataOnly) { /* * DROP must be fully qualified in case same name app
Re: [HACKERS] htup header reorganization breaks many extension modules
Excerpts from Peter Eisentraut's message of mié sep 26 11:18:51 -0300 2012:
> On 9/26/12 10:07 AM, Tom Lane wrote:
>> I can't get excited about this either. This isn't the first, or the last, change that add-on modules can expect to have to make to track newer Postgres versions. If we allow Peter's complaint to become the new project policy, we'll never be able to make any header rearrangements at all, nor change any internal APIs.
>
> I'm not saying we can never change anything about the internal headers, but we can make a small effort not to create useless annoyances.

I proposed a possible way out of the problem elsewhere. Please comment on that.

> That said, could someone clarify the header comments in the new headers?
> We currently have
>
>  * htup.h
>  *    POSTGRES heap tuple definitions.
>
>  * htup_details.h
>  *    POSTGRES heap tuple header definitions.

htup.h is what you need if you want to pass tuples around. It's particularly useful for other headers that want to declare their functions as receiving or returning tuples. htup_details.h is what you need if you want to operate on tuples, such as creating them or examining them. I guess those comments aren't all that well thought out; suggestions welcome.

> The names of the headers don't match their documented purpose very much. Is GETSTRUCT a "detail" of the heap tuple definition, or is it related to "tuple headers"? It's not really either, but I guess it is related to tuple headers because you need to know about the headers to get to the stuff past it.
> When I see headers stuff.h and stuff_details.h, it makes me think that you should only use stuff.h, and stuff_details.h are internal things. But a lot of external modules use GETSTRUCT, so they might get confused.

The other proposal was htup_internal.h, but that would have been much more indicative of stuff that's supposed to be used only for internal consumption of files in backend/access/heap and such, which is why I stayed away from that name. I think htup_details.h is a good enough compromise.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
Antonin Houska writes: > I'm also implementing an extension where direct access to non-fixed OIDs > (i.e. no catalog cache lookup by name) would be very helpful. I spent some > time thinking about a workaround that makes OID registry unnecessary. > How about the following? > 1. Add a new varlena column to pg_proc catalog table, say > 'ext_types',containing > C array of OIDs. > 2. Let each extension declare requirements like the following in its > configuration > files: > "I expect type at 0-th position of 'ext_types' array." > "I expect type at 1-st position of 'ext_types' array." > etc. I think this just begs the question: how do you specify and how do you know that whatever was found is what you want? Beyond that, nothing in what you said can't be done today by a function that does type name lookups and caches the results internally. And I'd just as soon not burden the function-call infrastructure with more overhead to support something only a small fraction of functions would need. Another point is that server-internal features don't help client-side code, which is actually where most of the pain is AFAICT. We aren't providing any infrastructure that helps clients interpret PQftype() values for non-core types. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
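For illustration, the lookup-and-cache pattern Tom refers to can be as small as this in an extension. The caching deliberately ignores search_path changes, which a real version would have to handle (e.g. via invalidation callbacks); hstore is just a stand-in type name:

    #include "postgres.h"
    #include "catalog/namespace.h"

    /* one-time lookup, cached for the life of the backend */
    static Oid cached_hstore_oid = InvalidOid;

    static Oid
    get_hstore_oid(void)
    {
        if (!OidIsValid(cached_hstore_oid))
            cached_hstore_oid = TypenameGetTypid("hstore"); /* uses search_path */
        return cached_hstore_oid;  /* InvalidOid if no such type is visible */
    }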
Re: [HACKERS] Oid registry
On 09/25/2012 08:56 PM, Robert Haas wrote:
> On Tue, Sep 25, 2012 at 10:23 AM, Tom Lane wrote:
>> Andrew Dunstan writes:
>>> Given your previous comments, perhaps we could just start handing out Oids (if there is any demand) numbered, say, 9000 and up. That should keep us well clear of any existing use.
>>
>> No, I think you missed my point entirely: handing out OIDs at the top of the manual assignment range is approximately the worst possible scenario. I foresee having to someday move FirstBootstrapObjectId down to 9000, or 8000, or even less, to cope with growth of the auto-assigned OID set. So we need to keep manually assigned OIDs reasonably compact near the bottom of the range, and it doesn't matter at all whether such OIDs are used internally or reserved for external developers. Nor do I see a need for such reserved OIDs to "look different" from internally-used OIDs. Reserved is reserved.
>
> I'm not sure how much anyone cares, but just in case anyone does... it would be mighty useful to EnterpriseDB to have a range of OIDs that are guaranteed not to be assigned to anyone else, because we're maintaining a fork of PostgreSQL that regularly merges with the mainline. We're actually likely to get crunched in our fork well before PostgreSQL itself does. There are enough other forks of PostgreSQL out there that there may be other people who are in a similar situation, though I am not sure how much we want to cater to people doing such things. That having been said, I can't really see how it would be practical anyway unless we raise the 16384 lower limit for user-assigned OIDs considerably. And I'm not sure how to do that without breaking pg_upgrade.

How many would EDB need for it to be useful?

> I am somewhat opposed to the idea of an OID registry. I can't see why the space of things people want to reserve OIDs for would be only tens rather than thousands. There are surely plenty of extensions that would like to depend on specific other extensions, or on core. If we legislate that hard-coded OIDs are the way to do that, I think we're going to end up with a lot of 'em.

Maybe you have a better feel than I do for how many live addon types are out there. Maybe there are hundreds or thousands, but if there are I am blissfully unaware of them. I did like the alternative idea upthread of UUIDs for types, which would give them a virtually unlimited space.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] htup header reorganization breaks many extension modules
"Albe Laurenz" writes: > Hitoshi Harada wrote: >> But it's only add #include "access/htup_details.h"? I'd not argue >> it's harmful unless I missed your point. > I guess the point is that you need an #ifdef if you want a module > to be able to build with both 9.3 and lower versions. I can't get excited about this either. This isn't the first, or the last, change that add-on modules can expect to have to make to track newer Postgres versions. If we allow Peter's complaint to become the new project policy, we'll never be able to make any header rearrangements at all, nor change any internal APIs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system_information.triggers & truncate triggers
Daniel Farina writes: > On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova > wrote: >> The definition of information_schema.triggers contains this: >> -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE >> so it seems that we are not showing TRUNCATE triggers intentionally, >> but that comment fails to explain why > Wouldn't it be because TRUNCATE is a PostgreSQL language extension? Yeah. The SQL standard specifies the allowed values in that column, and TRUNCATE is not among them. For similar reasons, you won't find exclusion constraints represented in the information_schema views, and there are some other cases that I don't recall this early in the morning. The point of the information_schema (at least IMHO) is to present standard-conforming information about standard-conforming database objects in a standard-conforming way, so that cross-DBMS applications can rely on what they'll see there. If you are doing anything that's not described by the SQL standard, you will get at best an incomplete view of it from the information_schema. In that case you're a lot better off looking directly at the underlying catalogs. (Yes, I'm aware that some other DBMS vendors have a more liberal interpretation of what standards compliance means in this area.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
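As an example of "looking directly at the underlying catalogs", TRUNCATE triggers can be found via the tgtype bitmask. The bit value 32 is TRIGGER_TYPE_TRUNCATE (1 << 5) in pg_trigger.h, so this is version-dependent by nature:

    SELECT tgname, tgrelid::regclass AS table_name
      FROM pg_trigger
     WHERE NOT tgisinternal
       AND (tgtype & 32) <> 0;   -- TRIGGER_TYPE_TRUNCATE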
Re: [HACKERS] Doc patch to note which system catalogs have oids
On Mon, Sep 24, 2012 at 9:18 PM, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> I think this is fundamentally wrong, or at least misleading, because it >> documents OID as if it were an ordinary column. Somebody who did >> "select * from pg_class" and didn't see any "oid" in the result would >> think the docs were wrong. > > Given that it's been quite some time since we defaulted to including > OIDs in tables, and the high level of confusion that individuals trying > to join pg_class and pg_namespace together go through due to select * > not including the oid column, I wonder if perhaps we should consider > changing that. Might be possible to do for just the catalog tables (to > minimize the risk of breaking poorly-written applications), or provide > a GUC to control including the oid column in select *. > >> It's possible that it's worth expending a boilerplate paragraph in each >> of those pages to say "this catalog has OIDs" (or that it doesn't). >> But this isn't the way. > > I'm afraid I disagree with this. The oid column, in the system > catalog, is user-facing and I like having it included as a column in the > table in the docs, so users know what to use when doing joins. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
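The join in question, for reference — this is exactly the case where "SELECT *" hides the oid column users need:

    SELECT c.oid, n.nspname, c.relname
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace;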
Re: [HACKERS] htup header reorganization breaks many extension modules
Excerpts from Peter Eisentraut's message of mar sep 25 21:30:59 -0300 2012:
> I haven't followed the details of the htup header reorganization, but I have noticed that a number of external extension modules will be broken because of the move of GETSTRUCT() and to a lesser extent heap_getattr(). Of course some #ifdefs can fix that, but it seems annoying to make everyone do that. Maybe this could be reconsidered to reduce the impact on other projects.

Hmm. My original patch didn't have this problem :-( What it did was to keep htup.h the "everything needed to work on tuples" header, so external modules would not have become broken. I didn't realize this at the time, which is why I didn't argue to keep it that way instead of having the new header contain most innards.

I guess we could rename the headers, so that htup.h is what's now called htup_details.h, and htup_basics.h for what's currently htup.h. This would have a lot of fallout in core code, but eliminate impact on external modules.

That said, I am not really sure that I want to promise header compatibility forever.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_reorg in core?
On Wednesday, September 26, 2012 02:39:36 PM Michael Paquier wrote:
> Do you think it is acceptable to consider that the user has to do the cleanup of the old or new index himself if there is a failure?

The problem I see is that if you want the thing to be efficient you might end up doing step 1) for all/a bunch of indexes, then 2), and so on. In that case you can have loads of invalid indexes around.

> You could also reissue the reindex command and avoid an additional command. When launching a concurrent reindex, it could be possible to check if there is already an index that has been created to replace the old one that failed previously. In order to control that, why not add an additional field in pg_index?
> When creating a new index concurrently, we register in its pg_index entry the oid of the index that it has to replace. When reissuing the command after a failure, it is then possible to check if there is already an index that has been issued by a previous REINDEX CONCURRENT command, and based on the flag values of the old and new indexes it is then possible to replay the command from the step where it previously failed.

I don't really like this idea, but we might end up there anyway because we probably need to keep track of whether an index is actually only a "replacement" index that shouldn't exist on its own. Otherwise it's hard to know which indexes to drop if it failed halfway through.

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
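For reference, the invalid indexes such a failure leaves behind are easy to spot, which is what any cleanup (manual or automatic) would have to target:

    SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name
      FROM pg_index
     WHERE NOT indisvalid;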
Re: [HACKERS] Oid registry
I'm also implementing an extension where direct access to non-fixed OIDs (i.e. no catalog cache lookup by name) would be very helpful. I spent some time thinking about a workaround that makes an OID registry unnecessary. How about the following?

1. Add a new varlena column to the pg_proc catalog table, say 'ext_types', containing a C array of OIDs.

2. Let each extension declare requirements like the following in its configuration files:
   "I expect type at 0-th position of 'ext_types' array."
   "I expect type at 1-st position of 'ext_types' array."
   etc.

3. Ensure that the CREATE EXTENSION command reads all these type names, finds the appropriate OIDs in pg_type and puts them at the appropriate positions in the 'ext_types' column for each function of the new extension (while in-core types would have it set to NULL of course).

4. Implement a macro to get the 0-th, 1-st, etc. value from pg_proc.ext_types (via FMGR).

Is there any serious circumstance I forgot, or does it seem to be e.g. too invasive?

Kind regards,

Tony H.

On 09/25/2012 12:59 AM, Andrew Dunstan wrote:
> This rather overdue mail arises out of the developer's meeting back in May, where we discussed an item I raised suggesting an Oid registry. The idea came from some difficulties I encountered when I did the backport of the JSON work we did in 9.2 to 9.1, but has wider application.
>
> Say someone writes an extension that defines type A. You want to write an extension that takes advantage of that type, but it's difficult if you don't know the type's Oid, and of course currently there is no way of knowing for sure what it is unless it's a builtin type. So the proposal is to have an Oid registry, in which authors could in effect reserve an Oid (or a couple of Oids) for a type. We would guarantee that these Oids would be reserved in just the same way Oids for builtins are reserved, and #define symbolic constants for the reserved Oids. To make that viable, we'd need to extend the CREATE commands for any objects we could reserve Oids for to allow for the specification of the Oids, somewhat like:
>
>     CREATE TYPE newtype ( ) WITH (TYPEOID = 123456, TYPEARRAYOID = 234567);
>
> I'm not sure what objects we would need this for other than types, but CREATE TYPE would be a good start anyway.
>
> Another suggestion that was made during the discussion was that we should also reserve a range of Oids for private use, and guarantee that those would never be allocated, somewhat analogously to RFC1918 IP addresses.
>
> thoughts? If there is general agreement I want to get working on this fairly soon.
>
> cheers
>
> andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autovacuum stress-testing our system
Hi,

I've been struggling with autovacuum generating a lot of I/O and CPU on some of our systems - after a night spent analyzing this behavior, I believe the current autovacuum accidentally behaves a bit like a stress-test in some corner cases (but I may be seriously wrong, after all it was a long night).

First - our system really is not a "common" one - we do have ~1000 databases of various sizes, each containing up to several thousands of tables (several user-defined tables, the rest serve as caches for a reporting application - yes, it's a bit weird design but that's life). This all leads to a pgstat.stat significantly larger than 60 MB.

Now, the two main pieces of information from pgstat.c are the timer definitions

-- pgstat.c : 80 --

#define PGSTAT_STAT_INTERVAL    500     /* Minimum time between stats file
                                         * updates; in milliseconds. */

#define PGSTAT_RETRY_DELAY      10      /* How long to wait between checks for
                                         * a new file; in milliseconds. */

#define PGSTAT_MAX_WAIT_TIME    10000   /* Maximum time to wait for a stats
                                         * file update; in milliseconds. */

#define PGSTAT_INQ_INTERVAL     640     /* How often to ping the collector for
                                         * a new file; in milliseconds. */

#define PGSTAT_RESTART_INTERVAL 60      /* How often to attempt to restart a
                                         * failed statistics collector; in
                                         * seconds. */

#define PGSTAT_POLL_LOOP_COUNT  (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
#define PGSTAT_INQ_LOOP_COUNT   (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)

---

and then this loop (the current HEAD does this a bit differently, but the 9.2 code is a bit more readable and suffers the same issue):

-- pgstat.c : 3560 --

    /*
     * Loop until fresh enough stats file is available or we ran out of time.
     * The stats inquiry message is sent repeatedly in case collector drops
     * it; but not every single time, as that just swamps the collector.
     */
    for (count = 0; count < PGSTAT_POLL_LOOP_COUNT; count++)
    {
        TimestampTz file_ts = 0;

        CHECK_FOR_INTERRUPTS();

        if (pgstat_read_statsfile_timestamp(false, &file_ts) &&
            file_ts >= min_ts)
            break;

        /* Not there or too old, so kick the collector and wait a bit */
        if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
            pgstat_send_inquiry(min_ts);

        pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
    }

    if (count >= PGSTAT_POLL_LOOP_COUNT)
        elog(WARNING, "pgstat wait timeout");

    /* Autovacuum launcher wants stats about all databases */
    if (IsAutoVacuumLauncherProcess())
        pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
    else
        pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);

---

What this code does is check the statfile, and if it's not stale (the timestamp of the write start is not older than PGSTAT_RETRY_DELAY milliseconds), the loop is terminated and the file is read.

Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY value. With our current pgstat.stat filesize/num of relations, this is quite common. Actually the common write time in our case is ~100 ms, even if we move the file into tmpfs. That means that almost all the calls to backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry calls) result in a continuous stream of inquiries from the autovacuum workers, writing/reading of the file. We're not getting 'pgstat wait timeout' though, because it finally gets written before PGSTAT_MAX_WAIT_TIME.

By moving the file to a tmpfs we've minimized the I/O impact, but now the collector and autovacuum launcher consume ~75% of CPU (i.e. ~ one core) and do nothing except burning power because the database is almost read-only. Not a good thing in the "green computing" era I guess.

First, I'm interested in feedback - did I get all the details right, or am I missing something important?

Next, I'm thinking about ways to solve this:

1) turning off autovacuum, doing regular VACUUM ANALYZE from cron - certainly an option, but it's rather a workaround than a solution and I'm not very fond of it. Moreover it fixes only one side of the problem - triggering the statfile writes over and over. The file will be written anyway, although not that frequently.

2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and so on to consider several seconds to be fresh enough - would be nice to have these as GUC variables, although we can do another private patch on our own. But more knobs is not always better.

3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the time it takes to write the file (e.g. 10x the write time or something).

4) keeping some sort of "dirty flag" in stat entries - and then writing only info about objects that were modified enough to be eligible for vacuum/analyze (e.g. increasing number of index scans can't trigger autovacuum while inserting rows can). Also, I'm not worried about getting a bit older num of index scans, so 'clean' records might be written less frequently than 'dirty' ones.

5) splitting the single stat file into multiple pieces - e.g. per database, written separately, so that the autovacuum workers don't need to read all the data even for databases that don't need to be vacuumed. This might be combined with (4).

Ideas? Objections? Preferred options?

Tomas
Re: [HACKERS] pg_reorg in core?
On Wed, Sep 26, 2012 at 8:13 PM, Andres Freund wrote: > On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote: > > On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund >wrote: > > > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote: > > > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund < > and...@2ndquadrant.com > > > > > > > >wrote: > > > > Could you clarify what do you mean here by cleanup? > > > > I am afraid I do not get your point here. > > > > > > Sorry, was a bit tired when writing the above. > > > > > > The point is that to work concurrent the CONCURRENT operations > > > commit/start multiple transactions internally. It can be interrupted > > > (user, shutdown, error, > > > crash) and leave transient state behind every time it does so. What I > > > wanted to > > > say is that we need to take care that each of those can easily be > cleaned > > > up > > > afterwards. > > > > Sure, many errors may happen. > > But, in the case of CREATE INDEX CONCURRENTLY, there is no clean up > method > > implemented as far as I know (might be missing something though). Isn't > an > > index only considered as invalid in case of failure for concurrent > creation? > Well, you can DROP or REINDEX the invalid index. > > There are several scenarios where you can get invalid indexes. Unique > violations, postgres restarts, aborted index creation... > > > In the case of REINDEX it would be essential to create such a cleanup > > mechanism as I cannot imagine a production database with an index that > has > > been marked as invalid due to a concurrent reindex failure, by assuming > here, > > of course, that REINDEX CONCURRENTLY would use the same level of process > > error as CREATE INDEX CONCURRENTLY. > Not sure what youre getting at? > I just meant that when CREATE INDEX CONCURRENTLY fails, the index created is considered as invalid, so it cannot be used by planner. Based on what you told before: 1) build new index with indisready = false newindex.indisready = true wait 2) newindex.indisvalid = true wait 3) swap(oldindex.relfilenode, newindex.relfilenode) oldindex.indisvalid = false wait 4) oldindex.indisready = false wait drop new index with old relfilenode If the reindex fails at step 1 or 2, the new index is not usable so the relation will finish with an index which is not valid. If it fails at step 4, the old index is invalid. If it fails at step 3, both indexes are valid and both are usable for given relation. Do you think it is acceptable to consider that the user has to do the cleanup of the old or new index himself if there is a failure? > > One of the possible cleanup mechanisms I got on top of my head is a > > callback at transaction abort, each callback would need to be different > for > > each subtransaction used at during the concurrent operation. > > In case the callback itself fails, well the old and/or new indexes become > > invalid. > Thats not going to work. E.g. the session might have been aborted or such. > Also, there is not much you can do from an callback at transaction end as > you > cannot do catalog modifications. > > I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely > similar. > You could also reissue the reindex command and avoid an additional command. When launching a concurrent reindex, it could be possible to check if there is already an index that has been created to replace the old one that failed previously. In order to control that, why not adding an additional field in pg_index? 
When creating a new index concurrently, we would register in its pg_index entry the OID of the index it has to replace. When reissuing the command after a failure, it is then possible to check whether an index created by a previous REINDEX CONCURRENTLY command already exists, and, based on the flag values of the old and new indexes, to replay the command from the step where it previously failed. -- Michael Paquier http://michael.otacoo.com
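[For reference, a minimal sketch of the manual cleanup path Andres mentions above; the index and table names here are hypothetical:]

    -- an interrupted CREATE INDEX CONCURRENTLY leaves the index behind,
    -- marked INVALID
    CREATE INDEX CONCURRENTLY idx_foo ON foo (bar);  -- fails midway
    -- either remove it and retry:
    DROP INDEX idx_foo;
    -- or rebuild it in place (blocks writes, unlike the concurrent build):
    REINDEX INDEX idx_foo;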
Re: [HACKERS] Enum binary access
Hi Merlin, thanks, your code works perfectly. Cheers, Petr

On 10.9.2012 16:33, Merlin Moncure wrote:

On Mon, Sep 10, 2012 at 8:42 AM, Petr Chmelar wrote:

Hi there, we tried to create the libpqtypes enum binary send but it doesn't work:

// register types
PGregisterType user_def[] = { {"seqtype", enum_put, enum_get} };
PQregisterTypes(connector->getConn(), PQT_USERDEFINED, user_def, 1, 0);

// enum_put throws format error
int enum_put(PGtypeArgs *args)
{
    char *val = va_arg(args->ap, char *);
    char *out = NULL;
    int vallen = 0, len = 0, oid = 0;
    float sortorder = 0.0;

    if (!args || !val)
        return 0;

    /* expand buffer enough */
    vallen = strlen(val);
    len = sizeof(int) + sizeof(float) + (vallen * sizeof(char));
    if (args->put.expandBuffer(args, len) == -1)
        return -1;

    /* put header (oid, sortorder) and value */
    out = args->put.out;
    memcpy(out, &oid, sizeof(int));
    out += sizeof(int);
    memcpy(out, &sortorder, sizeof(float));
    out += sizeof(float);
    memcpy(out, val, vallen);

    return len;
}

// enum_get (FYI, get works OK)
int enum_get(PGtypeArgs *args)
{
    char *val = PQgetvalue(args->get.result, args->get.tup_num, args->get.field_num);
    int len = PQgetlength(args->get.result, args->get.tup_num, args->get.field_num);
    char **result = va_arg(args->ap, char **);

    *result = (char *) PQresultAlloc((PGresult *) args->get.result, len * sizeof(char));
    memcpy(*result, val, len * sizeof(char));

    return 0;
}

Postgres doesn't accept an enum sent like this and throws a format error. This should be used as a prototype for derived types. There is no real type named "enum", and libpqtypes doesn't seem to provide simplified binary manipulation for enum types. What should we do, please? Can you fix it? I think there are more people who need to access enum types in binary mode.

I was able to get it to work. What I did:

*) your 'get' routine should probably be allocating a terminating byte. In binary situations, PQgetlength does not return the length of the null terminator.

*) the backend binary format for enums is just the label text string, on both the get and put side. So putting the oid and sort order was breaking the put side. Removed that and everything worked.
*) see (very messy, quickly written) code below:

#include <stdio.h>
#include <string.h>
#include "libpq-fe.h"
#include "libpqtypes.h"

/* put: the payload is just the enum label text (no oid/sortorder header) */
int enum_put(PGtypeArgs *args)
{
    char *val = va_arg(args->ap, char *);
    char *out = NULL;
    int vallen = 0, len = 0;

    if (!args || !val)
        return 0;

    /* expand buffer enough */
    vallen = strlen(val);
    len = vallen;
    if (args->put.expandBuffer(args, len) == -1)
        return -1;

    out = args->put.out;
    memcpy(out, val, vallen);
    return len;
}

/* get: allocate one extra byte for the terminator, since PQgetlength
 * does not count it in binary mode */
int enum_get(PGtypeArgs *args)
{
    char *val = PQgetvalue(args->get.result, args->get.tup_num, args->get.field_num);
    int len = PQgetlength(args->get.result, args->get.tup_num, args->get.field_num) + 1;
    char **result = va_arg(args->ap, char **);

    *result = (char *) PQresultAlloc((PGresult *) args->get.result, len * sizeof(char));
    memcpy(*result, val, len - 1);
    (*result)[len - 1] = '\0';
    return 0;
}

int main()
{
    PGtext t = "b";
    PGconn *conn = PQconnectdb("port=5492 host=localhost");
    PQinitTypes(conn);

    PGregisterType user_def[] = { {"e", enum_put, enum_get} };
    if (!PQregisterTypes(conn, PQT_USERDEFINED, user_def, 1, 0))
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());

    PGresult *res = PQexecf(conn, "select %e", t);
    if (!res)
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());

    if (!PQgetf(res, 0, "%e", 0, &t))
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());

    printf("%s\n", t);
    PQclear(res);
    PQfinish(conn);
    return 0;
}
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
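[The test program above assumes an enum type named "e" already exists on the server; a minimal, hypothetical setup would be:]

    -- enum type matching the name registered via PQregisterTypes
    CREATE TYPE e AS ENUM ('a', 'b', 'c');
    -- the binary wire format for an enum value is just its label text
    SELECT 'b'::e;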
Re: [HACKERS] system_information.triggers & truncate triggers
On 09/26/2012 03:08 AM, Daniel Farina wrote:
> On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova wrote:
> > On Wed, Sep 26, 2012 at 12:17 AM, Daymel Bonne Solís wrote:
> > > Hello hackers:
> > >
> > > I need a list of all triggers created in my database, but the view
> > > information_schema.triggers does not show truncate triggers, although it
> > > does show insert, update and delete triggers.
> > >
> > > The same problem is found in versions 9.1 and 9.2.
> >
> > The definition of information_schema.triggers contains this:
> > """
> > FROM pg_namespace n, pg_class c, pg_trigger t,
> >      -- hard-wired refs to TRIGGER_TYPE_INSERT, TRIGGER_TYPE_DELETE,
> >      -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
> >      (VALUES (4, 'INSERT'),
> >              (8, 'DELETE'),
> >              (16, 'UPDATE')) AS em (num, text)
> > """
> > so it seems that we are not showing TRUNCATE triggers intentionally,
> > but that comment fails to explain why
>
> Wouldn't it be because TRUNCATE is a PostgreSQL language extension?

I think this case should be explicitly stated in the documentation.

Regards.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
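[As a side note, not from the thread: TRUNCATE triggers can still be listed by querying pg_trigger directly. Following the hard-wired numbering quoted above (4 = INSERT, 8 = DELETE, 16 = UPDATE), TRIGGER_TYPE_TRUNCATE corresponds to bit 32, so a sketch of such a query would be:]

    -- list TRUNCATE triggers straight from the system catalog,
    -- since information_schema.triggers omits them
    SELECT tgname, tgrelid::regclass AS table_name
    FROM pg_trigger
    WHERE NOT tgisinternal
      AND (tgtype & 32) <> 0;  -- 32 = TRIGGER_TYPE_TRUNCATE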
Re: [HACKERS] pg_reorg in core?
On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
> On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund wrote:
> > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
> > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund wrote:
> > > Could you clarify what you mean here by cleanup?
> > > I am afraid I do not get your point here.
> >
> > Sorry, was a bit tired when writing the above.
> >
> > The point is that to work concurrently, the CONCURRENT operations
> > commit/start multiple transactions internally. They can be interrupted
> > (user, shutdown, error, crash) and leave transient state behind every
> > time they do so. What I wanted to say is that we need to take care that
> > each of those states can easily be cleaned up afterwards.
>
> Sure, many errors may happen.
> But, in the case of CREATE INDEX CONCURRENTLY, there is no cleanup method
> implemented as far as I know (might be missing something though). Isn't an
> index only considered invalid in case of failure of concurrent creation?

Well, you can DROP or REINDEX the invalid index.

There are several scenarios where you can get invalid indexes: unique violations, postgres restarts, aborted index creation...

> In the case of REINDEX it would be essential to create such a cleanup
> mechanism, as I cannot imagine a production database with an index that
> has been marked invalid due to a concurrent reindex failure - assuming
> here, of course, that REINDEX CONCURRENTLY would use the same error
> handling as CREATE INDEX CONCURRENTLY.

Not sure what you're getting at?

> One of the possible cleanup mechanisms on top of my head is a callback at
> transaction abort; each callback would need to be different for each
> subtransaction used during the concurrent operation.
> In case the callback itself fails, well, the old and/or new indexes become
> invalid.

That's not going to work. E.g. the session might have been aborted or such. Also, there is not much you can do from a callback at transaction end, as you cannot do catalog modifications.

I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely similar.

> > > > 2. no support for concurrent on system tables (not easy for shared
> > > > catalogs)
> > >
> > > Doesn't this exclude all the tables that are in the schema catalog?
> >
> > No. Only
> >   SELECT array_to_string(array_agg(relname), ', ') FROM pg_class
> >   WHERE relisshared AND relkind = 'r';
> > their toast tables and their indexes are shared. The problem is that for
> > those you cannot create a separate index and let it update concurrently,
> > because you cannot write into each database's pg_class/pg_index.
>
> Yes indeed, I didn't think about things that are shared among databases.
> Blocking that is pretty simple, only a matter of places checked.

It's just a bit sad to make the thing appear not really lockless ;)

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
You're right, REINDEX was not done. I've stopped the VACUUM, did a proper server restart (pg_ctl -m fast -w restart) and will work on rebuilding the relations. It seems I also have another issue ahead with a bunch of bloated tables. Thanks for the support.

2012/9/26 Andres Freund :
> > Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)
> It recommended doing a REINDEX first though? I guess you didn't do that?
>
> ...
>
> I guess you cannot cancel the vacuum? Last time it was in a cycle without
> checking interrupts in between.
>
> Can you restart the server?
--
Victor Y. Yegorov
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] htup header reorganization breaks many extension modules
On Wed, Sep 26, 2012 at 4:14 PM, Albe Laurenz wrote:
> Hitoshi Harada wrote:
> > On Tue, Sep 25, 2012 at 5:30 PM, Peter Eisentraut wrote:
> > > I haven't followed the details of the htup header reorganization, but I
> > > have noticed that a number of external extension modules will be broken
> > > because of the move of GETSTRUCT() and to a lesser extent
> > > heap_getattr(). Of course some #ifdefs can fix that, but it seems
> > > annoying to make everyone do that. Maybe this could be reconsidered to
> > > reduce the impact on other projects.
> >
> > But it's only adding #include "access/htup_details.h"? I'd not argue
> > it's harmful unless I missed your point.
>
> I guess the point is that you need an #ifdef if you want a module
> to be able to build with both 9.3 and lower versions.
>
> Otherwise the compiler will complain about the missing include
> file on older versions.

Modules depend on the Postgres core, not the other way around, so isn't it the responsibility of the module maintainers to ensure that what they build still compiles against Postgres? This can simply be fixed by providing, as mentioned, #ifdefs controlled by PG_VERSION_NUM that include htup_details.h, so the correction effort is not that much...
-- Michael Paquier http://michael.otacoo.com
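[For illustration, a minimal sketch of the guard Michael describes, for a module meant to build against both 9.3 and older releases; the surrounding module code is assumed, and 90300 is the PG_VERSION_NUM of the first 9.3 release:]

    #include "postgres.h"
    #include "access/htup.h"
    /* GETSTRUCT() and heap_getattr() moved to this header in 9.3 */
    #if PG_VERSION_NUM >= 90300
    #include "access/htup_details.h"
    #endif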
Re: [HACKERS] Oid registry
On 09/26/2012 02:48 AM, Andrew Dunstan wrote:
> On 09/25/2012 08:35 PM, Peter Eisentraut wrote:
> > On Tue, 2012-09-25 at 18:22 -0400, Tom Lane wrote:
> > > Actually, after reading another message you sent, I thought you were
> > > going to respond that your proposed transforms feature would cover it.
> >
> > I had thought about this some time ago, but it's clearer to think of
> > casts as associating two types, versus transforms associating a type
> > and a language. JSON and XML tend to be types.
>
> OK, I think this solves the object_to_json problem after all - we'll look
> for a cast to json and if it's not there use the string form of the object.

Nice. This is solved then :)

Would it be possible to also use the cast mechanism to do anyarray-to-json casts as a parallel spelling for array_to_json(), and a record-to-json cast for row_to_json()?

Btw, is anybody currently working on going the opposite way, that is, loading rows/records from json?

> That still leaves the other uses for having well-known Oids (or possibly
> UUIDs) for non-builtin types (e.g. so drivers don't have to look them up
> in the catalogs, or having issues when types are added to the core).

One way to solve this would be to pass non-system oids to clients as their names. This would need a change in the protocol. Or we could make it so that the server sends a special record with typename->typeoid mappings at first use of a non-system type.

> cheers
>
> andrew
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
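[For reference, a small sketch of the existing one-way conversions mentioned above; both functions are present in 9.2, while the reverse direction - loading rows from json - is what is being asked about:]

    SELECT array_to_json(ARRAY[1, 2, 3]);
    -- [1,2,3]

    SELECT row_to_json(t) FROM (SELECT 1 AS a, 'x'::text AS b) t;
    -- {"a":1,"b":"x"}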
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Hi,

On Wednesday, September 26, 2012 07:57:06 AM Виктор Егоров wrote:
> I'm afraid I'm exactly in this situation now. :(
> Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)

It recommended doing a REINDEX first though? I guess you didn't do that?

> was:
> INFO: "meta_version_chunks": found 55363 removable, 32566245 nonremovable
> row versions in 450292 out of 450292 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 588315 unused item pointers.
> 0 pages are entirely empty.
> CPU 2.44s/5.77u sec elapsed 2150.18 sec.
> INFO: vacuuming "pg_toast.pg_toast_16582"
>
> And here are the locks held by the VACUUM backend:
>
> select oid, relname, relkind, relpages, reltuples::numeric(15,0),
>        reltoastrelid, reltoastidxid
> from pg_class
> where oid in (select relation from pg_locks where pid = 1380);
>
>   oid  |       relname        | relkind | relpages | reltuples | reltoastrelid | reltoastidxid
> -------+----------------------+---------+----------+-----------+---------------+---------------
>  16585 | pg_toast_16582       | t       | 16460004 |  58161600 |             0 |         16587
>  16587 | pg_toast_16582_index | i       |   188469 |  58161600 |             0 |             0
>  16582 | meta_version_chunks  | r       |   450292 |  32566200 |         16585 |             0
>
> I will not touch anything and would like to get some recommendations on
> how to proceed.

On Wednesday, September 26, 2012 08:12:37 AM Виктор Егоров wrote:
> Forgot to mention that:
> - VACUUM is running on the master;
> - current state is unchanged for 20 hours.

I guess you cannot cancel the vacuum? Last time it was in a cycle without checking interrupts in between.

Can you restart the server?

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] htup header reorganization breaks many extension modules
Hitoshi Harada wrote: > On Tue, Sep 25, 2012 at 5:30 PM, Peter Eisentraut wrote: >> I haven't followed the details of the htup header reorganization, but I >> have noticed that a number of external extension modules will be broken >> because of the move of GETSTRUCT() and to a lesser extent >> heap_getattr(). Of course some #ifdefs can fix that, but it seems >> annoying to make everyone do that. Maybe this could be reconsidered to >> reduce the impact on other projects. > But it's only adding #include "access/htup_details.h"? I'd not argue > it's harmful unless I missed your point. I guess the point is that you need an #ifdef if you want a module to be able to build with both 9.3 and lower versions. Otherwise the compiler will complain about the missing include file on older versions. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system_information.triggers & truncate triggers
On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova wrote: > On Wed, Sep 26, 2012 at 12:17 AM, Daymel Bonne Solís wrote: >> Hello hackers: >> >> I need a list of all triggers created in my database, but the view >> information_schema.triggers does not show truncate triggers, although it does show >> insert, update and delete triggers. >> >> The same problem is found in versions 9.1 and 9.2. >> > > The definition of information_schema.triggers contains this: > """ > FROM pg_namespace n, pg_class c, pg_trigger t, > -- hard-wired refs to TRIGGER_TYPE_INSERT, TRIGGER_TYPE_DELETE, > -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE > (VALUES (4, 'INSERT'), > (8, 'DELETE'), > (16, 'UPDATE')) AS em (num, text) > """ > > so it seems that we are not showing TRUNCATE triggers intentionally, > but that comment fails to explain why Wouldn't it be because TRUNCATE is a PostgreSQL language extension? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers