Re: [HACKERS] Transaction-controlled robustness for replication
Hi,

Robert Hodges wrote:
> I like Simon's logical vs. physical terminology

So far, it seems mainly to have caused confusion (physical replication, but logical application? logical replication using WAL shipping?). At least I still prefer the more meaningful and descriptive terms, like log shipping, statement-based replication or row-based replication. But maybe what Simon is about to propose just doesn't fit into any of those categories. I have a similar problem with Postgres-R, which is somewhere in between synchronous and asynchronous.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-controlled robustness for replication
Hi,

Alvaro Herrera wrote:
> Actually I think the idea here is to take certain WAL records, translate
> them into portable constructs, ship them,

At which point it clearly shouldn't be called a WAL shipping method. What would it have to do with the WAL at all, then? Why translate from WAL records at all? Better to use the real tuples right away. (Almost needless to say here, but obviously Postgres-R does it that way.)

So far, Simon really seems to mean WAL shipping: it "allows WAL to be used as the replication transport", see [1].

Regards

Markus Wanner

[1]: mail to -hackers from Simon, Subject: "Plans for 8.4":
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01010.php
Re: [HACKERS] Transaction-controlled robustness for replication
On Tue, 2008-08-12 at 13:33 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > with values of:
> > nothing:  have network traffic send WAL as needed
> > netflush: wait for flush of WAL network packets to slave
> > process:  wait for slave to process WAL traffic and optionally fsync
>
> Suggest: async, syncnet, syncdisk. I think the first two are fine, but
> 'syncdisk' might be wrong if the slave has 'synchronous_commit = off'.
> Any ideas?

Yes, synchronous_commit can be set in postgresql.conf, but its great advantage is that it is a userset parameter. The main point of the post is that the parameter would be transaction-controlled, so it *must* be set in the transaction and thus *must* be set on the master. Otherwise the capability is not available in the way I am describing.

> Oh, so synchronous_commit would not control WAL sync on the slave? What
> about our fsync parameter?

Because the slave is read-only, I saw no disadvantage in setting synchronous_commit to off in postgresql.conf on the slave. The setting of synchronous_commit will be important if the standby becomes the primary. I can see many cases where we might want syncnet mode (i.e. no fsync of WAL data to disk on standby) and yet want synchronous_commit = on when it becomes primary. So if we were to use the same parameters it would be confusing.

synchronous_commit applies to transaction commits. The code path here would be completely different, so having the parameter passed as an info byte from the master will not cause code structure problems or performance problems.

> OK, I was just trying to simplify it.

I understood why you had those thoughts and commend the lateral thinking. I just don't think that on this occasion we've discovered any better ways of doing it.

> The big problem with an async slave is that not only would you have lost
> data in a failover, but the database might be inconsistent, like
> fsync = off, which is something I think we want to try to avoid, which is
> why I was suggesting synchronous_commit = off. Or were you thinking of
> always doing fsync on the slave, no matter what? I am worried the slave
> might not be able to keep up (being single-threaded) and therefore we
> should allow a way to async commit on the slave.

Bit confused here. I've not said I want always async, nor have I said I want always sync. The main thing is we agree there will be 3 settings, including two variants of synchronous replication: one fairly safe and one ultra safe.

For the ultra-safe mode we really need to see how synch replication will work before we comment on where we might introduce fsyncs. I'm presuming that incoming WAL will be written to WAL files (and optionally fsynced). You might be talking about applying WAL records to the database and then fsyncing them, but we do need to allow for crash recovery of the standby server, so the data must be synced to WAL files before it is synced to the database.

> Certainly if the master is async sending the data, there is no need to
> do a synchronous_commit on the slave.

Agreed.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] temporary statistics option at initdb time
Tom Lane wrote:
> Decibel! [EMAIL PROTECTED] writes:
> > I disagree. While we don't guarantee stats are absolutely up-to-date, or
> > atomic, I don't think that gives license for them to just magically not
> > exist sometimes. Would it really be that hard to have the system copy
> > the file out before telling all the other backends of the change?
>
> Well, there is no (zero, zilch, nada) use-case for changing this setting
> on the fly. Why not make it a frozen-at-postmaster-start GUC? Seems like
> that gets all the functionality needed and most of the ease of use.

Oh, there is a use-case. If you run your system and only afterwards realize the I/O from the stats file is high enough to be an issue, and want to change it.

That said, I'm not sure the use-case is anywhere near common enough to put a lot of code into it. But I can certainly look at making it a startup GUC. As you say, that'll solve *most* of the cases.

//Magnus
Re: [HACKERS] Replay attack of query cancel
Tom Lane wrote:
> Magnus Hagander [EMAIL PROTECTED] writes:
> > Andrew Gierth wrote:
> > > That's easily solved: when the client wants to do a cancel, have it
> > > send, in place of the actual cancel key, an integer N and the value
> > > HMAC(k,N) where k is the cancel key. Replay is prevented by requiring
> > > the value of N to be strictly greater than any previous value
> > > successfully used for this session. (Since we already have md5 code,
> > > HMAC-MD5 would be the obvious choice.)
> >
> > I like this approach.
>
> It's not a bad idea, if we are willing to change the protocol.
>
> > If we don't touch the protocol version, we could in theory at least
> > backpatch this as a fix for those who are really concerned about this
> > issue.
>
> Huh? How can you argue this isn't a protocol change?

Um. By looking at it only from the backend perspective? *blush*

> [ thinks for a bit... ] You could make it a change in the cancel
> protocol, which is to some extent independent of the main FE/BE
> protocol. The problem is: how can the client know whether it's okay to
> use this new protocol for cancel?

Yeah, that's the point that will require a protocol bump, I think. Since there is no response to the cancel packet, we can't even do things like sending in a magic key and looking at the response (which would be a rather ugly hack, but doable if we had a success/fail response to the cancel packet).

I guess bumping the protocol to 3.1 pretty much kills any chance for a backpatch though :( Since a new libpq would no longer be able to talk to an old server, if I remember the logic correctly?

//Magnus
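Andrew Gierth's scheme can be sketched roughly as follows. This is a minimal illustration in Python using HMAC-MD5 as he suggests; the function names, message shape, and the way the counter is encoded are invented for the example and are not the actual wire protocol:

```python
import hmac
import hashlib

def make_cancel_request(cancel_key, n):
    # Client side: send N plus HMAC(k, N) instead of the raw cancel key.
    mac = hmac.new(cancel_key, str(n).encode(), hashlib.md5).digest()
    return n, mac

class CancelVerifier:
    """Server side: accept a request only if the MAC matches and N is
    strictly greater than any previously accepted N (blocks replay)."""

    def __init__(self, cancel_key):
        self.key = cancel_key
        self.last_n = -1

    def verify(self, n, mac):
        if n <= self.last_n:
            return False  # replayed or stale counter
        expected = hmac.new(self.key, str(n).encode(), hashlib.md5).digest()
        if not hmac.compare_digest(expected, mac):
            return False  # wrong key / forged request
        self.last_n = n
        return True
```

Resending a captured (N, MAC) pair fails the monotonicity check, which is the whole point of the proposal: an eavesdropper who sniffs one cancel request cannot cancel later queries with it.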
Re: [HACKERS] Transaction-controlled robustness for replication
Hi,

Robert Hodges wrote:
> Part of this is semantics—I like Simon's logical vs. physical terminology
> because it distinguishes neatly between replication that copies
> implementation down to OIDs etc. and replication that copies data content
> including schema changes but not implementation.

So far, these terms have mostly caused confusion for me: logical replication using WAL shipping, physical replication, but logical application... As Simon didn't explain in more detail what he has in mind, we all have our own and quite different interpretations. These terms obviously haven't helped to clarify the issue until now.

> It seems a noble goal to get both to work well, as they are quite
> complementary.

Agreed.

> There are various ways to get information to recapitulate SQL, but
> piggy-backing off WAL record generation has a lot of advantages. You at
> least have the data structures and don't have to reverse-engineer log
> information on disk. Of the multiple ways to build capable logical
> replication solutions, this seems to involve the least effort.

We even have the real tuple, which is about the closest you can get to a logical representation. Using that clearly requires less effort than converting a WAL record back to a logical tuple. For example, it allows the optimization of sending only the differences from the old tuple for UPDATEs, instead of always sending full tuples - see Postgres-R for a partially working implementation.

> My company is currently heads down building a solution for Oracle based
> on reading REDO log files. It requires a master of Oracle dark arts to
> decode them and is also purely asynchronous.

That sounds pretty challenging. Good luck!

Regards

Markus Wanner
Re: [HACKERS] SeqScan costs
On Tue, 2008-08-12 at 23:22 -0400, Tom Lane wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
> > On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
> > > This is only going to matter for a table of 1 block (or at least very
> > > few blocks), and for such a table it's highly likely that it's in RAM
> > > anyway. So I'm unconvinced that the proposed change represents a
> > > better model of reality.
> >
> > I think the first block of a sequential scan is clearly a random access.
> > If that doesn't represent reality well then perhaps we need to tackle
> > both problems together.
>
> The point I was trying to make (evidently not too well) is that fooling
> around with fundamental aspects of the cost models is not something that
> should be done without any evidence. We've spent ten years getting the
> system to behave reasonably well with the current models, and it's quite
> possible that changing them to be more accurate according to a
> five-minute analysis is going to make things markedly worse overall. I'm
> not necessarily opposed to making this change --- it does sound kinda
> plausible --- but I want to see some hard evidence that it does more good
> than harm before we put it in.

psql -f seq.sql -v numblocks=2 -v pkval=Anything -v filler=Varies

When numblocks=2 I consistently see that an index scan is actually faster than a seqscan, yet the planner chooses a seqscan in all cases. This is true for any value of pkval and values of filler up to 400-500 bytes. (We already take the length of rows into account, because we estimate the CPU costs per row, not per block; that is not what I wish to change.) This same situation occurs for all small tables.

What I conclude is that the disk cost swamps the CPU costs, so we end up with a seq scan when we really want an index scan. There are two ways of looking at this:

* we work out a complex scheme for knowing when to remove disk costs
* we realise that the disk cost is actually the same on the *first* block, whether we are in memory or on disk

If we take the second way, then we have a small but crucial correction factor that produces better plans in most cases on small tables. Doing it this way allows us to not worry about the caching, but just have a scheme that balances the access costs better, so that although they are still present in the total cost, the final plan choice is less dependent upon the disk cost and more dependent upon the CPU costs. This analysis is the result of experience, then measurement, not theory. I've been looking for an easy and justifiable way to nudge the cost factors so that they work better for small tables.

run_cost += random_page_cost + seq_page_cost * (baserel->pages - 1);

> People lower random_page_cost because we're not doing a good job
> estimating how much of a table is in cache.

Agreed, the elephant in the room is that we lack enough data to model caching effects with any degree of realism.

I'm specifically talking about a proposal that works whether or not the first block of the table is in cache, because I see a problem with small-table access. I'm not suggesting that we model caching effects (though we may choose to later). If you did, you might need to consider cross-statement effects, such as the likelihood that an UPDATE .. WHERE CURRENT OF CURSOR is more likely to find the block in cache, or other effects such as that certain MFVs might actually be more likely to be in cache than non-MFVs, so index scans against them are actually more preferable than it might appear.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
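The effect of the proposed correction can be illustrated with a tiny numeric sketch, using the default planner constants. The helper functions below are illustrative only; they are not the actual PostgreSQL costing code:

```python
# Default planner cost constants (postgresql.conf defaults).
seq_page_cost = 1.0
random_page_cost = 4.0

def current_seqscan_disk_cost(pages):
    # Today: every page, including the first, is charged sequentially.
    return seq_page_cost * pages

def proposed_seqscan_disk_cost(pages):
    # Proposal: charge the first block as a random access,
    # the remaining blocks sequentially.
    return random_page_cost + seq_page_cost * (pages - 1)

for pages in (1, 2, 100):
    print(pages,
          current_seqscan_disk_cost(pages),
          proposed_seqscan_disk_cost(pages))
```

For a 1- or 2-page table the proposed disk cost rises from 1.0/2.0 to 4.0/5.0, i.e. comparable to the random fetch an index scan pays anyway, so the seqscan no longer wins on disk cost alone; for a 100-page table the difference (100 vs 103) is negligible, which is why the correction only matters for small tables.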
Re: [HACKERS] Replay attack of query cancel
Tom Lane wrote:
> [ thinks for a bit... ] You could make it a change in the cancel
> protocol, which is to some extent independent of the main FE/BE
> protocol. The problem is: how can the client know whether it's okay to
> use this new protocol for cancel?

Two options:

a. Send two cancel keys in rapid succession at session startup, where the first one is 0 or something. The client can detect the first special cancel key and then knows that the connection supports cancel method 2.

b. At session startup, advertise a new runtime parameter: cancelmethod=plainkey,hmaccoded, which the client can then choose from.

I'd prefer b over a.

--
Sincerely, Stephen R. van den Berg.
And now for something *completely* different!
Re: [HACKERS] Replay attack of query cancel
Magnus Hagander [EMAIL PROTECTED] writes:
> Yeah, that's the point that will require a protocol bump, I think. Since
> there is no response to the cancel packet, we can't even do things like
> sending in a magic key and looking at the response (which would be a
> rather ugly hack, but doable if we had a success/fail response to the
> cancel packet).

From the server point of view we could accept either kind of cancel message for the first cancel message and set a variable saying which to expect from there forward. If the first cancel message is an old-style message then we always expect old-style messages. If it's a new-style message then we require new-style messages and keep track of the counter, requiring a monotonically increasing counter.

From the client point of view we have no way to know if the server is going to accept new-style cancel messages, though. We could try sending the new-style message and see if we get an error (do we get an error if you send an invalid cancel message?). We could have the server indicate it's the new protocol by sending the initial cancel key twice. If the client sees more than one cancel key it automatically switches to new-style cancel messages. Or we could just bump the protocol version.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
[HACKERS] C Extension woes
Hello all,

I am trying to write an extension in C that returns a simple environment variable. The code compiles without any complaint or warning, and it loads fine into the database; however, when I run the function, I get disconnected from the server.

Here is my C code:

#include <postgres.h>
#include <fmgr.h>

PG_MODULE_MAGIC;

#include <stdio.h>
#include <stdlib.h>

PG_FUNCTION_INFO_V1(pl_masterkey);

text *
pl_masterkey(PG_FUNCTION_ARGS)
{
  char *e_var = getenv("PGMASTERKEY");
  size_t length = VARSIZE(e_var) - VARHDRSZ;
  text *mkey = (text *) palloc(length);
  VARATT_SIZEP(mkey) = length;
  memcpy(VARDATA(mkey), e_var, length);
  return mkey;
}

And here is the SQL I use to create the function in PostgreSQL:

CREATE FUNCTION pl_masterkey() RETURNS text
  AS 'pl_masterkey', 'pl_masterkey' LANGUAGE C STRICT;

And the results:

select pl_masterkey();
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!

Thanks ahead of time for any and all help.
Re: [HACKERS] C Extension woes
Tim Hawes wrote:
> Hello all,
> I am trying to write an extension in C that returns a simple environment
> variable. The code compiles without any complaint or warning, and it
> loads fine into the database, however, when I run the function, I get
> disconnected from the server. Here is my C code:
>
> text *
> pl_masterkey(PG_FUNCTION_ARGS)
> {
>   char *e_var = getenv("PGMASTERKEY");
>   size_t length = VARSIZE(e_var) - VARHDRSZ;
>   text *mkey = (text *) palloc(length);
>   VARATT_SIZEP(mkey) = length;
>   memcpy(VARDATA(mkey), e_var, length);
>   return mkey;
> }

Oh, you confused a lot of things. You need something like:

Datum
pl_masterkey(PG_FUNCTION_ARGS)
{
  char *e_var = getenv("PGMASTERKEY");
  PG_RETURN_TEXT_P(cstring_to_text(e_var));
}

You don't need to mess with anything varlena-related (like VARSIZE); it's all handled for you. Also, read up on how to declare user-defined C functions in Postgres (they always need to return Datum).

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2
ouden estin
Re: [HACKERS] Replay attack of query cancel
Gregory Stark wrote:
> Magnus Hagander [EMAIL PROTECTED] writes:
> > Yeah, that's the point that will require a protocol bump, I think.
> > Since there is no response to the cancel packet, we can't even do
> > things like sending in a magic key and looking at the response (which
> > would be a rather ugly hack, but doable if we had a success/fail
> > response to the cancel packet).
>
> From the server point of view we could accept either kind of cancel
> message for the first cancel message and set a variable saying which to
> expect from there forward. If the first cancel message is an old-style
> message then we always expect old-style messages. If it's a new-style
> message then we require new-style messages and keep track of the counter
> to require a monotonically increasing counter.
>
> From the client point of view we have no way to know if the server is
> going to accept new-style cancel messages though. We could try sending
> the new-style message and see if we get an error (do we get an error if
> you send an invalid cancel message?).

No, that is the point I made above - we don't respond to the cancel message *at all*.

> We could have the server indicate it's the new protocol by sending the
> initial cancel key twice. If the client sees more than one cancel key it
> automatically switches to new-style cancel messages.

That will still break things like JDBC, I think - they only expect one cancel message, and then move on to expect other things.

> Or we could just bump the protocol version.

Yeah, but that would kill backwards compatibility in that the new libpq could no longer talk to old servers.

What would work is using a parameter field, per Stephen's suggestion elsewhere in the thread. Older libpq versions should just ignore the parameter if they don't know what it is. Question is, is that too ugly a workaround, since we'll need to keep it around forever? (We have special handling of a few other parameters already, so maybe not?)

//Magnus
Re: [HACKERS] C Extension woes
Tim Hawes wrote:
> text *
> pl_masterkey(PG_FUNCTION_ARGS)
> {
>   char *e_var = getenv("PGMASTERKEY");
>   size_t length = VARSIZE(e_var) - VARHDRSZ;

The VARSIZE macro is for variable-length structures, like a text or bytea, which contain a length and a data member. You are using this macro on a regular C string, e_var. Try this instead:

size_t length = e_var != NULL ? strlen(e_var) : 0;

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
[HACKERS] Patch: propose to include 3 new functions into intarray and intagg
Hello.

Here are these functions with detailed documentation: http://en.dklab.ru/lib/dklab_postgresql_patch/

- intagg.int_array_append_aggregate(int[]): fast merge of arrays into one large list
- intarray._int_group_count_sort(int[], bool): frequency-based sorting
- intarray.bidx(int[], int): binary search in a sorted array

Tested for about a year on a real PostgreSQL cluster (10 machines, Slony replication) under heavy load (millions of requests). No crash or memory problem detected during the year, so I suppose these functions are well-tested.

What do you think about that?
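For readers who want the semantics of the proposed bidx at a glance, here is a short Python sketch of a binary search over a sorted int array. The exact return convention of the C function is defined on the linked page; returning -1 for a missing value is an assumption made for this illustration only:

```python
from bisect import bisect_left

def bidx(sorted_arr, value):
    """Binary search in a sorted int array: return the index of value,
    or -1 if it is absent. (Illustrative sketch; the real implementation
    in the patch is C code inside the intarray module.)"""
    i = bisect_left(sorted_arr, value)
    if i < len(sorted_arr) and sorted_arr[i] == value:
        return i
    return -1
```

The appeal of shipping this inside intarray is that the search runs in O(log n) directly over the array datum, with no need to unnest the array into rows first.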
Re: [HACKERS] Transaction-controlled robustness for replication
On Wed, 2008-08-13 at 11:27 +0200, Markus Wanner wrote:
> Hi,
>
> Robert Hodges wrote:
> > Part of this is semantics—I like Simon's logical vs. physical
> > terminology because it distinguishes neatly between replication that
> > copies implementation down to OIDs etc. and replication that copies
> > data content including schema changes but not implementation.
>
> So far, these terms have mostly caused confusion for me: logical
> replication using WAL shipping, physical replication, but logical
> application... As Simon didn't explain in more detail what he has in
> mind, we all have our own and quite different interpretations. These
> terms obviously haven't helped to clarify the issue until now.

Classification of Replication Techniques
----------------------------------------

We can consider that there are two stages to replication:

* transfer or shipping of data to second node(s)
* apply mechanism

The apply mechanism can be either logical, where we execute SQL, or physical, where we bypass the SQL layer and do this at a lower level. It is possible to have:

1. logical shipping, logical apply
2. physical shipping, logical apply
3. physical shipping, physical apply

Mammoth and Postgres-R are both type 1 replication systems, since they stream data to the second node in a form that makes SQL reconstruction easier. Slony is also type 1, using triggers. So we have another classification within type 1:

1. Logical data shipping
   a) trigger-based
   b) additional internals-based approaches

Warm Standby log shipping is type 3. We ship the WAL and apply it directly. Easy and good.

Type 2 is where you ship the WAL (efficient), then use it to reconstruct SQL (flexible), and then apply that to other nodes. It is somewhat harder than type 1, but requires less infrastructure (IMHO). It definitely requires less data shipping from the primary node, so it is very possibly more efficient. [Tom is absolutely right to say this is impossible. Currently, it is, but that's why we develop.]

1a can't be synchronous by definition, but 1b and other approaches can be. Otherwise the above definitions are not relevant to sync/async capability.

Note that DRBD is also a type 3 system, but uses filesystem-level physical data. Since this is outside the control of Postgres, I discount this approach because it cannot deliver transaction-controlled synch replication.

Previously, most RDBMS vendors supported type 1a) systems. They have now moved to type 2 and 3 systems. Both DB2 and Oracle support a type 2 *and* a type 3 replication system. The reasons they do this are valid for us also, so I suggest that we do the same. So for me, it is not about whether we do type 2 or type 3; I think we should do both.

Doing both may not be possible in this release: I have one potential sponsor for type 2, and have made a public call for work on query access to type 3 systems. I recognise that some people who like type 3 do not like type 2, and possibly vice versa. If (not when) I do work on type 2 systems, it will be only to provide that as a transport option for other technologies, not as a full-blown replication system.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] temporary statistics option at initdb time
Magnus Hagander wrote:
> Tom Lane wrote:
> > Decibel! [EMAIL PROTECTED] writes:
> > > I disagree. While we don't guarantee stats are absolutely up-to-date,
> > > or atomic, I don't think that gives license for them to just
> > > magically not exist sometimes. Would it really be that hard to have
> > > the system copy the file out before telling all the other backends
> > > of the change?
> >
> > Well, there is no (zero, zilch, nada) use-case for changing this
> > setting on the fly. Why not make it a frozen-at-postmaster-start GUC?
> > Seems like that gets all the functionality needed and most of the ease
> > of use.
>
> Oh, there is a use-case. If you run your system and only afterwards
> realize the I/O from the stats file is high enough to be an issue, and
> want to change it. That said, I'm not sure the use-case is anywhere near
> common enough to put a lot of code into it. But I can certainly look at
> making it a startup GUC. As you say, that'll solve *most* of the cases.

Here's a patch that implements the simple case, making it a PGC_POSTMASTER variable. Is this good enough for people? ;-)

//Magnus

Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.178
diff -c -r1.178 pgstat.c
*** src/backend/postmaster/pgstat.c	5 Aug 2008 12:09:30 -	1.178
--- src/backend/postmaster/pgstat.c	13 Aug 2008 13:32:02 -
***************
*** 70,77 ****
   */
  #define PGSTAT_STAT_PERMANENT_FILENAME	"global/pgstat.stat"
  #define PGSTAT_STAT_PERMANENT_TMPFILE	"global/pgstat.tmp"
- #define PGSTAT_STAT_FILENAME	"pg_stat_tmp/pgstat.stat"
- #define PGSTAT_STAT_TMPFILE	"pg_stat_tmp/pgstat.tmp"

  /* ----------
   * Timer definitions.
--- 70,75 ----
***************
*** 106,111 ****
--- 104,116 ----
  int	pgstat_track_functions = TRACK_FUNC_OFF;
  int	pgstat_track_activity_query_size = 1024;

+ /* ----------
+  * Built from GUC parameter
+  * ----------
+  */
+ char	*pgstat_stat_filename = NULL;
+ char	*pgstat_stat_tmpname = NULL;
+
  /*
   * BgWriter global statistics counters (unused in other processes).
   * Stored directly in a stats message structure so it can be sent
***************
*** 511,517 ****
  void
  pgstat_reset_all(void)
  {
! 	unlink(PGSTAT_STAT_FILENAME);
  	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
--- 516,522 ----
  void
  pgstat_reset_all(void)
  {
! 	unlink(pgstat_stat_filename);
  	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
***************
*** 2911,2918 ****
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
! 	const char *tmpfile = permanent?PGSTAT_STAT_PERMANENT_TMPFILE:PGSTAT_STAT_TMPFILE;
! 	const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:PGSTAT_STAT_FILENAME;

  	/*
  	 * Open the statistics temp file to write out the current values.
--- 2916,2923 ----
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
! 	const char *tmpfile = permanent?PGSTAT_STAT_PERMANENT_TMPFILE:pgstat_stat_tmpname;
! 	const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:pgstat_stat_filename;

  	/*
  	 * Open the statistics temp file to write out the current values.
***************
*** 3012,3018 ****
  	}

  	if (permanent)
! 		unlink(PGSTAT_STAT_FILENAME);
  }
--- 3017,3023 ----
  	}

  	if (permanent)
! 		unlink(pgstat_stat_filename);
  }
***************
*** 3039,3045 ****
  	FILE	   *fpin;
  	int32		format_id;
  	bool		found;
! 	const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:PGSTAT_STAT_FILENAME;

  	/*
  	 * The tables will live in pgStatLocalContext.
--- 3044,3050 ----
  	FILE	   *fpin;
  	int32		format_id;
  	bool		found;
! 	const char *statfile = permanent?PGSTAT_STAT_PERMANENT_FILENAME:pgstat_stat_filename;

  	/*
  	 * The tables will live in pgStatLocalContext.
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.465
diff -c -r1.465 guc.c
*** src/backend/utils/misc/guc.c	23 Jul 2008 17:29:53 -	1.465
--- src/backend/utils/misc/guc.c	13 Aug 2008 13:32:03 -
***************
*** 164,169 ****
--- 164,170 ----
  static const char *show_tcp_keepalives_count(void);
  static bool assign_autovacuum_max_workers(int newval, bool doit, GucSource source);
  static bool assign_maxconnections(int newval, bool doit, GucSource source);
+ static const char *assign_pgstat_temp_directory(const char *newval, bool doit, GucSource source);
  static char *config_enum_get_options(struct config_enum *record,
  			const char *prefix, const char *suffix);
***************
*** 343,348 ****
--- 344,351 ----
  char	   *IdentFileName;
  char	   *external_pid_file;

+ char	   *pgstat_temp_directory;
+
  int			tcp_keepalives_idle;
  int			tcp_keepalives_interval;
  int			tcp_keepalives_count;
***************
*** 2467,2472 ****
--- 2470,2485 ----
  	},
Re: [HACKERS] Transaction-controlled robustness for replication
Hi,

Simon Riggs wrote:
> Classification of Replication Techniques

Thanks for your classifications. It helps a great deal to clarify.

> Type 2 is where you ship the WAL (efficient) then use it to reconstruct
> SQL (flexible) and then apply that to other nodes. It is somewhat harder
> than type 1, but requires less infrastructure (IMHO). Definitely requires
> less data shipping from Primary node, so very possibly more efficient.

What leads you to that conclusion? AFAICT a logical format, specifically designed for replication, is quite certainly more compact than the WAL (assuming that's what you mean by "less data"). The only efficiency gain I can see compared to type 1 is that most of the processing work is offloaded from the master to the slave(s). For setups with multiple slaves, that's a bad trade-off, IMO.

> Previously, most RDBMS vendors supported type 1a) systems. They have now
> moved to type 2 and 3 systems. Both DB2 and Oracle support a type 2 *and*
> a type 3 replication system. The reasons they do this are valid for us
> also, so I suggest that we do the same. So for me, it is not about
> whether we do type 2 or type 3, I think we should do both.

I currently don't think type 2 is doable with any reasonable effort, but hey, I'm always open for surprises. :-) Which of IBM's and Oracle's products are you referring to?

Regards

Markus Wanner
Re: [HACKERS] Replay attack of query cancel
Magnus Hagander wrote:
> Gregory Stark wrote:
> > Magnus Hagander [EMAIL PROTECTED] writes:
> > We could have the server indicate it's the new protocol by sending the
> > initial cancel key twice. If the client sees more than one cancel key
> > it automatically switches to new-style cancel messages.
>
> That will still break things like JDBC I think - they only expect one
> cancel message, and then move on to expect other things.

Why didn't they follow the recommended practice to process any message at any time? Was/is there a specific reason to avoid that in that driver? (Just curious.)

> What would work is using a parameter field, per Stephen's suggestion
> elsewhere in the thread. Older libpq versions should just ignore the
> parameter if they don't know what it is. Question is, is that too ugly a
> workaround, since we'll need to keep it around forever? (We have special
> handling of a few other parameters already, so maybe not?)

You only need to keep the runtime parameter for as long as you don't bump the protocol version. Then again, runtime parameters are cheap.

--
Sincerely, Stephen R. van den Berg.
And now for something *completely* different!
Re: [HACKERS] Transaction-controlled robustness for replication
On Wed, 2008-08-13 at 15:38 +0200, Markus Wanner wrote: Simon Riggs wrote: Classification of Replication Techniques Thanks for your classifications. It helps a great deal to clarify. Type 2 is where you ship the WAL (efficient) then use it to reconstruct SQL (flexible) and then apply that to other nodes. It is somewhat harder than type 1, but requires less infrastructure (IMHO). Definitely requires less data shipping from Primary node, so very possibly more efficient. What leads you to that conclusion? AFAICT a logical format, specifically designed for replication is quite certainly more compact than the WAL (assuming that's what you mean by less data). Possibly, but since we are generating and writing WAL anyway that's not a completely fair comparison. Which of IBM's and Oracle's products are you referring to? IBM DB2 HADR, QReplication. Oracle Streams 10g+, Data Guard Logical and Physical Standby. All of which I've personally used, except for Oracle Streams 10g, which I investigated thoroughly for a client about 4 years ago. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replay attack of query cancel
Stephen R. van den Berg wrote: Magnus Hagander wrote: Gregory Stark wrote: Magnus Hagander [EMAIL PROTECTED] writes: We could have the server indicate it's the new protocol by sending the initial cancel key twice. If the client sees more than one cancel key it automatically switches to new-style cancel messages. That will still break things like JDBC I think - they only expect one cancel message, and then move on to expect other things. Why didn't they follow recommended practice to process any message anytime? Was/is there a specific reason to avoid that in that driver? (Just curious). No idea, but that's how it is IIRC. And there are other drivers to think about as well. What would work is using a parameter field, per Stephen's suggestion elsewhere in the thread. Older libpq versions should just ignore the parameter if they don't know what it is. Question is, is that too ugly a workaround, since we'll need to keep it around forever? (We have special handling of a few other parameters already, so maybe not?) You only need to keep the runtimeparameter for as long as you don't bump the protocol version. Then again, runtimeparameters are cheap. Yeah, I guess that's true. Once you break backwards compatibility once, you can break a couple of things at the same time :) //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replay attack of query cancel
Magnus Hagander [EMAIL PROTECTED] writes: What would work is using a parameter field, per Stephen's suggestion elsewhere in the thread. Older libpq versions should just ignore the parameter if they don't know what it is. +1 for that one; we have done it already for several send-at-startup parameters that didn't use to be there, eg standard_conforming_strings. I'd go with defining it as the maximum cancel version number supported, eg cancel_version = 31, with the understanding that if it's not reported then 3.0 should be assumed. So the plan looks like this, I think: * Main FE/BE protocol doesn't change, but there might be an additional value reported in the initial ParameterStatus messages. We believe that this will not break any existing clients. * Server accepts two different styles of cancel messages, identified by different protocol numbers. * Client decides which type to send based on looking for the cancel_version parameter. This seems back-patchable since neither old clients nor old servers are broken: the only consequence of using an old one is your cancels aren't sent securely, which frankly a lot of people aren't going to care about anyway. Note after looking at the postmaster code: the contents of new-style cancel keys ought to be the backend PID in clear, the sequence number in clear, and the hash of backend PID + cancel key + sequence number. If we don't do it that way, the postmaster will need to apply the hash function for *each* backend to see if it matches, which seems like a lot more computation than we want. The postmaster needs to be able to identify which backend is the potential match before executing the hash. The main drawback I can see to keeping this backward-compatible is that keeping the cancel key to 32 bits could still leave us vulnerable to brute force attacks: once you've seen a cancel message, just try all the possible keys till you get a match, and then you can generate a cancel that will work. 
Can we refine the HMAC idea to defeat that? Otherwise we're assuming that 2^32 HMACs take longer than the useful life of a cancel key, which doesn't seem like a very future-proof assumption. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
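The message layout Tom describes — backend PID and sequence number in clear, plus a hash over PID, secret cancel key, and sequence number — can be sketched as follows. The struct, the field names, and the FNV-1a stand-in hash are all illustrative (a real implementation would need a proper cryptographic MAC, which is the open question above):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 64-bit FNV-1a -- a stand-in for a real HMAC, for illustration only. */
static uint64_t
toy_hash(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* New-style cancel message: pid and seq travel in clear so the postmaster
 * can find the target backend with a single hash; proof covers the secret. */
typedef struct CancelMsg {
    uint32_t pid;       /* in clear: identifies the backend */
    uint32_t seq;       /* in clear: monotonically increasing */
    uint64_t proof;     /* hash(pid || secret cancel key || seq) */
} CancelMsg;

static uint64_t
cancel_proof(uint32_t pid, uint32_t secret, uint32_t seq)
{
    uint32_t buf[3] = { pid, secret, seq };
    return toy_hash(buf, sizeof(buf));
}

static CancelMsg
make_cancel(uint32_t pid, uint32_t secret, uint32_t seq)
{
    CancelMsg m = { pid, seq, cancel_proof(pid, secret, seq) };
    return m;
}

/* Server side: check the proof and require seq to advance, so a sniffed
 * message cannot be replayed. */
static int
accept_cancel(const CancelMsg *m, uint32_t secret, uint32_t *last_seq)
{
    if (m->seq <= *last_seq)
        return 0;                   /* replayed or stale */
    if (m->proof != cancel_proof(m->pid, secret, m->seq))
        return 0;                   /* wrong key */
    *last_seq = m->seq;
    return 1;
}
```

Note how the clear-text pid lets the postmaster compute exactly one hash per incoming message rather than one per backend, which is the point Tom makes about avoiding extra computation.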
Re: [HACKERS] Replay attack of query cancel
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: What would work is using a parameter field, per Stephen's suggestion elsewhere in the thread. Older libpq versions should just ignore the parameter if they don't know what it is. +1 for that one; we have done it already for several send-at-startup parameters that didn't use to be there, eg standard_conforming_strings. I'd go with defining it as the maximum cancel version number supported, eg cancel_version = 31, with the understanding that if it's not reported then 3.0 should be assumed. So the plan looks like this, I think: * Main FE/BE protocol doesn't change, but there might be an additional value reported in the initial ParameterStatus messages. We believe that this will not break any existing clients. * Server accepts two different styles of cancel messages, identified by different protocol numbers. With the additional point that there is a GUC variable to turn this off or warn about it, right? * Client decides which type to send based on looking for the cancel_version parameter. This seems back-patchable since neither old clients nor old servers are broken: the only consequence of using an old one is your cancels aren't sent securely, which frankly a lot of people aren't going to care about anyway. Right. All those people running without SSL in the first place, for example... Note after looking at the postmaster code: the contents of new-style cancel keys ought to be the backend PID in clear, the sequence number in clear, and the hash of backend PID + cancel key + sequence number. If we don't do it that way, the postmaster will need to apply the hash function for *each* backend to see if it matches, which seems like a lot more computation than we want. The postmaster needs to be able to identify which backend is the potential match before executing the hash. Yes, certainly. Otherwise, the cost ends up a lot higher on the server than the client, which is a really simple DOS opportunity. 
The main drawback I can see to keeping this backward-compatible is that keeping the cancel key to 32 bits could still leave us vulnerable to brute force attacks: once you've seen a cancel message, just try all the possible keys till you get a match, and then you can generate a cancel that will work. Can we refine the HMAC idea to defeat that? Otherwise we're assuming that 2^32 HMACs take longer than the useful life of a cancel key, which doesn't seem like a very future-proof assumption. Well, you're also going to have to increment n every time. We could just cap n at arbitrary level. Say you can only cancel queries on a single connection a million times or so. It's not perfect, but it gets you somewhere. Another option would be to send a new, longer, cancel key as part of the separate parameter we're sending during startup (when we're indicating which version we support). Then we'll use the longer cancel key if we're dealing with new style cancel but keep the old 32-bit one for backwards compatibility. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replay attack of query cancel
Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: * Server accepts two different styles of cancel messages, identified by different protocol numbers. With the additional point that there is a GUC variable to turn this off or warn about it, right? I see pretty much no value in that. The main drawback I can see to keeping this backward-compatible is that keeping the cancel key to 32 bits could still leave us vulnerable to brute force attacks: once you've seen a cancel message, just try all the possible keys till you get a match, and then you can generate a cancel that will work. Can we refine the HMAC idea to defeat that? Otherwise we're assuming that 2^32 HMACs take longer than the useful life of a cancel key, which doesn't seem like a very future-proof assumption. Well, you're also going to have to increment n every time. We could just cap n at arbitrary level. Say you can only cancel queries on a single connection a million times or so. It's not perfect, but it gets you somewhere. Once you've brute-forced the secret key, you can just use an n that's somewhat more than the last one the client used, assuming you've been sniffing the connection the whole time. Or use one that's just a bit less than what you can predict the server will take. Not only do you get to kill the current query, but you'll have prevented the client from issuing legitimate cancels after that, since it won't know you bumped the server's n value. So the idea still needs work. Another option would be to send a new, longer, cancel key as part of the separate parameter we're sending during startup (when we're indicating which version we support). Then we'll use the longer cancel key if we're dealing with new style cancel but keep the old 32-bit one for backwards compatibility. Yeah. So then we just need one added parameter: secure_cancel_key = string. BTW, should we make all of this conditional on the use of an SSL connection? 
If the original sending of the cancel key isn't secure against sniffing, it's hard to see what anyone is buying with all the added computation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replay attack of query cancel
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: * Server accepts two different styles of cancel messages, identified by different protocol numbers. With the additional point that there is a GUC variable to turn this off or warn about it, right? I see pretty much no value in that. The server is the point where you enforce security policies. The server is where you say that SSL is required, for example. The same way, this would let you at the server level say that secure cancels are required. The main drawback I can see to keeping this backward-compatible is that keeping the cancel key to 32 bits could still leave us vulnerable to brute force attacks: once you've seen a cancel message, just try all the possible keys till you get a match, and then you can generate a cancel that will work. Can we refine the HMAC idea to defeat that? Otherwise we're assuming that 2^32 HMACs take longer than the useful life of a cancel key, which doesn't seem like a very future-proof assumption. Well, you're also going to have to increment n every time. We could just cap n at arbitrary level. Say you can only cancel queries on a single connection a million times or so. It's not perfect, but it gets you somewhere. Once you've brute-forced the secret key, you can just use an n that's somewhat more than the last one the client used, assuming you've been sniffing the connection the whole time. Or use one that's just a bit less than what you can predict the server will take. Not only do you get to kill the current query, but you'll have prevented the client from issuing legitimate cancels after that, since it won't know you bumped the server's n value. So the idea still needs work. Yeah, I like the idea below much better. Another option would be to send a new, longer, cancel key as part of the separate parameter we're sending during startup (when we're indicating which version we support). 
Then we'll use the longer cancel key if we're dealing with new style cancel but keep the old 32-bit one for backwards compatibility. Yeah. So then we just need one added parameter: secure_cancel_key = string. Yup. Seems a whole lot cleaner. BTW, should we make all of this conditional on the use of an SSL connection? If the original sending of the cancel key isn't secure against sniffing, it's hard to see what anyone is buying with all the added computation. Not sure. In practice it makes no difference, but our code is more readable with less #ifdefs and branches. I don't think the added computation makes any noticable difference at all in the normal non-attacker scenario, so I'd just as well leave it in. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] C Extension woes
Andrew Chernow [EMAIL PROTECTED] writes: Tim Hawes wrote:

    char *e_var = getenv("PGMASTERKEY");
    size_t length = VARSIZE(e_var) - VARHDRSZ;

The VARSIZE macro is for variable length structures, like a text or bytea which contains a length and data member. You are using this macro on a regular C string e_var. The code on the output side is wrong too: if the useful data length is length, then the palloc has to ask for length + VARHDRSZ, and that's also what's got to be put into the length word. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
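Tom's point about the length word can be illustrated with a toy varlena-style struct. `toy_text` and `TOY_VARHDRSZ` are made-up names standing in for PostgreSQL's `text` and `VARHDRSZ`; this is a sketch of the layout arithmetic, not server code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative pre-8.3-style varlena: a 4-byte length word that counts
 * itself, followed by the payload bytes. */
#define TOY_VARHDRSZ ((int32_t) sizeof(int32_t))

typedef struct toy_text {
    int32_t vl_len;     /* total size, header included */
    char    vl_dat[];   /* payload (flexible array member) */
} toy_text;

/* Build a text-like datum from a C string: allocate header + payload,
 * and store the *total* size in the length word, as Tom describes. */
static toy_text *
toy_cstring_to_text(const char *s)
{
    size_t length = s ? strlen(s) : 0;
    toy_text *t = malloc(TOY_VARHDRSZ + length);   /* header included */

    t->vl_len = TOY_VARHDRSZ + (int32_t) length;   /* counts the header */
    if (length > 0)
        memcpy(t->vl_dat, s, length);
    return t;
}
```

Forgetting the header on either side — allocating only `length` bytes, or storing just `length` in the length word — is precisely the bug pattern being debugged in this thread.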
Re: [HACKERS] Overhauling GUCS
Hi there... Configuration autotuning is something I am really interested in. I have seen this page: http://wiki.postgresql.org/wiki/GUCS_Overhaul and a couple of emails mentioning this, so I wanted to ask is someone already on it? If yes, I'd like to contribute. Ideally, an external little app should also provide recommendations based on current database usage statistics -- wouldn't this constitute something akin to application-specific advice? In this regard, the tool could also tell you things like how much RAM you should order if you were to replace your existing database server :-) Michael
Re: [HACKERS] C Extension woes
Thank you for your replies, however, it still is not working, see below...

Andrew Chernow wrote: Tim Hawes wrote:

    text *
    pl_masterkey(PG_FUNCTION_ARGS)
    {
        char *e_var = getenv("PGMASTERKEY");
        size_t length = VARSIZE(e_var) - VARHDRSZ;

The VARSIZE macro is for variable length structures, like a text or bytea which contains a length and data member. You are using this macro on a regular C string e_var. Try this instead:

    size_t length = e_var != NULL ? strlen(e_var) : 0;

@Jan: It appears the cstring_to_text function is unique to the latest PostgreSQL code. I do not have a def for that for PostgreSQL 8.2, and currently I am stuck working with that version. I changed the return value to Datum, and I had previously only copied from the examples at: http://www.postgresql.org/docs/8.2/interactive/xfunc-c.html

@Andrew: here is my new code:

    Datum
    pl_masterkey(PG_FUNCTION_ARGS)
    {
        char *e_var = getenv("PGMASTERKEY");
        size_t length = e_var != NULL ? strlen(e_var) : 0;
        text *mkey = (text *) palloc(length);

        VARATT_SIZEP(mkey) = length;
        memcpy(VARDATA(mkey), e_var, length);
        PG_RETURN_TEXT_P(mkey);
    }

now gets:

    select pl_masterkey();
    ERROR: invalid memory alloc request size 4294967293

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uncopied parameters on CREATE TABLE LIKE
Added to TODO: o Add CREATE TABLE LIKE ... INCLUDING COMMENTS * Allow table and index WITH options to be specified via hooks, for use with plugins like GiST index methods --- ITAGAKI Takahiro wrote: Simon Riggs [EMAIL PROTECTED] wrote: On Tue, 2008-08-12 at 13:34 -0400, Bruce Momjian wrote: Is there a TODO here? Yes, three: * create plugin API to allow extensible reloptions Yes, and some index access methods (gist and gin) should have abilities to handle reloptions in user defined functions, too. * allow reloptions to be copied when using INCLUDING DEFAULTS (if I got that right..) The part INCLUDING DEFAULTS is not correct. Reloptions are attribute of table, not of columns. So they should not be copied on LIKE and INHERITS. Of course, we'd better to have convenient way to copy reltoptions from another table, but concrete syntax was not suggested in the discussion. (and I don't have.) * allow new option to copy comments INCLUDING COMMENTS Sure. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replay attack of query cancel
Tom Lane [EMAIL PROTECTED] wrote: BTW, should we make all of this conditional on the use of an SSL connection? If the original sending of the cancel key isn't secure against sniffing, it's hard to see what anyone is buying with all the added computation. +1 All of our important production work is done with local connections. If the machine has been compromised to the level that loopback traffic is being intercepted, these protections won't help. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-controlled robustness for replication
Simon Riggs wrote: On Wed, 2008-08-13 at 15:38 +0200, Markus Wanner wrote: Simon Riggs wrote: Classification of Replication Techniques Thanks for your classifications. It helps a great deal to clarify. Type 2 is where you ship the WAL (efficient) then use it to reconstruct SQL (flexible) and then apply that to other nodes. It is somewhat harder than type 1, but requires less infrastructure (IMHO). Definitely requires less data shipping from Primary node, so very possibly more efficient. What leads you to that conclusion? AFAICT a logical format, specifically designed for replication is quite certainly more compact than the WAL (assuming that's what you mean by less data). Possibly, but since we are generating and writing WAL anyway that's not a completely fair comparison. Which of IBM's and Oracle's products are you referring to? IBM DB2 HADR, QReplication. Oracle Streams 10g+, Data Guard Logical and Physical Standby All of which I've personally used, except for Oracle Streams10g, which I investigated thoroughly for a client about 4 years ago. I think doing the WAL streaming and allowing a read-only slave is enough work to keep Simon busy for quite some time. I don't understand why the logical issue is being discussed at this stage --- let's get the other stuff done first. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-controlled robustness for replication
Simon Riggs wrote: The main point of the post is that the parameter would be transaction controlled, so *must* be set in the transaction and thus *must* be set on the master. Otherwise the capability is not available in the way I am describing. Oh, so synchronous_commit would not control WAL sync on the slave? What about our fsync parameter? Because the slave is read-only, I saw no disadvantage of setting synchronous_commit to off in postgresql.conf on the slave. The setting of synchronous_commit will be important if the standby becomes the primary. I can see many cases where we might want syncnet mode (i.e. no fsync of WAL data to disk on standby) and yet want synchronous_commit=on when it becomes primary. So if we were to use the same parameters it would be confusing. I disagree. If they make it the master they change the setting. Also, if we choose async slave I would expect the consistency of synchronous_commit rather than fsync=off's database inconsistency. The big problem with an async slave is that not only would you have lost data in a failover, but the database might be inconsistent, like fsync = off, which is something I think we want to try to avoid, which is why I was suggesting synchronous_commit = off. Or were you thinking of always doing fsync on the slave, no matter what? I am worried the slave might not be able to keep up (being single-threaded) and therefore we should allow a way to async commit on the slave. Bit confused here. I've not said I want always async, neither have I said I want always sync. The main thing is we agree there will be 3 settings, including two variants of synchronous replication, one fairly safe and one ultra safe. For the ultra safe mode we really need to see how synch replication will work before we comment on where we might introduce fsyncs. I'm presuming that incoming WAL will be written to WAL files (and optionally fsynced). You might be talking about applying WAL records to the database and then fsyncing them, but we do need to allow for crash recovery of the standby server, so the data must be synced to WAL files before it is synced to the database. I was talking about fsync'ing the WAL when each entry arrives from the master. -- Bruce Momjian [EMAIL PROTECTED]  http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
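The three settings the thread agrees on (using Bruce's suggested names from earlier in the thread: async, syncnet, syncdisk) can be sketched as follows. The enum and helper functions are an illustration of the semantics only, not proposed server code:

```c
#include <assert.h>

/*
 * Transaction-controlled robustness levels under discussion.  Names follow
 * Bruce's suggestion upthread; this sketch just encodes what each level
 * means for a committing transaction on the master.
 */
typedef enum SyncRepLevel {
    SYNC_REP_ASYNC,     /* send WAL as needed; commit does not wait */
    SYNC_REP_SYNCNET,   /* commit waits for standby to receive the WAL */
    SYNC_REP_SYNCDISK   /* commit also waits for standby to fsync it */
} SyncRepLevel;

/* Does commit on the master wait for the standby at all? */
static int
commit_waits_for_standby(SyncRepLevel level)
{
    return level != SYNC_REP_ASYNC;
}

/* Must the standby fsync WAL to disk before the master's commit returns? */
static int
standby_fsync_required(SyncRepLevel level)
{
    return level == SYNC_REP_SYNCDISK;
}
```

Because the level is transaction-controlled, it would be chosen per transaction on the master (like synchronous_commit is today), which is Simon's argument for why it cannot be a standby-side setting.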
Re: [HACKERS] SeqScan costs
Proposal: Make the first block of a seq scan cost random_page_cost, then after that every additional block costs seq_page_cost. +1 AFAICS the cost cross-over is much higher than the actual elapsed time cross-over for both narrow and wide tables. Which makes absolute sense, since readahead can only reduce cost when you read more than one page (and more than a few when lacking fadvise tech). I am wondering about your test though. It was all cached, so it seems we also underestimate the CPU cost of accessing and comparing all the rows during seq scan. That's why using SET enable_seqscan=off helps performance in many cases, or why people reduce random_page_cost to force index selection. Sequential scans that cause more IO than an alternate index path are often counterproductive in highly concurrent scenarios. In such scenarios it is really reasonable to lower random_page_cost. Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
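The quoted proposal amounts to a one-line change in the I/O term of the sequential-scan cost: charge random_page_cost for the first block (the initial seek) and seq_page_cost for each block after it. A sketch of the arithmetic — the function is illustrative, not the planner's actual cost_seqscan():

```c
#include <assert.h>

/*
 * Proposed seq-scan I/O cost: one random fetch for the first block, then
 * sequential fetches.  Parameter names follow the GUCs of the same name.
 */
static double
seqscan_io_cost(long nblocks, double random_page_cost, double seq_page_cost)
{
    if (nblocks <= 0)
        return 0.0;
    return random_page_cost + (nblocks - 1) * seq_page_cost;
}
```

With the default GUC values (random_page_cost = 4.0, seq_page_cost = 1.0), a 1-block table then costs 4.0 rather than 1.0 to scan, which is what would make a single-page index probe competitive with the seq scan in the costing.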
Re: [HACKERS] temporary statistics option at initdb time
On Aug 13, 2008, at 4:12 AM, Magnus Hagander wrote: Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: I disagree. While we don't guarantee stats are absolutely up-to-date, or atomic, I don't think that gives license for them to just magically not exist sometimes. Would it really be that hard to have the system copy the file out before telling all the other backends of the change? Well, there is no (zero, zilch, nada) use-case for changing this setting on the fly. Why not make it a frozen at postmaster start GUC? Seems like that gets all the functionality needed and most of the ease of use. Oh, there is a use-case. If you run your system and then only afterwards realize the I/O from the stats file is high enough to be an issue, and want to change it. That said, I'm not sure the use-case is anywhere near common enough to put a lot of code into it. Something to keep in mind as PG is used to build larger systems 'further up the enterprise'... for us to bounce a database at work costs us a LOT in lost revenue. I don't want to go into specifics, but it's more than enough to buy a very nice car. :) That's why I asked how hard it'd be to do this on the fly. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] benchmark farm
On 1/21/08, Andrew Dunstan [EMAIL PROTECTED] wrote: It's on my (very long) TODO list to add benchmarking as an option on the buildfarm. If you're interested in working on it then contact me offline and we will work on how to move forward. any move in this? -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. (593) 87171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-controlled robustness for replication
On Wed, 2008-08-13 at 11:17 -0400, Bruce Momjian wrote: I think doing the WAL streaming and allowing a read-only slave is enough work to keep Simon busy for quite some time. I don't understand why the logical issue is being discussed at this stage --- let's get the other stuff done first. I'm not working on WAL streaming. My plans assume someone else is and I'll be as worried as you if the current answer is nobody. I expect to be involved in reviewing it fairly closely though... Other than that, my time is limited only by how much funding I have. Or put another way, the constraining factor is money, not available time. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] C Extension woes
Tim Hawes wrote: @Jan: It appears the cstring_to_text function is unique to the latest PostgreSQL code. I do not have a def for that for PostgreSQL 8.2, and

Oh, I'm sorry, I forgot about that. cstring_to_text has been added only recently (it's not even in 8.3, silly me).

    Datum
    pl_masterkey(PG_FUNCTION_ARGS)
    {
        char *e_var = getenv("PGMASTERKEY");
        size_t length = e_var != NULL ? strlen(e_var) : 0;
        text *mkey = (text *) palloc(length);

        VARATT_SIZEP(mkey) = length;
        memcpy(VARDATA(mkey), e_var, length);
        PG_RETURN_TEXT_P(mkey);
    }

You forgot to palloc space for the varlena header. You need an extra VARHDRSZ bytes for the header, in addition to what is needed for your data. Try something like this:

    Datum
    pl_masterkey(PG_FUNCTION_ARGS)
    {
        char *e_var;
        size_t length;
        text *mkey;

        e_var = getenv("PGMASTERKEY");
        length = e_var ? strlen(e_var) : 0;
        mkey = (text *) palloc(VARHDRSZ + length);
        VARATT_SIZEP(mkey) = VARHDRSZ + length;
        memcpy(VARDATA(mkey), e_var, length);
        PG_RETURN_TEXT_P(mkey);
    }

-- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] C Extension woes
Ok, that worked! Thank you very much, Jan and others who gave their input. I did see Tom's input for the VARHDRSZ and tried that, but forgot to add that again when I called VARATT_SIZEP.

Jan Urbański wrote: Tim Hawes wrote: @Jan: It appears the cstring_to_text function is unique to the latest PostgreSQL code. I do not have a def for that for PostgreSQL 8.2, and

Oh, I'm sorry, I forgot about that. cstring_to_text has been added only recently (it's not even in 8.3, silly me).

    Datum
    pl_masterkey(PG_FUNCTION_ARGS)
    {
        char *e_var = getenv("PGMASTERKEY");
        size_t length = e_var != NULL ? strlen(e_var) : 0;
        text *mkey = (text *) palloc(length);

        VARATT_SIZEP(mkey) = length;
        memcpy(VARDATA(mkey), e_var, length);
        PG_RETURN_TEXT_P(mkey);
    }

You forgot to palloc space for the varlena header. You need an extra VARHDRSZ bytes for the header, in addition to what is needed for your data. Try something like this:

    Datum
    pl_masterkey(PG_FUNCTION_ARGS)
    {
        char *e_var;
        size_t length;
        text *mkey;

        e_var = getenv("PGMASTERKEY");
        length = e_var ? strlen(e_var) : 0;
        mkey = (text *) palloc(VARHDRSZ + length);
        VARATT_SIZEP(mkey) = VARHDRSZ + length;
        memcpy(VARDATA(mkey), e_var, length);
        PG_RETURN_TEXT_P(mkey);
    }

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-controlled robustness for replication
On Wed, 2008-08-13 at 11:27 -0400, Bruce Momjian wrote: I disagree. If they make it the master they change the setting. It's not acceptable to force people to edit a configuration file when failover occurs. Some people wish to automate this and fumbling parameter values at this important time is definitely inappropriate. We gain nothing by asking people do do things that way. Plus, as I have said: if it is controlled on the Standby then it will not be transaction-controlled and this will be a useful thing. I asked myself where would I like to be set? The answer was on the master. If you think differently, please say why. Yes, we can set it on the standby, but I see no reason to do so. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SeqScan costs
On Aug 12, 2008, at 4:52 PM, Andrew Gierth wrote: Tom == Tom Lane [EMAIL PROTECTED] writes: Proposal: Make the first block of a seq scan cost random_page_cost, then after that every additional block costs seq_page_cost. ?Tom This is only going to matter for a table of 1 block (or at least ?Tom very few blocks), and for such a table it's highly likely that ?Tom it's in RAM anyway.? So I'm unconvinced that the proposed change ?Tom represents a better model of reality. Simple example which demonstrates a 10x speed improvement for index scan over seqscan for a 1-block table (on 8.3.3): create table oneblock (id integer primary key, value text not null);? insert into oneblock select i, 'row ' || i from generate_series (1,200) i; test= select pg_relation_size('oneblock'); ?pg_relation_size? -- ?? ? ? ? ? ? 8192 analyze oneblock; set enable_seqscan=true; select (select value from oneblock where id = i) ? from generate_series(1,200) i, generate_series(1,5000) j; Time: 25596.709 ms? (that's 25.6 us per row) set enable_seqscan=false; select (select value from oneblock where id = i) ? from generate_series(1,200) i, generate_series(1,5000) j; Time: 2415.691 ms ? (that's 2.4 us per row) Roughly what I get on my MBP (I'm about a factor of 2 slower). This makes me think it's an issue of having to slog through an entire page one row at a time vs just using the index. To test this I tested selecting i=200 (remember we start filling data at the back of the page, so 200 would actually be the front, and I'm assuming the first value that would be hit) vs i=1. With seqscans, I saw about a 10% difference. With index scans the difference was moot, but also note that now index scans are in-between seqscans in performance. 
[EMAIL PROTECTED] explain analyze select (select value from oneblock where id = 200) from generate_series(1,200) i, generate_series(1,50) j;

                                 QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop  (cost=17.00..20029.50 rows=100 width=0) (actual time=270.867..65821.373 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on oneblock  (cost=0.00..3.50 rows=1 width=7) (actual time=0.052..0.053 rows=1 loops=1)
           Filter: (id = 200)
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=0.062..0.351 rows=200 loops=1)
   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0) (actual time=1.368..164.634 rows=50 loops=200)
         ->  Function Scan on generate_series j  (cost=0.00..12.50 rows=1000 width=0) (actual time=270.743..459.335 rows=50 loops=1)
 Total runtime: 79055.822 ms
(8 rows)

[EMAIL PROTECTED] explain analyze select (select value from oneblock where id = 1) from generate_series(1,200) i, generate_series(1,50) j;

                                 QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop  (cost=17.00..20029.50 rows=100 width=0) (actual time=261.941..72937.226 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on oneblock  (cost=0.00..3.50 rows=1 width=7) (actual time=0.025..0.056 rows=1 loops=1)
           Filter: (id = 1)
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=0.060..0.346 rows=200 loops=1)
   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0) (actual time=1.375..182.474 rows=50 loops=200)
         ->  Function Scan on generate_series j  (cost=0.00..12.50 rows=1000 width=0) (actual time=261.815..448.652 rows=50 loops=1)
 Total runtime: 87702.315 ms
(8 rows)

[EMAIL PROTECTED] set enable_seqscan = off;
SET

[EMAIL PROTECTED] explain analyze select (select value from oneblock where id = 200) from generate_series(1,200) i, generate_series(1,50) j;

                                 QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop  (cost=21.77..20034.27 rows=100 width=0) (actual time=262.219..69851.786 rows=1 loops=1)
   InitPlan
     ->  Index Scan using oneblock_pkey on oneblock  (cost=0.00..8.27 rows=1 width=7) (actual time=0.024..0.026 rows=1 loops=1)
           Index Cond: (id = 200)
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=0.062..0.355 rows=200 loops=1)
   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0) (actual time=1.325..174.314 rows=50 loops=200)
         ->  Function Scan on generate_series j  (cost=0.00..12.50 rows=1000 width=0) (actual time=262.119..449.383 rows=50 loops=1)
 Total runtime: 83024.952 ms
(8 rows)

[EMAIL
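Tom's proposal quoted at the start of this sub-thread — charge random_page_cost for the first block of a seqscan and seq_page_cost for each block after that — is easy to sketch numerically. This toy model (illustrative function names, not planner internals) uses the stock defaults seq_page_cost = 1.0 and random_page_cost = 4.0:

```python
SEQ_PAGE_COST = 1.0     # default seq_page_cost
RANDOM_PAGE_COST = 4.0  # default random_page_cost

def seqscan_io_cost_current(pages):
    # Current model: every page of a sequential scan costs seq_page_cost.
    return pages * SEQ_PAGE_COST

def seqscan_io_cost_proposed(pages):
    # Proposed model: the first block is a random fetch, each additional
    # block is sequential.
    if pages == 0:
        return 0.0
    return RANDOM_PAGE_COST + (pages - 1) * SEQ_PAGE_COST

print(seqscan_io_cost_current(1))    # 1.0
print(seqscan_io_cost_proposed(1))   # 4.0
print(seqscan_io_cost_proposed(100)) # 103.0
```

Under the proposed model a one-block seqscan and a single random page fetch cost the same, so a tiny table's seqscan no longer looks 4x cheaper than the index probe it competes with, while large scans are almost unchanged.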
Re: [HACKERS] Transaction-controlled robustness for replication
Simon Riggs wrote: On Wed, 2008-08-13 at 11:17 -0400, Bruce Momjian wrote: I think doing the WAL streaming and allowing a read-only slave is enough work to keep Simon busy for quite some time. I don't understand why the logical issue is being discussed at this stage --- let's get the other stuff done first. I'm not working on WAL streaming. My plans assume someone else is and I'll be as worried as you if the current answer is nobody. Oh, OK, good. I was unclear on that. NTT is working with EnterpriseDB on the WAL steaming code to be released to the community. I expect to be involved in reviewing it fairly closely though... Yes, of course. Other than that, my time is limited only by how much funding I have. Or put another way, the constraining factor is money, not available time. That is good, I think. ;-) -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] temporary statistics option at initdb time
Decibel! wrote: On Aug 13, 2008, at 4:12 AM, Magnus Hagander wrote: Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: I disagree. While we don't guarantee stats are absolutely up-to-date, or atomic I don't think that gives license for them to just magically not exist sometimes. Would it really be that hard to have the system copy the file out before telling all the other backends of the change? Well, there is no (zero, zilch, nada) use-case for changing this setting on the fly. Why not make it a frozen at postmaster start GUC? Seems like that gets all the functionality needed and most of the ease of use. Oh, there is a use-case. If you run your system and then only afterwards realize the I/O from the stats file is high enough to be an issue, and want to change it. That said, I'm not sure the use-case is anywhere near common enough to put a lot of code into it. Something to keep in mind as PG is used to build larger systems 'further up the enterprise'... for us to bounce a database at work costs us a LOT in lost revenue. I don't want to go into specifics, but it's more than enough to buy a very nice car. :) That's why I asked how hard it'd be to do this on the fly. Well, it's doable, but fairly hard. But you can do it the symlink way without shutting it down, I think. :-) //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-controlled robustness for replication
Simon Riggs wrote: On Wed, 2008-08-13 at 11:27 -0400, Bruce Momjian wrote: I disagree. If they make it the master they change the setting. It's not acceptable to force people to edit a configuration file when failover occurs. Some people wish to automate this, and fumbling parameter values at this important time is definitely inappropriate. We gain nothing by asking people to do things that way. Plus, as I have said: if it is controlled on the standby then it will not be transaction-controlled, and transaction control is the useful thing. I asked myself: where would I like it to be set? The answer was on the master. If you think differently, please say why. Yes, we can set it on the standby, but I see no reason to do so. Ah, I can see an advantage for the failover case to have the slave WAL sync controlled from the master. My original goal was just to reduce GUC option bloat. Let's get farther down the road on this and see how it looks. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replay attack of query cancel
On 8/8/08, Heikki Linnakangas [EMAIL PROTECTED] wrote: It occurred to me a while ago that our query cancel messages are sent unencrypted, even when SSL is otherwise used. That's not a big issue on its own, because the cancellation message only contains the backend PID and the cancellation key, but it does open us to a replay attack. After the first query in a connection has been cancelled, an eavesdropper can reuse the backend PID and cancellation key to cancel subsequent queries on the same connection. We discussed this on the security list, and the consensus was that this isn't worth a quick fix and a security release, because:
- it only affects applications that use query cancel, which is rare
- it only affects SSL-encrypted connections (the point is moot for non-encrypted connections, as you can just snatch the cancel key from the initial message)
- it only lets you cancel queries, IOW it's only a DOS attack
- there's no simple fix.
However, it is something to keep in mind, and perhaps fix for the next release. One idea for fixing this is to make cancellation keys disposable, and automatically issue a new one through the main connection when one is used, but that's not completely trivial, and requires a change in both the clients and the server. Another idea is to send the query cancel message only after SSL authentication, but that is impractical for libpq because PQcancel needs to be callable from a signal handler. Why not establish SSL before sending the cancel key? That way potential SSL auth is also enforced. I'm not against improving the cancel protocol generally, also for non-SSL clients, but this seems orthogonal to the SSL issue - if the client uses SSL then I'd expect the cancel packet also to be sent over SSL. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SeqScan costs
Decibel! == Decibel! [EMAIL PROTECTED] writes: Decibel Roughly what I get on my MBP (I'm about a factor of 2 Decibel slower). This makes me think it's an issue of having to slog Decibel through an entire page one row at a time vs just using the Decibel index. To test this I tested selecting i=200 (remember we Decibel start filling data at the back of the page, so 200 would Decibel actually be the front, and I'm assuming the first value that Decibel would be hit) vs i=1. With seqscans, I saw about a 10% Decibel difference. With index scans the difference was moot, but Decibel also note that now index scans are in-between seqscans in Decibel performance. The problem is that by looking for a constant row, you're actually eliminating the entire effect being tested, because the uncorrelated subselect is run ONCE as an initplan, and the entire query time is then spent elsewhere. The differences in runtime you're seeing are pure noise (the fact that you had to increase the iteration count so much should have been a clue here). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-controlled robustness for replication
On Wed, 2008-08-13 at 12:55 -0400, Bruce Momjian wrote: NTT is working with EnterpriseDB on the WAL steaming code to be released to the community. Hopefully the code isn't steaming... :-) and that we will be able to see detailed designs and code soon. Might end up as a big pileup otherwise. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] benchmark farm
Jaime Casanova wrote: On 1/21/08, Andrew Dunstan [EMAIL PROTECTED] wrote: It's on my (very long) TODO list to add benchmarking as an option on the buildfarm. If you're interested in working on it then contact me offline and we will work on how to move forward. any move in this? Not much on my side. It seemed really complicated to plug into the existing run_bench.pl, which I didn't think was going to let me play with different postgresql.conf settings easily. And they wanted to start with just running pgbench, which wasn't what I was really interested in. I did write a few perl scripts to play around with inserts and updates and compared a few timings on xfs vs ext2... and then kinda lost interest. -Andy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replay attack of query cancel
On Wed, 13 Aug 2008, Marko Kreen wrote: On 8/8/08, Heikki Linnakangas [EMAIL PROTECTED] wrote: One idea for fixing this is to make cancellation keys disposable, and automatically issue a new one through the main connection when one is used, but that's not completely trivial, and requires a change in both the clients and the server. Another idea is to send the query cancel message only after SSL authentication, but that is impractical for libpq because PQcancel needs to be callable from a signal handler. Why not establish SSL before sending the cancel key? That way potential SSL auth is also enforced. I'm not against improving the cancel protocol generally, also for non-SSL clients, but this seems orthogonal to the SSL issue - if the client uses SSL then I'd expect the cancel packet also to be sent over SSL. Because libpq PQcancel needs to be callable from a signal handler. There are limitations on what you can safely do in a signal handler, and calling an external SSL library probably isn't safe, at least not on all platforms. It certainly would be possible for many other clients like JDBC, though. In fact, we might want to do that anyway, even if we change the protocol, just on the grounds that it's surprising that the cancellation isn't SSL protected while the rest of the protocol is. In theory, one might have a strict firewall rule that only lets through SSL connections, or you might want to hide from eavesdroppers the fact that a query is being cancelled, or the backend PID. It's a bit far-fetched and I doubt anyone cares in practice, but for the clients where it's easy to do, I think we should. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, oprrest function for text search take 2
Heikki Linnakangas wrote: Jan Urbański wrote: through it. The only tiny ugliness is that there's one function used for qsort() and another for bsearch(), because I'm sorting an array of texts (from pg_statistic) and I'm binary searching for a lexeme (non-NULL-terminated string with length). It would be nice to clean that up a bit. I think you could convert the lexeme to a TextFreq, or make the TextFreq.element a text * instead of Datum (i.e., detoast it with PG_DETOAST_DATUM while you build the array for qsort). Hm, making it a text * won't help, I think, because the problem is:
- pg_statistic holds an array of text datums and an array of float datums, ordered by the latter
- a tsquery is a tree of WordEntries (lexemes), i.e. non-NULL-terminated strings
The approach was to:
(1) create an array of (text, float) pairs by zipping the two pg_statistic arrays into one
(2) sort that array on text values
(3) every time a frequency of a WordEntry needs to be determined, look it up using binary search in the sorted array
So for (2) I needed a function that compares text with text (I reused bttext_pattern_cmp for that). And to do (3) I needed a function that compares text with WordEntries. I didn't want to convert WordEntries into text * values, because that would require a palloc(). Hmm, maybe I could build an array of (lexeme, float) in (1) instead; turning text * into a lexeme is very straightforward. Then I'd use the same function in (2) and (3) - cleaner. However, maybe I should just skip all that qsort() nonsense and use a hashtable? Another bold thought is: keep the pg_statistic arrays for tsvectors ordered by text datums, and not frequencies. Would've been awkward, 'cause people expect the most common frequencies array to be sorted and not the most common values/elements one, but it'd help a lot and simplify the code quite a bit. It would induce one extra qsort() in ts_typanalyze(), but would allow only bsearch()es in tssel(). My mediocre gprof skills got me: [...
nonsense ...] I'd like to see a little bit more testing of that. I can't read gprof myself, so the above doesn't give me much confidence. I use oprofile, which I find is much simpler to use. I think the worst case scenario is with statistics_target set to maximum, with a simplest possible query and simplest possible tsquery. One kernel recompile later... I got oprofile running. Here's the setup:

$ pgbench -n -f tssel-bench.sql -c 10 -t 1000 postgres

And here's tssel-bench.sql:

explain select * from manuals where tsvector @@ to_tsquery('foo');

The manuals table was rather small (but that's irrelevant, I think) and statistics_target for the tsvector column was set to 100. Obviously 'foo' isn't a most common element in my test data, so the bsearch()es always miss. The results are:

samples   %        symbol name
101507    13.4461  internal_text_pattern_compare
91398     12.1070  bttext_pattern_cmp
82753     10.9618  pg_detoast_datum_packed
66434      8.8001  pg_qsort
54005      7.1537  DirectFunctionCall2
48925      6.4808  pglz_decompress
44931      5.9518  compare_two_textfreqs
40178      5.3222  AllocSetAlloc
26763      3.5451  AllocSetCheck
20839      2.7604  AtEOXact_CatCache
16057      2.1270  AllocSetFree
13772      1.8243  swapfunc
10001      1.3248  .plt
7859       1.0410  text_to_cstring
7556       1.0009  datumCopy

So, maybe qsort()ing every time you plan a query is not that cheap after all. I think hashing would also be overkill. How do people feel about storing the statistics sorted on values and not on frequencies? Cheers, Jan

PS: I used a simple

$ opreport --symbols /path/to/postgres

Are there any magic switches I need to add? J -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
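The tradeoff being profiled above — a qsort() at plan time versus Jan's suggestion of keeping the statistics ordered by value so tssel() only ever does a bsearch() — can be sketched like this (hypothetical lexeme data and function names, not the real pg_statistic layout):

```python
import bisect

# Hypothetical most-common-elements statistics for a tsvector column:
# (lexeme, frequency) pairs, stored ordered by descending frequency,
# as the most-common-values stats are kept today.
by_freq = [("postgres", 0.30), ("index", 0.20), ("query", 0.10), ("wal", 0.05)]

def freq_with_plan_time_sort(lexeme):
    # Today's scheme: re-sort by value on every planning call, then bsearch.
    by_value = sorted(by_freq)                 # O(n log n) per query planned
    keys = [k for k, _ in by_value]
    i = bisect.bisect_left(keys, lexeme)
    if i < len(keys) and keys[i] == lexeme:
        return by_value[i][1]
    return None                                # bsearch miss, like 'foo' above

# The "bold thought": sort by value once (in ts_typanalyze(), at ANALYZE
# time), so planning needs only the binary search.
by_value = sorted(by_freq)
value_keys = [k for k, _ in by_value]

def freq_presorted(lexeme):
    i = bisect.bisect_left(value_keys, lexeme)
    if i < len(value_keys) and value_keys[i] == lexeme:
        return by_value[i][1]
    return None
```

Both lookups return the same frequencies; the difference is purely where the O(n log n) sort is paid — once at ANALYZE time, or on every call to the selectivity function.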
Re: [HACKERS] SeqScan costs
On Wed, Aug 13, 2008 at 07:33:40PM +0100, Andrew Gierth wrote: The following message is a courtesy copy of an article that has been posted to pgsql.hackers as well. Decibel! == Decibel! [EMAIL PROTECTED] writes: Decibel Roughly what I get on my MBP (I'm about a factor of 2 Decibel slower). This makes me think it's an issue of having to slog Decibel through an entire page one row at a time vs just using the Decibel index. To test this I tested selecting i=200 (remember we Decibel start filling data at the back of the page, so 200 would Decibel actually be the front, and I'm assuming the first value that Decibel would be hit) vs i=1. With seqscans, I saw about a 10% Decibel difference. With index scans the difference was moot, but Decibel also note that now index scans are in-between seqscans in Decibel performance. The problem is that by looking for a constant row, you're actually eliminating the entire effect being tested, because the uncorrelated subselect is run ONCE as an initplan, and the entire query time is then spent elsewhere. The differences in runtime you're seeing are pure noise (the fact that you had to increase the iteration count so much should have been a clue here). Makes sense, and yeah, I was wondering a bit about that. I'll try to fake it out with offset 0 later on if no one beats me to it; I do still think we could just be seeing the effect of slogging through 200 tuples instead of going directly to the one we want. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 pgpHzjB5xruWd.pgp Description: PGP signature
[HACKERS] autovacuum: use case for independent TOAST table autovac settings
Hackers and PG users, Does anyone see a need for having TOAST tables be individually configurable for autovacuum? I've finally come around to looking at being able to use ALTER TABLE for autovacuum settings, and I'm wondering if we need to support that case. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, oprrest function for text search take 2
Jan Urbański wrote: 26763 3.5451 AllocSetCheck Make sure you disable assertions before profiling. Although I'm actually a bit surprised the overhead isn't more than 3.5%, I've seen much higher overheads on other tests, but it's still skewing the results. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SeqScan costs
Decibel! [EMAIL PROTECTED] writes: Makes sense, and yeah, I was wondering a bit about that. I'll try to fake it out with offset 0 later on if no one beats me to it; I do still think we could just be seeing the effect of slogging through 200 tuples instead of going directly to the one we want. Of course index lookups aren't magic. You don't get to go *directly* to the one you want. You still have to slog through index tuples to find the right pointer. That means going to the index meta page, find the fast root pointer, look up that page, look at the single leaf page pointer, look up that page, and do a binary search of the 200 leaf pointers. Once you find the resulting match, look up the heap page and *then* go directly to the right tuple. So that means up to four trips to the buffer cache, trips through lwlocks, etc. And it still means up to 9 btree comparisons. Still less than 200 but it's not entirely free. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
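Gregory's point — an index probe is far fewer comparisons than slogging through 200 tuples, but not free — can be put in back-of-envelope numbers (plain arithmetic, not the actual btree code):

```python
import math

ROWS = 200  # tuples on the single heap page in the oneblock test

# Seq scan with a filter examines every tuple on the page looking
# for matches.
seqscan_comparisons = ROWS

# Binary search over 200 leaf pointers needs about log2(200) ~ 8 key
# comparisons -- the same ballpark as the "up to 9 btree comparisons"
# above, before counting the extra buffer-cache trips and lwlocks the
# index descent costs.
btree_comparisons = math.ceil(math.log2(ROWS))

print(seqscan_comparisons, btree_comparisons)  # 200 8
```

So the comparison count drops by more than an order of magnitude, which is consistent with the ~10x runtime difference measured earlier in the thread, even though neither access path is "free".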
Re: [HACKERS] SeqScan costs
Gregory Stark [EMAIL PROTECTED] writes: That means going to the index meta page, find the fast root pointer, look up that page, look at the single leaf page pointer, look up that page, and do a binary search of the 200 leaf pointers. Once you find the resulting match, look up the heap page and *then* go directly to the right tuple. Actually, the metapage access has been cached for some time, and there's not going to be a separate root page if you only have 1 page worth of index entries. But yeah, for large indexes there are going to be multiple page accesses before you find what you want. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] autovacuum: use case for independent TOAST table autovac settings
Alvaro Herrera [EMAIL PROTECTED] writes: Does anyone see a need for having TOAST tables be individually configurable for autovacuum? I've finally come around to looking at being able to use ALTER TABLE for autovacuum settings, and I'm wondering if we need to support that case. It seems like we'll want to do it somehow. Perhaps the cleanest way is to incorporate toast-table settings in the reloptions of the parent table. Otherwise dump/reload is gonna be a mess. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] autovacuum: use case for independent TOAST table autovac settings
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Does anyone see a need for having TOAST tables be individually configurable for autovacuum? I've finally come around to looking at being able to use ALTER TABLE for autovacuum settings, and I'm wondering if we need to support that case. It seems like we'll want to do it somehow. Perhaps the cleanest way is to incorporate toast-table settings in the reloptions of the parent table. Otherwise dump/reload is gonna be a mess. Yeah, Magnus was suggesting this syntax: ALTER TABLE foo SET toast_autovacuum_enable = false; and the like. My question is whether there is interest in actually having support for this, or should we just inherit the settings from the main table. My gut feeling is that this may be needed in some cases, but perhaps I'm overengineering the thing. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] autovacuum: use case for independent TOAST table autovac settings
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: It seems like we'll want to do it somehow. Perhaps the cleanest way is to incorporate toast-table settings in the reloptions of the parent table. Otherwise dump/reload is gonna be a mess. My question is whether there is interest in actually having support for this, or should we just inherit the settings from the main table. My gut feeling is that this may be needed in some cases, but perhaps I'm overengineering the thing. It seems reasonable to inherit the parent's settings by default, in any case. So you could do that now and then extend the feature later if there's real demand. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SeqScan costs
On Aug 13, 2008, at 3:52 PM, Decibel! wrote: The problem is that by looking for a constant row, you're actually eliminating the entire effect being tested, because the uncorrelated subselect is run ONCE as an initplan, and the entire query time is then spent elsewhere. The differences in runtime you're seeing are pure noise (the fact that you had to increase the iteration count so much should have been a clue here). Makes sense, and yeah, I was wondering a bit about that. I'll try to fake it out with offset 0 later on if no one beats me to it; I do still think we could just be seeing the effect of slogging through 200 tuples instead of going directly to the one we want. OK, ran the test again via this query: explain analyze select (select value from oneblock where id = i) from generate_series(1,1) i, generate_series(1,10) j; changing 1,1 to 200,200 as needed. I don't see any meaningful differences between 1,1 and 200,200. The seqscan case is still notably slower than the index case (~5500ms vs ~800ms). It'd be useful to get strace data on this, but OS X doesn't have that :/ (and I'm on 10.4 so no dtrace either). Can someone get an strace from this? -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] SeqScan costs
Decibel! [EMAIL PROTECTED] writes: It'd be useful to get strace data on this, but OS X doesn't have that :/ (and I'm on 10.4 so no dtrace either). See ktrace. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] compilig libpq with borland 5.5
Uh, what version of Postgres are you testing? Would you email the output of the compile so we can see why it isn't finding it? My guess is that Borland CC isn't configured to find the Windows include directory. --- claudio lezcano wrote: Hi everybody, I am trying to build the libpq.lib and libpq.dll libraries using Borland C++ 5.5 and I got this error: Error libpq.rc 1 11: Cannot open file: winver.h I opened the libpq.rc file and saw the reference to that file: #include winver.h I checked the path and the existence of the file and everything is all right. I can actually find the file in the desired location. I don't know what to do to carry on my work. Can somebody help me out please? Thanks in advance. Claudio Lezcano -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins
On Aug 13, 2008, at 17:31, Tom Lane wrote: What's done: Introduce JOIN_SEMI and JOIN_ANTI join types, the former replacing JOIN_IN. Unify the InClauseInfo and OuterJoinInfo infrastructure into SpecialJoinInfo. Convert IN, EXISTS, and NOT EXISTS clauses at top level of WHERE into semi and anti joins respectively. Recognize LEFT JOIN with a suitable IS NULL filter condition as an anti join. This all compiles and passes the regression tests. Wow. That sounds awesome, Tom. Stupid question: Do these join types have some sort of correspondence to the SQL standard? Or would they be specific to PostgreSQL? Or is this just something that's under the hood and not actually a change to the syntax of SQL joins? What's not done: nodeMergejoin.c doesn't yet handle JOIN_ANTI. (This is just a SMOP, but it's a lot more complicated than the nestloop or hash logic, and I figured nestloop and hash were enough for testing the planner.) I guess that means you plan to do it once there has been significant testing with nestloop and hash and when the selectivity stuff is done? Best, David (Who is in over his head, but decides to stick his toe in the water anyway.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins
David E. Wheeler [EMAIL PROTECTED] writes: On Aug 13, 2008, at 17:31, Tom Lane wrote: Introduce JOIN_SEMI and JOIN_ANTI join types, Wow. That sound awesome, Tom. Stupid question: Do these join types have some sort of correspondence to the SQL standard? Semi and anti joins are pretty standard concepts in relational theory, but they have no direct mapping in the SQL join syntax. You can write them with certain well-known locutions, though: IN and EXISTS, with certain restrictions, represent semi join NOT EXISTS, with certain restrictions, represents anti join LEFT JOIN with an incompatible higher IS NULL test represents anti join Basically what this patch is about is teaching the planner that these constructs are best understood via the relational-theory concepts. We'd been doing it in a pretty ad-hoc way before, and run into a lot of problems that we've had to kluge around. I think that this approach provides a structure that will actually work well. Or is this just something that's under the hood an not actually a change to the syntax of SQL joins? Right, there's no user visible feature or syntax change here. We're just trying to provide better performance for certain common SQL idioms. What's not done: nodeMergejoin.c doesn't yet handle JOIN_ANTI. (This is just a SMOP, I guess that means you plan to do it once there has been significant testing with nestloop and hash and when the selectivity stuff is done? Actually, I got it done an hour or so ago --- it turned out to be easier than I thought. It just didn't seem like part of the critical path for the patch, so I'd been willing to let it go till later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
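The semantics Tom describes — an outer row emitted at most once depending on whether any inner match exists — can be modelled in a few lines (a toy illustration, not the executor's JOIN_SEMI/JOIN_ANTI implementation):

```python
# Toy model of semi and anti join over lists of dicts.

def semi_join(outer, inner, key):
    # Semi join: each outer row at most once, if ANY inner match exists --
    # the relational meaning of IN / EXISTS.
    inner_keys = {key(row) for row in inner}
    return [row for row in outer if key(row) in inner_keys]

def anti_join(outer, inner, key):
    # Anti join: outer rows with NO inner match -- the meaning of
    # NOT EXISTS, or LEFT JOIN with an IS NULL test on the inner side.
    inner_keys = {key(row) for row in inner}
    return [row for row in outer if key(row) not in inner_keys]

orders = [{"id": 1, "cust": 10}, {"id": 2, "cust": 20}, {"id": 3, "cust": 10}]
customers = [{"cust": 10}]

print([r["id"] for r in semi_join(orders, customers, lambda r: r["cust"])])  # [1, 3]
print([r["id"] for r in anti_join(orders, customers, lambda r: r["cust"])])  # [2]
```

Note that a plain inner join could return an outer row twice if the inner side had duplicate matches; the semi join's "at most once" guarantee is exactly what distinguishes it, and is why the planner can stop scanning the inner side at the first match.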
Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins
On Aug 13, 2008, at 20:12, Tom Lane wrote: Wow. That sound awesome, Tom. Stupid question: Do these join types have some sort of correspondence to the SQL standard? Semi and anti joins are pretty standard concepts in relational theory, but they have no direct mapping in the SQL join syntax. You can write them with certain well-known locutions, though: IN and EXISTS, with certain restrictions, represent semi join NOT EXISTS, with certain restrictions, represents anti join LEFT JOIN with an incompatible higher IS NULL test represents anti join Basically what this patch is about is teaching the planner that these constructs are best understood via the relational-theory concepts. We'd been doing it in a pretty ad-hoc way before, and run into a lot of problems that we've had to kluge around. I think that this approach provides a structure that will actually work well. Great. Thanks for the explanation, Tom, as always. Or is this just something that's under the hood an not actually a change to the syntax of SQL joins? Right, there's no user visible feature or syntax change here. We're just trying to provide better performance for certain common SQL idioms. Good, it makes a lot of sense. What's not done: nodeMergejoin.c doesn't yet handle JOIN_ANTI. (This is just a SMOP, I guess that means you plan to do it once there has been significant testing with nestloop and hash and when the selectivity stuff is done? Actually, I got it done an hour or so ago --- it turned out to be easier than I thought. It just didn't seem like part of the critical path for the patch, so I'd been willing to let it go till later. I love it when things work that way. :-) Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SeqScan costs
Decibel == Decibel! [EMAIL PROTECTED] writes: Decibel OK, ran the test again via this query: Decibel explain analyze select (select value from oneblock where id = i) Decibel from generate_series(1,1) i, generate_series(1,10) j; Decibel changing 1,1 to 200,200 as needed. I don't see any Decibel meaningful differences between 1,1 and 200,200. Well of course you don't, since it scans all the rows regardless. (Scalar subselects don't abort after the first row, they only abort if they find _more_ than one row, and in this example there is only one, so the whole of oneblock is scanned every time.) You could likely expose a difference using LIMIT 1 in the subselect, but that doesn't tell us anything we didn't already know (which is that yes, index scan is much faster than seqscan even for 1-block tables, except in the rare case when neither the index page nor the table page are in cache, causing the indexscan to take two page fetches rather than just one). Oddly enough, when I try it with LIMIT 1, it _does_ show a significant speed difference according to the row position, _but_ the index scan is still twice as fast even when fetching only row 1 (which is indeed physically first). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] compilig libpq with borland 5.5
On Thu, Jul 31, 2008 at 1:52 PM, claudio lezcano [EMAIL PROTECTED] wrote: Hi everybody Iam traying to build libpq.lib and libpq.dll library using Borland c++ 5.5 and i got these error: Error libpq.rc 1 11: Cannot open file: winver.h I opened the libpq.rc file and i saw the reference to that file #include winver.h I cheched out the path and the existence of the file and everything is all right. I actually find the file in the desired location. I don't know what to do to carry on my work. Can somebody help me out please?. Just in case you can't get it working, you can always take a msvc compiled libpq.dll and create a static library out of it with the 'implib' utility using the underscore option. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] benchmark farm
On Wed, Aug 13, 2008 at 7:09 PM, Jaime Casanova wrote: any move in this? I made some changes to pgbench in February and sent them to Andrew. There has been no reaction so far. -- Regards, Michael Holzman -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers