Re: [HACKERS] Future In-Core Replication
On Sun, Apr 29, 2012 at 11:20 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simon Riggs's message of jue abr 26 11:10:09 -0300 2012: On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote: I will also be organising a small-medium sized Future of In-Core Replication meeting in Ottawa on Wed 16 May, 6-10pm. Thanks for such rapid response. I've put up a wiki page and will be adding names as they come through http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting How is this not redundant with the Cluster Summit? http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit ... oh, you're also already enlisted in that one. Sigh. My understanding is that the agenda for the cluster meeting is almost entirely dedicated to Postgres-XC. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 7:35 AM, Dave Page dp...@pgadmin.org wrote: On Sun, Apr 29, 2012 at 11:20 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simon Riggs's message of jue abr 26 11:10:09 -0300 2012: On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote: I will also be organising a small-medium sized Future of In-Core Replication meeting in Ottawa on Wed 16 May, 6-10pm. Thanks for such rapid response. I've put up a wiki page and will be adding names as they come through http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting How is this not redundant with the Cluster Summit? http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit ... oh, you're also already enlisted in that one. Sigh. My understanding is that the agenda for the cluster meeting is almost entirely dedicated to Postgres-XC. Yes. Regrettably, there wasn't a 90 minute slot available to discuss this at the cluster meeting, the dev meeting agenda is fairly full and my presentation on replication was rejected for the main conference. As a result, the additional meeting was the only way left open to me to initiate detailed discussion. The additional meeting will give us 240 minutes of discussion and allows us to bring in some technical users as well. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] smart shutdown at end of transaction (was: Default mode for shutdown)
Tom Lane wrote: On Fri, Apr 27, 2012 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, I'm not happy with that. Smart shutdown is defined to not affect current sessions. I'm fine with having a fourth mode that acts as you suggest (and, probably, even with making it the default); but not with taking away a behavior that people may well be relying on. Agreed, but not sure what to call the new mode: smarter? I'm not necessarily opposed to commandeering the name smart for the new behavior, so that what we have to find a name for is the old smart behavior. How about slow- allow existing sessions to finish (old smart) smart - allow existing transactions to finish (new) fast- kill active queries immediate - unclean shutdown But if the meaning of smart changes, then people who use pg_ctl stop -m smart and expect that active sessions will not be affected will get a surprise. Wouldn't it be better to pick a different name for the new fourth mode? It could still be the default mode, but I think that people who explicitly specify a certain mode are more likely to care about the exact behaviour. I second Heikki's suggestions for mode names. And +1 from me on changing the default behaviour. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 release notes, beta time?
On Mon, Apr 30, 2012 at 02:23, Devrim GÜNDÜZ dev...@gunduz.org wrote: On Sun, 2012-04-29 at 13:23 +0100, Simon Riggs wrote: (As a side note, RPMs *may not* be ready, because I (and Magnus) will be at PGDay Turkey on 12th, and will be busy over the whole weekend). Is that a closed meeting? I hadn't seen any mention of that anywhere. Not that much. I've been tweeting about that (in Turkish, sorry) and sending out announcements to local mailing lists for quite some time, but we not able to finalize the venue until last week. This is the 2nd one after last year. I will do a wider announcement next time, I promise -- and the new event submission policy on postgresql.org raised a barrier against adding an item to the website, but that is more a -www topic. Uh, are you referring to the policy at http://wiki.postgresql.org/wiki/NewsEventsApproval? I don't see anything there that would exclude this. (Oh, and you need to have the secret key to enter the meeting.) I think Simon can be included in the group of people who can be accepted with just the secret handshake, no? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] smart shutdown at end of transaction (was: Default mode for shutdown)
Just for the ones interested in a view on another turf: In Oracle shutdown immediate is the fastest _clean_ shutdown and shutdown abort is equal to shutdown immediate in PG. The other modes are called shutdown normal and shutdown transactional. Wolfgang Von: Tom Lane t...@sss.pgh.pa.us An: Simon Riggs si...@2ndquadrant.com CC: Robert Haas robertmh...@gmail.com; Alvaro Herrera alvhe...@commandprompt.com; Magnus Hagander mag...@hagander.net; PostgreSQL-development pgsql-hackers@postgresql.org Gesendet: 20:48 Freitag, 27.April 2012 Betreff: Re: [HACKERS] smart shutdown at end of transaction (was: Default mode for shutdown) Simon Riggs si...@2ndquadrant.com writes: On Fri, Apr 27, 2012 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, I'm not happy with that. Smart shutdown is defined to not affect current sessions. I'm fine with having a fourth mode that acts as you suggest (and, probably, even with making it the default); but not with taking away a behavior that people may well be relying on. Agreed, but not sure what to call the new mode: smarter? I'm not necessarily opposed to commandeering the name smart for the new behavior, so that what we have to find a name for is the old smart behavior. How about slow - allow existing sessions to finish (old smart) smart - allow existing transactions to finish (new) fast - kill active queries immediate - unclean shutdown regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Analyzing foreign tables memory problems
During ANALYZE, in analyze.c, functions compute_minimal_stats and compute_scalar_stats, values whose length exceed WIDTH_THRESHOLD (= 1024) are not used for calculating statistics other than that they are counted as too wide rows and assumed to be all different. This works fine with regular tables; values exceeding that threshold don't get detoasted and won't consume excessive memory. With foreign tables the situation is different. Even though values exceeding WIDTH_THRESHOLD won't get used, the complete rows will be fetched from the foreign table. This can easily exhaust maintenance_work_mem. A foreign data wrapper has no good way to counter the problem. It can return truncated values in ist AcquireSampleRowsFunc, but WIDTH_THRESHOLD is private to analyze.c and it's a bad idea to hard code a cutoff limit of 1025. I can think of two remedies: 1) Expose WIDTH_THRESHOLD in commands/vacuum.h and add documentation so that the authors of foreign data wrappers are aware of the problem and can avoid it on their side. This would be quite simple. 2) Instead of one callback that returns all sample rows, have a callback that just returns the next table row (or the next table row from a subset of the pages of the table in the internal case). This function could be called in a short-lived memory context. Vitter's algorithm for selecting a sample and the truncation of excessively long values would then be handled in analyze.c. This would avoid the problem completely and make it easier to write a foreign data wrapper. I haven't thought this out completely, and it would require bigger changes to analyze.c and the API than are probably welcome this close to beta. What is your opinion? Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enable_indexonly
On Sat, Apr 28, 2012 at 17:41, Andrew Dunstan and...@dunslane.net wrote: On 04/27/2012 12:44 PM, Magnus Hagander wrote: Hmm. Forgive me, I pressed the wrong button and looked at current docs rather than dev docs. (Easier when they used to look different...) Maybe we should have the stylesheet watermark the dev docs pages. Are you offering to write the patch? ;) I am about the worst person in the world to ask to do this. But there are plenty of web jockeys in the community who would be better at it than would I. :-) Well, same here. But if one of said web jockeys can come up with a good stylesheet, I'll be happy to integrate it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts
On Sun, Apr 29, 2012 at 10:12:40PM -0400, Alvaro Herrera wrote: Excerpts from Ryan Kelly's message of sáb ene 14 16:22:21 -0300 2012: I have attached a new patch which handles the connect_timeout option by adding a PQconnectTimeout(conn) function to access the connect_timeout which I then use to retrieve the existing value from the old connection. Was this patch dropped entirely? If not and it hasn't been committed yet, I think it belongs in the open CF here: https://commitfest.postgresql.org/action/commitfest_view?id=14 Needs some freshening if anyone still wants it. Update against latest HEAD attached. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -Ryan Kelly diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 8a820ac..bf4d110 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1479,6 +1479,24 @@ char *PQoptions(const PGconn *conn); /para /listitem /varlistentry + +varlistentry id=libpq-pqconnecttimeout + term + functionPQconnectTimeout/function + indexterm + primaryPQconnectTimeout/primary + /indexterm + /term + + listitem + para + Returns the connect_timeout property as given to libpq. +synopsis +char *PQconnectTimeout(const PGconn *conn); +/synopsis + /para + /listitem +/varlistentry /variablelist /para diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index dd59aa1..90dfe13 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1504,7 +1504,7 @@ static bool do_connect(char *dbname, char *user, char *host, char *port) { PGconn *o_conn = pset.db, - *n_conn; + *n_conn = NULL; char *password = NULL; if (!dbname) @@ -1537,7 +1537,7 @@ do_connect(char *dbname, char *user, char *host, char *port) while (true) { -#define PARAMS_ARRAY_SIZE 8 +#define PARAMS_ARRAY_SIZE 9 const char **keywords = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*keywords)); const char **values = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*values)); @@ -1555,17 +1555,120 @@ do_connect(char *dbname, char *user, char *host, char *port) values[5] = pset.progname; keywords[6] = client_encoding; values[6] = (pset.notty || getenv(PGCLIENTENCODING)) ? NULL : auto; - keywords[7] = NULL; - values[7] = NULL; + keywords[7] = connect_timeout; + values[7] = PQconnectTimeout(o_conn); + keywords[8] = NULL; + values[8] = NULL; - n_conn = PQconnectdbParams(keywords, values, true); + /* attempt connection asynchronously */ + n_conn = PQconnectStartParams(keywords, values, true); + + if (sigsetjmp(sigint_interrupt_jmp, 1) != 0) + { + /* interrupted during connection attempt */ + PQfinish(n_conn); + n_conn = NULL; + } + else + { + time_t end_time = -1; + + /* + * maybe use a connection timeout. this code essentially stolen + * from src/interfaces/libpq/fe-connect.c connectDBComplete + */ + if (PQconnectTimeout(n_conn) != NULL) + { +int timeout = atoi(PQconnectTimeout(n_conn)); +if (timeout 0) +{ + /* + * Rounding could cause connection to fail; need at least 2 secs + */ + if (timeout 2) + timeout = 2; + /* calculate the finish time based on start + timeout */ + end_time = time(NULL) + timeout; +} + } + + while(end_time 0 || time(NULL) end_time) + { +int poll_res; +int rc; +fd_set read_mask, + write_mask; +struct timeval timeout; +struct timeval *ptr_timeout; + +poll_res = PQconnectPoll(n_conn); +if (poll_res == PGRES_POLLING_OK || + poll_res == PGRES_POLLING_FAILED) +{ + break; +} + +if (poll_res == PGRES_POLLING_READING) + FD_SET(PQsocket(n_conn), read_mask); +if (poll_res == PGRES_POLLING_WRITING) + FD_SET(PQsocket(n_conn), write_mask); + +/* + * Compute appropriate timeout interval. essentially stolen + * from src/interfaces/libpq/fe-misc.c pqSocketPoll. Maybe + * that function could be made public? we could then replace + * the whole inside of this while loop, assuming it is safe + * to longjmp out from there. + */ +if (end_time == ((time_t) -1)) + ptr_timeout = NULL; +else +{ + time_t now = time(NULL); + + if (end_time now) + timeout.tv_sec = end_time - now; + else + timeout.tv_sec = 0; + timeout.tv_usec = 0; + ptr_timeout = timeout; +} + +sigint_interrupt_enabled = true; +if (cancel_pressed) +{ + PQfinish(n_conn); + n_conn = NULL; + sigint_interrupt_enabled = false; + break; +} +rc = select(PQsocket(n_conn) + 1, + read_mask, write_mask, NULL, + ptr_timeout); +sigint_interrupt_enabled = false; + +if (rc 0 errno != EINTR) + break; + } + + if (PQstatus(n_conn) != CONNECTION_OK +end_time 0 time(NULL) = end_time)
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 12:39 PM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Apr 30, 2012 at 7:35 AM, Dave Page dp...@pgadmin.org wrote: On Sun, Apr 29, 2012 at 11:20 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simon Riggs's message of jue abr 26 11:10:09 -0300 2012: On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote: I will also be organising a small-medium sized Future of In-Core Replication meeting in Ottawa on Wed 16 May, 6-10pm. Thanks for such rapid response. I've put up a wiki page and will be adding names as they come through http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting How is this not redundant with the Cluster Summit? http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit ... oh, you're also already enlisted in that one. Sigh. My understanding is that the agenda for the cluster meeting is almost entirely dedicated to Postgres-XC. Yes. Regrettably, there wasn't a 90 minute slot available to discuss this at the cluster meeting, the dev meeting agenda is fairly full and my presentation on replication was rejected for the main conference. As a result, the additional meeting was the only way left open to me to initiate detailed discussion. The additional meeting will give us 240 minutes of discussion and allows us to bring in some technical users as well. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers Hi All, Will the live streaming/recording of the meeting be available? Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] smart shutdown at end of transaction (was: Default mode for shutdown)
On Mon, Apr 30, 2012 at 9:55 AM, Wolfgang Wilhelm wolfgang20121...@yahoo.de wrote: Just for the ones interested in a view on another turf: In Oracle shutdown immediate is the fastest _clean_ shutdown and shutdown abort is equal to shutdown immediate in PG. The other modes are called shutdown normal and shutdown transactional. Though the behaviour users see is quite different. In Oracle the fastest clean shutdown still requires rolling back transactions which can take a long time. In Postgres rolling back transactions is instantaneous so a shutdown immediate will appear to behave like a shutdown abort in Oracle in that it will always run fast even if the effect on the database is different. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts
Excerpts from Ryan Kelly's message of lun abr 30 07:10:14 -0400 2012: On Sun, Apr 29, 2012 at 10:12:40PM -0400, Alvaro Herrera wrote: Excerpts from Ryan Kelly's message of sáb ene 14 16:22:21 -0300 2012: I have attached a new patch which handles the connect_timeout option by adding a PQconnectTimeout(conn) function to access the connect_timeout which I then use to retrieve the existing value from the old connection. Was this patch dropped entirely? If not and it hasn't been committed yet, I think it belongs in the open CF here: https://commitfest.postgresql.org/action/commitfest_view?id=14 Needs some freshening if anyone still wants it. Update against latest HEAD attached. Well, do *you* want it? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
Albe Laurenz laurenz.a...@wien.gv.at writes: During ANALYZE, in analyze.c, functions compute_minimal_stats and compute_scalar_stats, values whose length exceed WIDTH_THRESHOLD (= 1024) are not used for calculating statistics other than that they are counted as too wide rows and assumed to be all different. This works fine with regular tables; values exceeding that threshold don't get detoasted and won't consume excessive memory. With foreign tables the situation is different. Even though values exceeding WIDTH_THRESHOLD won't get used, the complete rows will be fetched from the foreign table. This can easily exhaust maintenance_work_mem. I'm fairly skeptical that this is a real problem, and would prefer not to complicate wrappers until we see some evidence from the field that it's worth worrying about. The WIDTH_THRESHOLD logic was designed a dozen years ago when common settings for work_mem were a lot smaller than today. Moreover, to my mind it's always been about avoiding detoasting operations as much as saving memory, and we don't have anything equivalent to that consideration in foreign data wrappers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] default_transaction_isolation = serializable causes crash under Hot Standby
Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: * throw a WARNING if serializable is stated in other cases, and downgrade the request to repeatable read I think this would be reasonable, but it's still my second choice. The advantage of throwing an ERROR is that someone will presumably be forced to realize that a problem exists and fix it, whereas a WARNING may just generate a combination of log spam and unexpected behavior forever. Also, we currently block cases where you try to set transaction_isolation by throwing an ERROR, so it seems a bit more consistent to do that in other cases as well. Still, it's a reasonable choice, and certainly better than failing an assertion. I'm not totally clear on your first choice. Are you looking for something similar to the patch I posted, except that it would dodge all resulting errors at the point where they are promoted to FATAL (before HS is really functional)? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] default_transaction_isolation = serializable causes crash under Hot Standby
On Mon, Apr 30, 2012 at 10:26 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: * throw a WARNING if serializable is stated in other cases, and downgrade the request to repeatable read I think this would be reasonable, but it's still my second choice. The advantage of throwing an ERROR is that someone will presumably be forced to realize that a problem exists and fix it, whereas a WARNING may just generate a combination of log spam and unexpected behavior forever. Also, we currently block cases where you try to set transaction_isolation by throwing an ERROR, so it seems a bit more consistent to do that in other cases as well. Still, it's a reasonable choice, and certainly better than failing an assertion. I'm not totally clear on your first choice. Are you looking for something similar to the patch I posted, except that it would dodge all resulting errors at the point where they are promoted to FATAL (before HS is really functional)? That's my vote. Where is that FATAL error coming from? I'm guessing it's somehow resulting from failure to set up the startup transaction in InitPostgres(). If that's the case, we ought to be able to work around it, because surely repeatable read would be fine for the startup transaction, which doesn't really do anything anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
Tom Lane wrote: During ANALYZE, in analyze.c, functions compute_minimal_stats and compute_scalar_stats, values whose length exceed WIDTH_THRESHOLD (= 1024) are not used for calculating statistics other than that they are counted as too wide rows and assumed to be all different. This works fine with regular tables; values exceeding that threshold don't get detoasted and won't consume excessive memory. With foreign tables the situation is different. Even though values exceeding WIDTH_THRESHOLD won't get used, the complete rows will be fetched from the foreign table. This can easily exhaust maintenance_work_mem. I'm fairly skeptical that this is a real problem, and would prefer not to complicate wrappers until we see some evidence from the field that it's worth worrying about. The WIDTH_THRESHOLD logic was designed a dozen years ago when common settings for work_mem were a lot smaller than today. Moreover, to my mind it's always been about avoiding detoasting operations as much as saving memory, and we don't have anything equivalent to that consideration in foreign data wrappers. If I have a table with 10 rows and default_statistics_target at 100, then a sample of 3 rows will be taken. If each row contains binary data of 1MB (an Image), then the data structure returned will use about 30 GB of memory, which will probably exceed maintenance_work_mem. Or is there a flaw in my reasoning? Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Sun, Apr 29, 2012 at 12:26 AM, Robert Haas robertmh...@gmail.com wrote: As for track_iotiming - track_io_timing, I'm fine with that as well. I'm still grumpy about the idea of a GUC changing the explain analyze output. How would people feel about adding an explain option that explicitly requests io timing for this explain analyze and then having the io timing be enabled if either it's requested by explain analyze or if it's set on globally? That would make it more consistent with the other explain analyze options? I realize I don't get to be grumpy without actually contributing anything, but I'm happy to write up the patch if people agree with the change. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Greg Stark st...@mit.edu writes: On Sun, Apr 29, 2012 at 12:26 AM, Robert Haas robertmh...@gmail.com wrote: As for track_iotiming - track_io_timing, I'm fine with that as well. I'm still grumpy about the idea of a GUC changing the explain analyze output. How would people feel about adding an explain option that explicitly requests io timing for this explain analyze and then having the io timing be enabled if either it's requested by explain analyze or if it's set on globally? That would make it more consistent with the other explain analyze options? I think it's going to be hard to decouple that altogether. For instance, if track_io_timing were not on but you did EXPLAIN (TIMING), you'd end up with timing info getting sent to the stats collector for just that one statement. That seems a bit weird too. I see where you're coming from but I don't think it's a good idea to add an EXPLAIN option unless you can make the two behaviors (EXPLAIN reporting and stats collection) truly independent. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
Albe Laurenz laurenz.a...@wien.gv.at writes: Tom Lane wrote: I'm fairly skeptical that this is a real problem, and would prefer not to complicate wrappers until we see some evidence from the field that it's worth worrying about. If I have a table with 10 rows and default_statistics_target at 100, then a sample of 3 rows will be taken. If each row contains binary data of 1MB (an Image), then the data structure returned will use about 30 GB of memory, which will probably exceed maintenance_work_mem. Or is there a flaw in my reasoning? Only that I don't believe this is a real-world scenario for a foreign table. If you have a foreign table in which all, or even many, of the rows are that wide, its performance is going to suck so badly that you'll soon look for a different schema design anyway. I don't want to complicate FDWs for this until it's an actual bottleneck in real applications, which it may never be, and certainly won't be until we've gone through a few rounds of performance refinement for basic operations. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
On Mon, Apr 30, 2012 at 3:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Albe Laurenz laurenz.a...@wien.gv.at writes: During ANALYZE, in analyze.c, functions compute_minimal_stats and compute_scalar_stats, values whose length exceed WIDTH_THRESHOLD (= 1024) are not used for calculating statistics other than that they are counted as too wide rows and assumed to be all different. This works fine with regular tables; values exceeding that threshold don't get detoasted and won't consume excessive memory. With foreign tables the situation is different. Even though values exceeding WIDTH_THRESHOLD won't get used, the complete rows will be fetched from the foreign table. This can easily exhaust maintenance_work_mem. I'm fairly skeptical that this is a real problem AFAIK its not possible to select all columns from an Oracle database. If you use an unqualified LONG column as part of the query then you get an error. So there are issues with simply requesting data for analysis. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts
On Mon, Apr 30, 2012 at 09:02:33AM -0400, Alvaro Herrera wrote: Excerpts from Ryan Kelly's message of lun abr 30 07:10:14 -0400 2012: On Sun, Apr 29, 2012 at 10:12:40PM -0400, Alvaro Herrera wrote: Excerpts from Ryan Kelly's message of sáb ene 14 16:22:21 -0300 2012: I have attached a new patch which handles the connect_timeout option by adding a PQconnectTimeout(conn) function to access the connect_timeout which I then use to retrieve the existing value from the old connection. Was this patch dropped entirely? If not and it hasn't been committed yet, I think it belongs in the open CF here: https://commitfest.postgresql.org/action/commitfest_view?id=14 Needs some freshening if anyone still wants it. Update against latest HEAD attached. Well, do *you* want it? Of course. That way I can stop patching my psql and go back to using the one that came with my release :) -Ryan Kelly -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Typo in access/transam/xlog.c
Within access/transam/xlog.c , the following comment has an obvious error: * (This should not be called for for synchronous commits.) -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
On Mon, Apr 30, 2012 at 12:27:45PM +0200, Albe Laurenz wrote: During ANALYZE, in analyze.c, functions compute_minimal_stats and compute_scalar_stats, values whose length exceed WIDTH_THRESHOLD (= 1024) are not used for calculating statistics other than that they are counted as too wide rows and assumed to be all different. This works fine with regular tables; values exceeding that threshold don't get detoasted and won't consume excessive memory. With foreign tables the situation is different. Even though values exceeding WIDTH_THRESHOLD won't get used, the complete rows will be fetched from the foreign table. This can easily exhaust maintenance_work_mem. A foreign data wrapper has no good way to counter the problem. It can return truncated values in ist AcquireSampleRowsFunc, but WIDTH_THRESHOLD is private to analyze.c and it's a bad idea to hard code a cutoff limit of 1025. I can think of two remedies: 1) Expose WIDTH_THRESHOLD in commands/vacuum.h and add documentation so that the authors of foreign data wrappers are aware of the problem and can avoid it on their side. This would be quite simple. Seems reasonable. How would the FDW return an indication that a value was non-NULL but removed due to excess width? Not all databases can cheaply filter out wide column values; by the time the remote side has an exact width, the remote I/O damage may already be done. To dodge that problem, when a column has SET STATISTICS 0, the FDW should be able to completely omit reading it. (I haven't studied the API needs, if any, to make that possible.) 2) Instead of one callback that returns all sample rows, have a callback that just returns the next table row (or the next table row from a subset of the pages of the table in the internal case). This function could be called in a short-lived memory context. Vitter's algorithm for selecting a sample and the truncation of excessively long values would then be handled in analyze.c. This would avoid the problem completely and make it easier to write a foreign data wrapper. I haven't thought this out completely, and it would require bigger changes to analyze.c and the API than are probably welcome this close to beta. This solves the (in your downthread example) 30 GiB of memory consumption, but you'll still read 30 GiB on the remote side and ship it all over the network. To call this fixed, we'll need something like (1) that lets the FDW limit volume at the remote side. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Torn page hazard in ginRedoUpdateMetapage()
When GIN changes a metapage, we WAL-log its ex-header content and never use a backup block. This reduces WAL volume since the vast majority of the metapage is unused. However, ginRedoUpdateMetapage() only restores the WAL-logged content if the metapage LSN predates the WAL record LSN. If a metapage write tore and updated the LSN but not the other content, we would fail to complete the update. Instead, unconditionally reinitialize the metapage similar to how _bt_restore_meta() handles the situation. I found this problem by code reading and did not attempt to build a test case illustrating its practical consequences. It's possible that there's no problem in practice on account of some reason I haven't contemplated. Thanks, nm *** a/src/backend/access/gin/ginxlog.c --- b/src/backend/access/gin/ginxlog.c *** *** 492,504 ginRedoUpdateMetapage(XLogRecPtr lsn, XLogRecord *record) return; /* assume index was deleted, nothing to do */ metapage = BufferGetPage(metabuffer); ! if (!XLByteLE(lsn, PageGetLSN(metapage))) ! { ! memcpy(GinPageGetMeta(metapage), data-metadata, sizeof(GinMetaPageData)); ! PageSetLSN(metapage, lsn); ! PageSetTLI(metapage, ThisTimeLineID); ! MarkBufferDirty(metabuffer); ! } if (data-ntuples 0) { --- 492,503 return; /* assume index was deleted, nothing to do */ metapage = BufferGetPage(metabuffer); ! GinInitMetabuffer(metabuffer); ! memcpy(GinPageGetMeta(metapage), data-metadata, sizeof(GinMetaPageData)); ! ! PageSetLSN(metapage, lsn); ! PageSetTLI(metapage, ThisTimeLineID); ! MarkBufferDirty(metabuffer); if (data-ntuples 0) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Thu, Apr 26, 2012 at 01:41:33PM +0100, Simon Riggs wrote: Some people have talked about the need for multi-master replication, whereby 2+ databases communicate changes to one another. This topic has been discussed in some depth in Computer Science academic papers, most notably, The Dangers of Replication and a Solution by the late Jim Gray. I've further studied this to the point where I have a mathematical model of this that allows me to predict what our likely success will be from implementing that. Without meaning to worry you, MM replication alone is not a solution for large data or the general case. For the general case, single master replication will continue to be the most viable option. For large and distributed data sets, some form of partitioning/sharding is required simply because full multi-master replication just isn't viable at both volume and scale. So my take on this is that MM is desirable, but is not the only thing we need - we also need partial/filtered replication to make large systems practical. Hence why I've been calling this the Bi-Directional Replication project. I'm aware that paragraph alone requires lots of explanation, which I hope to do both in writing and in person at the forthcoming developer conference. I would love to see a layout of exactly where these things make sense, similar to what we do at the bottom of our documentation for High Availability, Load Balancing, and Replication: http://www.postgresql.org/docs/9.1/static/different-replication-solutions.html Users and developers just can't seem to get the calculus of where things make sense into their heads, me included. For example, you said that MM replication alone is not a solution for large data or the general case. Why is that? Is the goal of your work really to do logical replciation, which allows for major version upgrades? Is that the defining feature? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte
On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers e...@xs4all.nl wrote: Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three instances. (the patches compiled fine, and make check was without problem). These tests results seem to be more about the pg_trgm changes than the patch actually on this thread, unless I'm missing something. But the executive summary seems to be that pg_trgm might need to be a bit smarter about costing the trigram-based search, because when the number of trigrams is really big, using the index is counterproductive. Hopefully that's not too hard to fix; the basic approach seems quite promising. (I haven't actually looked at the patch on this thread yet to understand how it fits in; the above comments are about the pg_trgm regex stuff.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] precision and scale functions for numeric
On Sun, Apr 29, 2012 at 1:51 PM, Peter Eisentraut pete...@gmx.net wrote: I didn't find a good way to find out how many digits a numeric value has or things like whether a numeric value is an integer. (I had to go through bc(1) for the latter.) Functions like precision() and scale() would have been quite handy. Are there other ways to do this, or would this make a good addition? I think you could test for integer-ness by testing whether val % 0 = 0. You could in general test of digits after the decimal point by casting the value to text and using POSITION() to find the decimal point. If POSITION() returns zero then zero; else subtract the result from the string length. Similarly, it's pretty easy to regexp away the non-digits and measure the length of the resulting string. I don't have a good feeling for whether these operations are common enough to justify adding a few more functions. They haven't yet come up for me personally. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 12:38 PM, Bruce Momjian br...@momjian.us wrote: For example, you said that MM replication alone is not a solution for large data or the general case. Why is that? Is the goal of your work really to do logical replciation, which allows for major version upgrades? Is that the defining feature? TBH, I don't think MM replication belongs in the database at all. Ditto any replication solution that implements 'eventual consistency' such that after the fact conflict resolution is required. In an SQL database, when a transaction commits, it should remain so. It belongs in the application layer. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
Noah Misch n...@leadboat.com writes: When GIN changes a metapage, we WAL-log its ex-header content and never use a backup block. This reduces WAL volume since the vast majority of the metapage is unused. However, ginRedoUpdateMetapage() only restores the WAL-logged content if the metapage LSN predates the WAL record LSN. If a metapage write tore and updated the LSN but not the other content, we would fail to complete the update. Instead, unconditionally reinitialize the metapage similar to how _bt_restore_meta() handles the situation. I found this problem by code reading and did not attempt to build a test case illustrating its practical consequences. It's possible that there's no problem in practice on account of some reason I haven't contemplated. I think there's no problem in practice; the reason is that the GinMetaPageData struct isn't large enough to extend past the first physical sector of the page. So it's in the same disk sector as the LSN and tearing is impossible. Still, this might be a good future-proofing move, in case GinMetaPageData gets larger. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
Simon Riggs wrote: During ANALYZE, in analyze.c, functions compute_minimal_stats and compute_scalar_stats, values whose length exceed WIDTH_THRESHOLD (= 1024) are not used for calculating statistics other than that they are counted as too wide rows and assumed to be all different. This works fine with regular tables; values exceeding that threshold don't get detoasted and won't consume excessive memory. With foreign tables the situation is different. Even though values exceeding WIDTH_THRESHOLD won't get used, the complete rows will be fetched from the foreign table. This can easily exhaust maintenance_work_mem. I'm fairly skeptical that this is a real problem AFAIK its not possible to select all columns from an Oracle database. If you use an unqualified LONG column as part of the query then you get an error. So there are issues with simply requesting data for analysis. To detail on the specific case of Oracle, I have given up on LONG since a) it has been deprecated for a long time and b) it is not possible to retrieve a LONG column unless you know in advance how long it is. But you can have several BLOB and CLOB columns in a table, each of which can be arbitrarily large and can lead to the problem I described. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
Tom Lane wrote: I'm fairly skeptical that this is a real problem, and would prefer not to complicate wrappers until we see some evidence from the field that it's worth worrying about. If I have a table with 10 rows and default_statistics_target at 100, then a sample of 3 rows will be taken. If each row contains binary data of 1MB (an Image), then the data structure returned will use about 30 GB of memory, which will probably exceed maintenance_work_mem. Or is there a flaw in my reasoning? Only that I don't believe this is a real-world scenario for a foreign table. If you have a foreign table in which all, or even many, of the rows are that wide, its performance is going to suck so badly that you'll soon look for a different schema design anyway. Of course it wouldn't work well to SELECT * from such a foreign table, but it would work well enough to get one or a few rows at a time, which is probably such a table's purpose in life anyway. I don't want to complicate FDWs for this until it's an actual bottleneck in real applications, which it may never be, and certainly won't be until we've gone through a few rounds of performance refinement for basic operations. I agree that it may not be the right thing to do something invasive to solve an anticipated problem that may never be one. So scrap my second idea. But I think that exposing WIDTH_THRESHOLD wouldn't be unreasonable, would it? Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 6:38 PM, Bruce Momjian br...@momjian.us wrote: I would love to see a layout of exactly where these things make sense, similar to what we do at the bottom of our documentation for High Availability, Load Balancing, and Replication: http://www.postgresql.org/docs/9.1/static/different-replication-solutions.html Users and developers just can't seem to get the calculus of where things make sense into their heads, me included. For example, you said that MM replication alone is not a solution for large data or the general case. Why is that? Is the goal of your work really to do logical replciation, which allows for major version upgrades? Is that the defining feature? Good question. The use case, its breadth and utility are always the first place I start. I'm in the middle of writing a presentation that explains this from first principles and will be discussing that at the PgCon meeting. It's taken a long time to articulate that rather than make leaps of assumption and belief. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing foreign tables memory problems
Noah Misch wrote: During ANALYZE, in analyze.c, functions compute_minimal_stats and compute_scalar_stats, values whose length exceed WIDTH_THRESHOLD (= 1024) are not used for calculating statistics other than that they are counted as too wide rows and assumed to be all different. This works fine with regular tables; With foreign tables the situation is different. Even though values exceeding WIDTH_THRESHOLD won't get used, the complete rows will be fetched from the foreign table. This can easily exhaust maintenance_work_mem. I can think of two remedies: 1) Expose WIDTH_THRESHOLD in commands/vacuum.h and add documentation so that the authors of foreign data wrappers are aware of the problem and can avoid it on their side. This would be quite simple. Seems reasonable. How would the FDW return an indication that a value was non-NULL but removed due to excess width? The FDW would return a value of length WIDTH_THRESHOLD+1 that is long enough to be recognized as too long, but not long enough to cause a problem. Not all databases can cheaply filter out wide column values; by the time the remote side has an exact width, the remote I/O damage may already be done. To dodge that problem, when a column has SET STATISTICS 0, the FDW should be able to completely omit reading it. (I haven't studied the API needs, if any, to make that possible.) Depending on the capabilities of the remote side, a FDW can do more or less intelligent things to avoid the problem. But it must know WIDTH_THRESHOLD. Disabling statistics for a column as a workaround is an interesting idea, but would be more work for the FDW writer and the user. 2) Instead of one callback that returns all sample rows, have a callback that just returns the next table row (or the next table row from a subset of the pages of the table in the internal case). This function could be called in a short-lived memory context. Vitter's algorithm for selecting a sample and the truncation of excessively long values would then be handled in analyze.c. This would avoid the problem completely and make it easier to write a foreign data wrapper. This solves the (in your downthread example) 30 GiB of memory consumption, but you'll still read 30 GiB on the remote side and ship it all over the network. To call this fixed, we'll need something like (1) that lets the FDW limit volume at the remote side. You are right. I guess the first idea is the more promising one. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] precision and scale functions for numeric
-Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- ow...@postgresql.org] On Behalf Of Robert Haas Sent: Monday, April 30, 2012 2:20 PM To: Peter Eisentraut Cc: pgsql-hackers Subject: Re: [HACKERS] precision and scale functions for numeric I think you could test for integer-ness by testing whether val % 0 = 0. Either I am missing something here or you are. Since Modulus is a division function anything % 0 results in a division-by-zero ERROR - division has to occur before a remainder can be obtained. Maybe val % 2 NOT IN (0,1) ... David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 2:33 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Apr 30, 2012 at 12:38 PM, Bruce Momjian br...@momjian.us wrote: For example, you said that MM replication alone is not a solution for large data or the general case. Why is that? Is the goal of your work really to do logical replciation, which allows for major version upgrades? Is that the defining feature? TBH, I don't think MM replication belongs in the database at all. Ditto any replication solution that implements 'eventual consistency' such that after the fact conflict resolution is required. In an SQL database, when a transaction commits, it should remain so. It belongs in the application layer. I basically agree, at least in the medium term. The logical replication solutions we have today generally seem to work by watching the inserts, updates, and deletes go by and writing the changed tuples to a side table. This is not very performant, because it amounts to writing the data four times: we have to write WAL for the original change, write the data files for the original change, write more WAL for the change records, and the write those data files. Since all large database solutions are eventually I/O-bound, this is not great. Writing and flushing a separate replication log in parallel to WAL would get us down to three writes, and extracting tuple data from the existing WAL would get us down to two writes, which is as well as we ever know how to do. If we just had that much in core - that is, the ability to efficiently extra tuple inserts, updates, and deletes on a logical level - it would be much easier to build a good logical replication system around PostgreSQL than it is today, and the existing systems could be adapted to deliver higher performance by making use of the new infrastructure. The other half of the changes - applying the updates - is relatively straightforward, and it wouldn't bother me to leave that in user-land, especially in the MMR case, where you have to deal with conflict resolution rules that may be much simpler to express in a higher-level language than they would be in C. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] precision and scale functions for numeric
On Mon, Apr 30, 2012 at 3:33 PM, David Johnston pol...@yahoo.com wrote: -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- ow...@postgresql.org] On Behalf Of Robert Haas Sent: Monday, April 30, 2012 2:20 PM To: Peter Eisentraut Cc: pgsql-hackers Subject: Re: [HACKERS] precision and scale functions for numeric I think you could test for integer-ness by testing whether val % 0 = 0. Either I am missing something here or you are. Since Modulus is a division function anything % 0 results in a division-by-zero ERROR - division has to occur before a remainder can be obtained. Maybe val % 2 NOT IN (0,1) ... Oops. I meant (val % 1) = 0, not (val % 0) = 0. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] precision and scale functions for numeric
David Johnston pol...@yahoo.com wrote: I think you could test for integer-ness by testing whether val % 0 = 0. Modulus is a division function anything % 0 results in a division-by-zero It seems pretty clear that he meant % 1. test=# select '1.01'::numeric % 1; ?column? -- 0.01 (1 row) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 07:55:00PM +0100, Simon Riggs wrote: On Mon, Apr 30, 2012 at 6:38 PM, Bruce Momjian br...@momjian.us wrote: I would love to see a layout of exactly where these things make sense, similar to what we do at the bottom of our documentation for High Availability, Load Balancing, and Replication: http://www.postgresql.org/docs/9.1/static/different-replication-solutions.html Users and developers just can't seem to get the calculus of where things make sense into their heads, me included. For example, you said that MM replication alone is not a solution for large data or the general case. Why is that? Is the goal of your work really to do logical replciation, which allows for major version upgrades? Is that the defining feature? Good question. The use case, its breadth and utility are always the first place I start. I'm in the middle of writing a presentation that explains this from first principles and will be discussing that at the PgCon meeting. It's taken a long time to articulate that rather than make leaps of assumption and belief. Yep, it is the assumption and belief that always confuses me. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
Robert Haas robertmh...@gmail.com wrote: The other half of the changes - applying the updates - is relatively straightforward, and it wouldn't bother me to leave that in user-land, especially in the MMR case, where you have to deal with conflict resolution rules that may be much simpler to express in a higher-level language than they would be in C. I've developed successful MMR more than once, and while I wouldn't say it's exactly been implemented in the database, it hasn't exactly been in application space either. The most successful implementations I've worked with have been a layer just outside the database, of which application software was completely unaware. The database wasn't aware of the coordination per se; but it *did* need to provide transaction information in a clean way, and the declarations of how data was distributed were in the database. In my experience a declarative definition of data distribution has always been sufficient, and certainly cleaner to deal with than imperative coding would be. YMMV. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 2:38 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Apr 30, 2012 at 2:33 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Apr 30, 2012 at 12:38 PM, Bruce Momjian br...@momjian.us wrote: For example, you said that MM replication alone is not a solution for large data or the general case. Why is that? Is the goal of your work really to do logical replciation, which allows for major version upgrades? Is that the defining feature? TBH, I don't think MM replication belongs in the database at all. Ditto any replication solution that implements 'eventual consistency' such that after the fact conflict resolution is required. In an SQL database, when a transaction commits, it should remain so. It belongs in the application layer. I basically agree, at least in the medium term. The logical replication solutions we have today generally seem to work by watching the inserts, updates, and deletes go by and writing the changed tuples to a side table. This is not very performant, because it amounts to writing the data four times: we have to write WAL for the original change, write the data files for the original change, write more WAL for the change records, and the write those data files. Since all large database solutions are eventually I/O-bound, this is not great. Writing and flushing a separate replication log in parallel to WAL would get us down to three writes, and extracting tuple data from the existing WAL would get us down to two writes, which is as well as we ever know how to do. If we just had that much in core - that is, the ability to efficiently extra tuple inserts, updates, and deletes on a logical level - it would be much easier to build a good logical replication system around PostgreSQL than it is today, and the existing systems could be adapted to deliver higher performance by making use of the new infrastructure. The other half of the changes - applying the updates - is relatively straightforward, and it wouldn't bother me to leave that in user-land, especially in the MMR case, where you have to deal with conflict resolution rules that may be much simpler to express in a higher-level language than they would be in C. Yeah -- here at $work the SQL Server team (once in a while we cross no-man's land and converse) has some fancy technology that sits directly on top of the transaction log and exposes an API that you can use to peek into the river of data running through the log and do stuff with it. In our case, they use it to triage extracts from about 100 or so distributed databases into a centralized store in a relatively realtime fashion. HS/SR simply can't do that and there would be tremendous value in something that could. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte
Robert Haas robertmh...@gmail.com wrote: Hopefully that's not too hard to fix; the basic approach seems quite promising. After playing with trigram searches for name searches against copies of production database with appropriate indexing, our shop has chosen it as the new way to do name searches here. It's really nice. My biggest complaint is related to setting the threshold for the % operator. It seems to me that there should be a GUC to control the default, and that there should be a way to set the threshold for each % operator in a query (if there is more than one). The function names which must be used on the connection before running the queries don't give any clue that they are related to trigrams: show_limit() and set_limit() are nearly useless for conveying the semantics of what they do. Even with those issues, trigram similarity searching is IMO one of the top five coolest things about PostgreSQL and should be promoted heavily. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
If we just had that much in core - that is, the ability to efficiently extra tuple inserts, updates, and deletes on a logical level - it would be much easier to build a good logical replication system around PostgreSQL than it is today, and the existing systems could be adapted to deliver higher performance by making use of the new infrastructure. Well, this *is* the purpose of the cluster-hackers group, to add backend support which would make external replication systems easier to build and more efficient. So far the only real feature to come out of that has been the Command Triggers, but if you read the TODO list of that group you'll see that it's a laundry list of things replication systems need support for in the backend. http://wiki.postgresql.org/wiki/ClusterFeatures http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit What puts me off about this let's start from first principles approach is that in our community we have years of experience (in a couple cases, over a decade) with logical-level replication. It seems like the last thing we need is Yet Another PostgreSQL Replication System, started over from scratch and years away from being production quality. Don't we have enough external replication systems with not enough developers behind them? Even if improving an existing replication system proves to be impossible, it would make more sense to start with an analysis of the strengths and deficiencies of Slony, Londiste, Bucardo, etc., than with some kind of clean-room approach. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
On Mon, Apr 30, 2012 at 11:43 PM, Josh Berkus j...@agliodbs.com wrote: Well, this *is* the purpose of the cluster-hackers group Well, I tried all available means to discuss my ideas before organising an external meeting. You can think of the InCore meeting as an extension of the cluster hackers meeting if you wish. What puts me off about this let's start from first principles approach is that in our community we have years of experience (in a couple cases, over a decade) with logical-level replication. It seems like the last thing we need is Yet Another PostgreSQL Replication System, started over from scratch and years away from being production quality. Don't we have enough external replication systems with not enough developers behind them? I've tried very hard to express my admiration and respect for developers of many replication systems both personally when we meet in person and on list. And I've mentioned that I'd like to include as many ideas as possible in an in-core approach. Yes, we have many external replication systems. Many, many people have expressed the desire for more coherent features in core. I believe we can achieve production quality code in 1, maybe 2 releases. That is only possible by building on what we already have and reusing the concepts, experience and perhaps even code from other projects. We are closer to that than your statement allows. Yes, we have enough external replication systems and that is one reason why we need to put things in core and not just create another external system. Even if improving an existing replication system proves to be impossible, it would make more sense to start with an analysis of the strengths and deficiencies of Slony, Londiste, Bucardo, etc., than with some kind of clean-room approach. I'm not sure why you think I would *not* be starting with that analysis. I look forward to discussing this in person, where I'm sure it will be easier. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
Well, this *is* the purpose of the cluster-hackers group Well, I tried all available means to discuss my ideas before organising an external meeting. You can think of the InCore meeting as an extension of the cluster hackers meeting if you wish. That comment wasn't for you, it was for other folks who didn't seem to be aware that some of these ideas had already been discussed. You, of course, know because you were there. And, if you need a 3-hour meeting there's no question that you need the evening thing. There's just not enough time in anyone's schedule. I'm not sure why you think I would *not* be starting with that analysis. That wasn't the impression I got from your prior emails to hackers. Thanks for the clarification! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future In-Core Replication
Those are the basic requirements that I am trying to address. There are a great many important details, but the core of this is probably what I would call logical replication, that is shipping changes to other nodes in a way that does not tie us to the same physical representation that recovery/streaming replication does now. Of course, non-physical replication can take many forms. Guessing from shipping changes to other nodes, you seem to implicitly aim at asynchronous replication? If so, I am afraid it will force users to pay some cost to migrate from existig applications. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Call for Lightning Talks for pgCon
Hackers, users, pgCon attendees: You want to give a lightning talk at pgCon! Yes, you do. The fun, the glory, the laughter, the everlasting fame! These can all be yours. Be one of the ten brave and true who put together five minutes about PostgreSQL tools, experiences, forks, ideas, websites, or even (especially) jokes.Anything from hacking wal files to the PostgreSQL drinking game is an acceptable topic for the lighting talks. A short schedule: - Right Now: send your lightning talk idea to li...@pgcon.org. I'll need a title, speaker full name, speaker cell phone number, and brief (one sentence) description. - This Friday: I'll get back to you with acceptance (or not) - Friday, May 11th (or sooner) you get me your PDF slides for the talk. - Thursday, May 17, 5:15 PM: meet Magnus Hagander in the plenary presentation room for your order of speaking. - Thursday, May 17, 5:30PM to 6:30PM: you and 9 others deliver your talks Fine print: Lightning talks are strictly five (5) minutes in length, and speakers who run over will be cut off. PDF slides or browser access only, which will be presented on the conference laptop, so no demos, animations, private network access, or installed software. Lightning talks are subject to pgCon's anti-discrimination policy. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers