Re: [HACKERS] Compression of full-page-writes
(2013/10/15 13:33), Amit Kapila wrote:
> Snappy is good mainly for un-compressible data, see the link below:
> http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4yebmoa...@mail.gmail.com

This result was obtained on an ARM architecture, not a general-purpose CPU. Please see this document for the details:
http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

I found a compression algorithm test in HBase. I have not read it in detail, but it indicates that the snappy algorithm gets the best performance:
http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of
In fact, most modern NoSQL storages use snappy, because it has good performance and a good license (BSD).

I think it is a bit difficult to prove that any one algorithm is best for all kinds of loads. It is better for the community to make its best effort than for me to make the best choice with a strict test.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] CF 2013-09 Wrap Up
On Mon, Oct 14, 2013 at 01:56:42PM -0500, Mike Blackwell wrote:
> CF 2013-09 will be wrapping up this week.  As a reminder, beginning on the
> official CF end date (11/15), patches Waiting for Author will be Returned
> with Feedback.  Authors are welcome to add their patch to the next CF
> (2013-11).

Sounds good.

> Any patches marked Needs Review will be automatically moved to the next CF.
> We will try to make sure that all patches in the current CF have received at
> least one review.

The combined effect of those two statements is not clear to me.  Does that mean you'll retain never-reviewed patches and automatically move patches that have received at least one review?

Thanks,
nm

--
Noah Misch
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Add use of asprintf()
On Tue, Oct 15, 2013 at 10:55 AM, David Rowley <dgrowle...@gmail.com> wrote:
> Though this is not yet enough to get the windows build to work for me... I'm
> still getting link failures for isolationtester.c
>
> D:\Postgres\c\pgsql.sln (default target) (1) ->
> D:\Postgres\c\isolationtester.vcxproj (default target) (89) ->
> (Link target) ->
> isolationtester.obj : error LNK2019: unresolved external symbol _pg_strdup
> referenced in function _try_complete_step [D:\Postgres\c\isolationtester.vcxproj]
> isolationtester.obj : error LNK2019: unresolved external symbol _pg_asprintf
> referenced in function _try_complete_step [D:\Postgres\c\isolationtester.vcxproj]
> .\Release\isolationtester\isolationtester.exe : fatal error LNK1120: 2
> unresolved externals [D:\Postgres\c\isolationtester.vcxproj]
> 1 Warning(s)
>
> I guess this is down to a make file error somewhere.

Can you please try the attached patch? I hope it will solve the problem.

Win_isolationtester_fix.patch
Description: Binary data
Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows
On Mon, Oct 14, 2013 at 11:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> On further analysis, I found that the hang occurs in some of the Windows
>>> APIs (FindFirstFile, RemoveDirectory) when the symlink path
>>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these APIs.
>>> For the above testcase, it will hang in the path
>>> destroy_tablespace_directories -> ReadDir -> readdir -> FindFirstFile
>>
>> Well, that sucks.  So it's a Windows bug.
>
> It's not clear to me that we should do anything about this at all, except
> perhaps document that people should avoid long tablespace path names on an
> unknown set of Windows versions.

There are a few more relatively minor issues with long paths in Windows. For example, in CreateTableSpace(), the check below protects against creating a tablespace on longer paths:

    if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
        OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("tablespace location \"%s\" is too long",
                        location)));

MAXPGPATH is defined to be 1024, whereas the Windows APIs used in PG have a limit of 260, so the error comes directly from the APIs' use rather than from the above check. So one of the changes I am thinking of is to define MAXPGPATH separately for Windows.

> We should not be in the business of working around any and every bug coming
> out of Redmond.

This bug leads to an uninterruptible hang (I am not able to kill the process with task manager or any other way) and the corresponding backend starts consuming ~100% of CPU, so the user doesn't have many options but to restart the machine. Any form of shutdown of PG is also not successful. I had proposed to fix this issue based on its severity, but if you feel that we should keep the onus of such usage on the user, then I think I can try to fix the other, relatively minor problems with the usage of long paths.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
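For illustration only, a minimal sketch of the kind of Windows-specific guard being discussed. The helper name and the idea of checking against the Win32 MAX_PATH limit (260) are assumptions here, not part of any posted patch; the ereport idiom mirrors the existing CreateTableSpace() check.

    #ifdef WIN32
    /*
     * Hypothetical helper: reject tablespace locations that would exceed the
     * Win32 ANSI path limit once the version directory and OID components are
     * appended, instead of relying on MAXPGPATH (1024) alone.
     */
    static void
    check_tablespace_path_length_win32(const char *location)
    {
        size_t      needed;

        needed = strlen(location) + 1 +
                 strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
                 OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS;

        if (needed >= MAX_PATH)     /* MAX_PATH is 260 on Windows */
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                     errmsg("tablespace location \"%s\" is too long for this platform",
                            location)));
    }
    #endif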
Re: [HACKERS] [PATCH] Add use of asprintf()
On Tue, Oct 15, 2013 at 8:00 PM, Asif Naeem <anaeem...@gmail.com> wrote:
> On Tue, Oct 15, 2013 at 10:55 AM, David Rowley <dgrowle...@gmail.com> wrote:
>> Though this is not yet enough to get the windows build to work for me...
>> I'm still getting link failures for isolationtester.c
>>
>> D:\Postgres\c\pgsql.sln (default target) (1) ->
>> D:\Postgres\c\isolationtester.vcxproj (default target) (89) ->
>> (Link target) ->
>> isolationtester.obj : error LNK2019: unresolved external symbol _pg_strdup
>> referenced in function _try_complete_step [D:\Postgres\c\isolationtester.vcxproj]
>> isolationtester.obj : error LNK2019: unresolved external symbol _pg_asprintf
>> referenced in function _try_complete_step [D:\Postgres\c\isolationtester.vcxproj]
>> .\Release\isolationtester\isolationtester.exe : fatal error LNK1120: 2
>> unresolved externals [D:\Postgres\c\isolationtester.vcxproj]
>> 1 Warning(s)
>>
>> I guess this is down to a make file error somewhere.
>
> Can you please try the attached patch? I hope it will solve the problem.

Thanks, that combined with my patch above seems to get the build running again. In summary, I've attached a patch which is both of these combined.

I've not run any regression tests yet as that seems to be broken, but perhaps it is something that changed in my build environment. The attached is probably worth applying to get the windows build farm members building again, to see if they'll go green.

Regards

David Rowley

asprintf_windows_fix.patch
Description: Binary data
[HACKERS] Description on bind message
In the manual's 48.5. Message Formats section, there is a description of the Bind message:

    Int16[C]
        The parameter format codes. Each must presently be zero (text) or one
        (binary).

This field can be completely non-existent in the current implementation of PostgreSQL. I think that fact is not very clear from the description. It would be nice if the description were something like:

    Int16[C]
        The parameter format codes. Each must presently be zero (text) or one
        (binary). This field does not exist if the number of parameter values
        is 0.

The same thing can be said of the following:

    Int16[R]
        The result-column format codes. Each must presently be zero (text) or
        one (binary).

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Description on bind message
On 10/15/13 11:02 AM, Tatsuo Ishii wrote:
> In the manual's 48.5. Message Formats section, there is a description of the
> Bind message:
>
>     Int16[C]
>         The parameter format codes. Each must presently be zero (text) or one
>         (binary).
>
> This field can be completely non-existent in the current implementation of
> PostgreSQL. I think that fact is not very clear from the description. It
> would be nice if the description were something like:
>
>     Int16[C]
>         The parameter format codes. Each must presently be zero (text) or one
>         (binary). This field does not exist if the number of parameter values
>         is 0.

This is already explicitly said in the description of the previous field:

    The number of parameter format codes that follow (denoted C below). This
    can be zero to indicate that there are no parameters or that the parameters
    all use the default format (text); or one, in which case the specified
    format code is applied to all parameters; or it can equal the actual number
    of parameters.

Also, the documentation for the array notation explains this:

    An array of k n-bit integers, each in network byte order. The array length
    k is always determined by an earlier field in the message. Eg. Int16[M].

This part seems clear enough to me, and I don't think repeating that *three times* is necessary. We already say it twice.

Regards,
Marko Tiikkaja
Re: [HACKERS] Description on bind message
> This is already explicitly said in the description of the previous field:
>
>     The number of parameter format codes that follow (denoted C below). This
>     can be zero to indicate that there are no parameters or that the
>     parameters all use the default format (text); or one, in which case the
>     specified format code is applied to all parameters; or it can equal the
>     actual number of parameters.

I know that. The confusing part is this:

    This can be zero to indicate that there are no parameters or that the
    parameters all use the default format (text);

Here zero has a double meaning. I was not convinced until I actually looked into the source code to see what this actually means.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
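The double meaning is perhaps easiest to see from the client side. In the libpq sketch below (connection string and queries are invented, error handling omitted), the first call passes no format codes at all, so libpq emits a Bind whose format-code count C is 0 and the field that follows is simply absent; the second call passes a single format code of 1, the "applies to all parameters" case.

    #include <stdint.h>
    #include <arpa/inet.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");
        PGresult   *res;

        /* No parameters: libpq sends C = 0, so no format codes follow. */
        res = PQexecParams(conn, "SELECT now()",
                           0, NULL, NULL, NULL, NULL, 0);
        PQclear(res);

        /* One int4 parameter in binary: a single format code of 1 (C = 1). */
        {
            uint32_t    binval = htonl(42);
            const char *values[1] = {(const char *) &binval};
            int         lengths[1] = {sizeof(binval)};
            int         formats[1] = {1};

            res = PQexecParams(conn, "SELECT $1::int4",
                               1, NULL, values, lengths, formats, 0);
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }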
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 12 October 2013 11:30 Tom Lane wrote:
> Haribabu kommi <haribabu.ko...@huawei.com> writes:
>> To handle the above case, instead of directly resetting the dead tuples to
>> zero, how about removing the exact number of dead tuples from the table
>> stats?  With this approach vacuum gets triggered frequently, thus reducing
>> the bloat.
>
> This does not seem like a very good idea as-is, because it will mean that
> n_dead_tuples can diverge arbitrarily far from reality over time, as a result
> of accumulation of errors.  It also doesn't seem like a very good idea that
> VACUUM sets n_live_tuples while only adjusting n_dead_tuples incrementally;
> ideally those counters should move in the same fashion.  In short, I think
> this patch will create at least as many problems as it fixes.
>
> What would make more sense to me is for VACUUM to estimate the number of
> remaining dead tuples somehow and send that in its message.  However, since
> the whole point here is that we aren't accounting for transactions that
> commit while VACUUM runs, it's not very clear how to do that.
>
> Another way to look at it is that we want to keep any increments to
> n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could
> have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, and
> then send that as the value to subtract when it's done?

Taking a copy of n_dead_tuples and passing it at the end of vacuum to subtract from the table stats may not be correct either, as vacuum may not have cleaned all the dead tuples because of their visibility to other transactions. How about resetting n_dead_tuples to zero if it goes negative because of such errors?

Regards,
Hari babu.
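A minimal sketch of that clamping idea, modeled loosely on the collector-side message handlers in pgstat.c. The handler name and the m_dead_tuples_removed field are assumptions made up for illustration; they are not existing code, and the other message fields shown may differ from the real struct.

    /*
     * Hypothetical collector-side handler: subtract what VACUUM reports as
     * removed rather than overwriting the counter, and clamp at zero so
     * accumulated errors cannot drive it negative.
     */
    static void
    pgstat_recv_vacuum_delta(PgStat_MsgVacuum *msg, int len)
    {
        PgStat_StatDBEntry  *dbentry;
        PgStat_StatTabEntry *tabentry;

        dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
        tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);

        tabentry->n_live_tuples = msg->m_live_tuples;

        if (tabentry->n_dead_tuples > msg->m_dead_tuples_removed)
            tabentry->n_dead_tuples -= msg->m_dead_tuples_removed;
        else
            tabentry->n_dead_tuples = 0;    /* clamp instead of going negative */
    }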
[HACKERS] Standby catch up state change
Hello,

I wonder if there is an issue with the way the state change happens from WALSNDSTATE_CATCHUP to WALSNDSTATE_STREAMING. Please note my question is based solely on a strange behavior reported by a colleague and my own limited code reading.

The colleague is trying out replication with a networking middleware and noticed that the master logs the debug message about the standby catching up, but the write_location in the pg_stat_replication view takes minutes to reflect the actual catch-up location.

ISTM that the following code in walsender.c assumes that the standby has caught up once the master has sent all the required WAL:

    /* Do we have any work to do? */
    Assert(sentPtr <= SendRqstPtr);
    if (SendRqstPtr <= sentPtr)
    {
        *caughtup = true;
        return;
    }

But what if the standby has not yet received all the WAL data sent by the master? That can happen for various reasons, such as caching at the OS level or the network layer on the sender machine or any other intermediate hops. Should we not instead wait for the standby to have received all the WAL before declaring that it has caught up? If a failure happens while the data is still in the sender's buffer, the standby may not actually catch up to the desired point, contrary to the LOG message displayed on the master.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Standby catch up state change
On 2013-10-15 15:51:46 +0530, Pavan Deolasee wrote:
> Should we not instead wait for the standby to have received all the WAL
> before declaring that it has caught up?  If a failure happens while the data
> is still in the sender's buffer, the standby may not actually catch up to the
> desired point, contrary to the LOG message displayed on the master.

I don't think that'd be a good idea - the caughtup logic is used to determine whether we need to wait for further WAL to be generated locally if we haven't got anything else to do. And we only need to do so when we have reached the end of the WAL.

Also, we'd have to reset caughtup every time we send data (in XLogSend()), and that'd be horrible.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Standby catch up state change
On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> I don't think that'd be a good idea - the caughtup logic is used to determine
> whether we need to wait for further WAL to be generated locally if we haven't
> got anything else to do. And we only need to do so when we have reached the
> end of the WAL.

Obviously I do not understand the caughtup logic fully, but don't you think the log message about the standby having caught up with the master, while it hasn't because the sender has buffered a lot of data, is wrong? Or are you saying those are really two different things?

> Also, we'd have to reset caughtup every time we send data (in XLogSend()),
> and that'd be horrible.

Sorry, I did not get that. I was only arguing that the log message about the standby having caught up with the master should be delayed until the standby has actually received the WAL; I wasn't saying much about the actual implementation.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
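For concreteness, what is being proposed would look roughly like the fragment below, as it might sit inside walsender.c: before switching out of catchup, compare the write position the standby has reported back (WalSnd->write) with sentPtr. This is purely illustrative and not how walsender.c currently behaves.

    /*
     * Hypothetical alternative: only leave catchup once the standby has
     * confirmed (via its status messages) that it has written everything we
     * have sent so far.
     */
    if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
    {
        XLogRecPtr  standbyWrite;

        SpinLockAcquire(&MyWalSnd->mutex);
        standbyWrite = MyWalSnd->write;
        SpinLockRelease(&MyWalSnd->mutex);

        if (standbyWrite >= sentPtr)
        {
            ereport(DEBUG1,
                    (errmsg("standby \"%s\" has now caught up with primary",
                            application_name)));
            WalSndSetState(WALSNDSTATE_STREAMING);
        }
    }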
Re: [HACKERS] Standby catch up state change
On 2013-10-15 16:12:56 +0530, Pavan Deolasee wrote:
> On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> I don't think that'd be a good idea - the caughtup logic is used to
>> determine whether we need to wait for further WAL to be generated locally if
>> we haven't got anything else to do. And we only need to do so when we have
>> reached the end of the WAL.
>
> Obviously I do not understand the caughtup logic fully, but don't you think
> the log message about the standby having caught up with the master, while it
> hasn't because the sender has buffered a lot of data, is wrong?  Or are you
> saying those are really two different things?

The message is logged when the state changes because the state is important for the behaviour of replication (e.g. that node becomes eligible for sync rep). I don't think delaying the message is a good idea.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WITHIN GROUP patch
On 10/09/2013 04:19 PM, Pavel Stehule wrote: I checked a conformance with ANSI SQL - and I didn't find any issue. I found so following error message is not too friendly (mainly because this functionality will be new) postgres=# select dense_rank(3,3,2) within group (order by num desc, odd) from test4; ERROR: Incorrect number of arguments for hypothetical set function LINE 1: select dense_rank(3,3,2) within group (order by num desc, od... ^ postgres=# select dense_rank(3,3,2) within group (order by num desc) from test4; ERROR: Incorrect number of arguments for hypothetical set function LINE 1: select dense_rank(3,3,2) within group (order by num desc) fr... ^ postgres=# select dense_rank(3,3) within group (order by num desc) from test4; ERROR: Incorrect number of arguments for hypothetical set function LINE 1: select dense_rank(3,3) within group (order by num desc) from... ^ postgres=# select dense_rank(3,3) within group (order by num desc, num) from test4; dense_rank 3 (1 row) Probably some hint should be there? In addition to Pavel's review, I have finally finished reading the patch. Here are some notes, mainly on style: First of all, it no longer compiles on HEAD because commit 4d212bac1752e1bad6f3aa6242061c393ae93a0a stole oid 3968. I modified that locally to be able to continue my review. Some of the error messages do not comply with project style. That is, they begin with a capital letter. Ordered set functions cannot have transition functions Ordered set functions must have final functions Invalid argument types for hypothetical set function Invalid argument types for ordered set function Incompatible change to aggregate definition Too many arguments to ordered set function Ordered set finalfns must not be strict Cannot have multiple ORDER BY clauses with WITHIN GROUP Cannot have DISTINCT and WITHIN GROUP together Incorrect number of arguments for hypothetical set function Incorrect number of direct arguments to ordered set function %s And in pg_aggregate.c I found a comment with a similar problem that doesn't match its surrounding code: Oidtranssortop = InvalidOid; /* Can be omitted */ I didn't find any more examples like that, but I did see several block comments that weren't complete sentences whereas I think they should be. Also a lot of the code comments say I and I don't recall seeing that elsewhere. I may be wrong, but I would prefer if they were more neutral. The documentation has a number of issues. collateindex.pl complains of duplicated index entries for PERCENTILE CONTINUOUS and PERCENTILE DISCRETE. This is because the index markup is used for both overloaded versions. This is the same mistake Bruce made and then corrected in commit 5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3. if there are multiple equally good result should have an s on the end in func.sgml. Table 9-49 has an extra empty column. That should either be removed, or filled in with some kind of comment text like other similar tables. Apart from that, it looks good. There is some mismatched coding styles in there but the next run of pgindent should catch them so it's no big deal. I haven't yet exercised the actual functionality of the new functions, nor have I tried to create my own. Andrew alerted me to a division by zero bug in one of them, so I'll be looking forward to catching that. So, more review to come. -- Vik
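On Pavel's point about the unfriendly error above, one possibility (the wording, error code, and use of the surrounding parser context are suggestions only, not part of the patch) would be to attach an errhint to the existing report and bring the message into project style at the same time:

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("incorrect number of arguments for hypothetical-set function"),
             errhint("The number of direct arguments must match the number of ORDER BY expressions."),
             /* assumes a ParseState and error location are available here */
             parser_errposition(pstate, location)));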
Re: [HACKERS] Standby catch up state change
On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> I don't think delaying the message is a good idea.

The comment in walsender.c says:

    /*
     * If we're in catchup state, move to streaming. This is an
     * important state change for users to know about, since before
     * this point data loss might occur if the primary dies and we
     * need to failover to the standby.
     */

IOW, it claims no data loss will occur after this point. But if the WAL is cached on the master side, isn't this a false claim, i.e. data loss can still occur even after the master outputs the log message and changes the state to streaming? Or am I still getting it wrong?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Standby catch up state change
On 2013-10-15 16:29:47 +0530, Pavan Deolasee wrote:
> On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> I don't think delaying the message is a good idea.
>
> The comment in walsender.c says:
>
>     /*
>      * If we're in catchup state, move to streaming. This is an
>      * important state change for users to know about, since before
>      * this point data loss might occur if the primary dies and we
>      * need to failover to the standby.
>      */
>
> IOW, it claims no data loss will occur after this point. But if the WAL is
> cached on the master side, isn't this a false claim, i.e. data loss can still
> occur even after the master outputs the log message and changes the state to
> streaming?  Or am I still getting it wrong?

I think you're over-interpreting it. We don't actually rely on the data being confirmed received anywhere. And the message doesn't say anything about everything safely being written out. So, if you want to adjust that comment, go for it, but I am pretty firmly convinced that this isn't worth changing the logic for.

Note that the ready_to_stop logic *does* make sure everything is flushed.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] SSL renegotiation
On 09/23/2013 10:51 PM, Alvaro Herrera wrote:
> + /* are we in the middle of a renegotiation? */
> + static bool in_ssl_renegotiation = false;
> +

Since this was committed, I'm getting the following warning:

    be-secure.c:105:13: warning: ‘in_ssl_renegotiation’ defined but not used [-Wunused-variable]

--
Vik
Re: [HACKERS] [PATCH] Statistics collection for CLUSTER command
On 09/16/2013 08:26 AM, Satoshi Nagayasu wrote: (2013/08/08 20:52), Vik Fearing wrote: As part of routine maintenance monitoring, it is interesting for us to have statistics on the CLUSTER command (timestamp of last run, and number of runs since stat reset) like we have for (auto)ANALYZE and (auto)VACUUM. Patch against today's HEAD attached. I would add this to the next commitfest but I seem to be unable to log in with my community account (I can log in to the wiki). Help appreciated. I have reviewed the patch. Thank you for your review. Succeeded to build with the latest HEAD, and passed the regression tests. Looks good enough, and I'd like to add a test case here, not only for the view definition, but also working correctly. Please take a look at attached one. Looks good to me. Attached is a rebased patch with those tests added. -- Vik *** a/doc/src/sgml/monitoring.sgml --- b/doc/src/sgml/monitoring.sgml *** *** 979,984 postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re --- 979,989 daemon/entry /row row + entrystructfieldlast_cluster//entry + entrytypetimestamp with time zone//entry + entryLast time at which commandCLUSTER/ was issued on this table/entry + /row + row entrystructfieldvacuum_count//entry entrytypebigint//entry entryNumber of times this table has been manually vacuumed *** *** 1001,1006 postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re --- 1006,1016 entryNumber of times this table has been analyzed by the autovacuum daemon/entry /row + row + entrystructfieldcluster_count//entry + entrytypebigint//entry + entryNumber of times commandCLUSTER/ has been issued on this table/entry + /row /tbody /tgroup /table *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** *** 410,419 CREATE VIEW pg_stat_all_tables AS pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum, pg_stat_get_last_analyze_time(C.oid) as last_analyze, pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze, pg_stat_get_vacuum_count(C.oid) AS vacuum_count, pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count, pg_stat_get_analyze_count(C.oid) AS analyze_count, ! pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count FROM pg_class C LEFT JOIN pg_index I ON C.oid = I.indrelid LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) --- 410,421 pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum, pg_stat_get_last_analyze_time(C.oid) as last_analyze, pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze, + pg_stat_get_last_cluster_time(C.oid) as last_cluster, pg_stat_get_vacuum_count(C.oid) AS vacuum_count, pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count, pg_stat_get_analyze_count(C.oid) AS analyze_count, ! pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count, ! 
pg_stat_get_cluster_count(C.oid) AS cluster_count FROM pg_class C LEFT JOIN pg_index I ON C.oid = I.indrelid LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) *** a/src/backend/commands/cluster.c --- b/src/backend/commands/cluster.c *** *** 35,40 --- 35,41 #include commands/vacuum.h #include miscadmin.h #include optimizer/planner.h + #include pgstat.h #include storage/bufmgr.h #include storage/lmgr.h #include storage/predicate.h *** *** 407,412 cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose, --- 408,417 verbose); /* NB: rebuild_relation does heap_close() on OldHeap */ + + /* Report CLUSTER to the stats collector, but not VACUUM FULL */ + if (indexOid != InvalidOid) + pgstat_report_cluster(OldHeap); } /* *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *** *** 292,297 static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, in --- 292,298 static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len); static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len); static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len); + static void pgstat_recv_cluster(PgStat_MsgCluster *msg, int len); static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len); static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len); static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len); *** *** 1385,1390 pgstat_report_analyze(Relation rel, --- 1386,1412 } /* + * pgstat_report_cluster() - + * + * Tell the collector about the table we just CLUSTERed. + *
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Oct 11, 2013 at 2:30 PM, Peter Geoghegan p...@heroku.com wrote: But that's simpler than any of the alternatives that I see. Does there really need to be a new snapshot type with one tiny difference that apparently doesn't actually affect conventional clients of MVCC snapshots? I think that's the wrong way of thinking about it. If you're introducing a new type of snapshot, or tinkering with the semantics of an existing one, I think that's a reason to reject the patch straight off. We should be looking for a design that doesn't require that. If we can't find one, I'm not sure we should do this at all. I'm confused by this. We need to lock a row not visible to our snapshot under conventional rules. I think we can rule out serialization failures at read committed. That just leaves changing something about the visibility rules of an existing snapshot type, or creating a new snapshot type, no? It would also be unacceptable to update a tuple, and not have the new row version (which of course will still have information from the future) visible to our snapshot - what would regular RETURNING return? So what do you have in mind? I don't think that locking a row and updating it are really that distinct anyway. The benefit of locking is that we don't have to update. We can delete, for example. Well, the SQL standard way of doing this type of operation is MERGE. The alternative we know exists in other databases is REPLACE; there's also INSERT .. ON DUPLICATE KEY update. In all of those cases, whatever weirdness exists around MVCC is confined to that one command. I tend to think we should do similarly, with the goal that HeapTupleSatisfiesMVCC need not change at all. I don't have the only vote here, of course, but my feeling is that that's more likely to be a good route. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] removing old ports and architectures
On 2013-10-13 16:56:12 +0200, Tom Lane wrote:
> More to the point for this specific case, it seems like our process ought to
> be (1) select a preferably-small set of gcc atomic intrinsics that we want to
> use.

I suggest:

* pg_atomic_load_u32(uint32 *)
* uint32 pg_atomic_store_u32(uint32 *)
* uint32 pg_atomic_exchange_u32(uint32 *ptr, uint32 val)
* bool pg_atomic_compare_exchange_u32(uint32 *ptr, uint32 *expected, uint32 newval)
* uint32 pg_atomic_fetch_add_u32(uint32 *ptr, uint32 add)
* uint32 pg_atomic_fetch_sub_u32(uint32 *ptr, uint32 add)
* uint32 pg_atomic_fetch_and_u32(uint32 *ptr, uint32 add)
* uint32 pg_atomic_fetch_or_u32(uint32 *ptr, uint32 add)
* u64 variants of the above
* bool pg_atomic_test_set(void *ptr)
* void pg_atomic_clear(void *ptr)

On top of that we can generically implement:

* pg_atomic_add_until_u32(uint32 *ptr, uint32 val, uint32 limit)
* pg_atomic_(add|sub|and|or)_fetch_u32()
* u64 variants of the above

We might also want to provide a generic implementation of the math operations based on pg_atomic_compare_exchange() to make it easier to bring up a new architecture.

I think we should leave 64bit support optional for now.

Opinions?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
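As a rough illustration of how the layering could work, here is a sketch mapping two of the proposed primitives onto gcc's legacy __sync builtins, plus a generic pg_atomic_add_until_u32() built purely on compare-exchange. The exact signatures and the assumed add-until semantics (add, but never past the limit) are guesses, not settled API.

    static inline uint32
    pg_atomic_fetch_add_u32(volatile uint32 *ptr, uint32 add)
    {
        /* gcc legacy atomic builtin; returns the value before the addition */
        return __sync_fetch_and_add(ptr, add);
    }

    static inline bool
    pg_atomic_compare_exchange_u32(volatile uint32 *ptr, uint32 *expected, uint32 newval)
    {
        uint32      current;

        current = __sync_val_compare_and_swap(ptr, *expected, newval);
        if (current == *expected)
            return true;
        *expected = current;        /* report back what we actually found */
        return false;
    }

    /*
     * Generic fallback built on compare-exchange, so a new architecture only
     * has to supply CAS: add 'add' to *ptr but never let it exceed 'limit'.
     */
    static inline uint32
    pg_atomic_add_until_u32(volatile uint32 *ptr, uint32 add, uint32 limit)
    {
        uint32      old = *ptr;

        while (old < limit)
        {
            uint32      newval = Min(old + add, limit);

            if (pg_atomic_compare_exchange_u32(ptr, &old, newval))
                return newval;
            /* 'old' now holds the current value; loop and retry */
        }
        return old;
    }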
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
From: Magnus Hagander <mag...@hagander.net>
> On Oct 12, 2013 2:13 AM, "MauMau" <maumau...@gmail.com> wrote:
>> I'm not sure if many use XA features, but I saw the questions and answers a
>> few times, IIRC.  In the trouble situation, PostgreSQL outputs an intuitive
>> message like "increase max_prepared_transactions", so many users might
>> possibly have been able to change the setting and solve the problem
>> themselves without asking for help, feeling stress like "Why do I have to
>> set this?"  For example, max_prepared_transactions is called a "hideous
>> creature" on the following page:
>> https://community.jboss.org/wiki/InstallPostgreSQLOnFedora?_sscc=t
>
> Anybody who follows that page is screwed anyway. I notice they recommend
> running regular VACUUM FULL across the whole database, so it's obvious they
> know nothing about postgresql. There's nothing we can do about what people
> write on random pages around the Internet.

Regular VACUUM FULL is certainly overkill. Apart from that, having to set max_prepared_transactions seems to make PostgreSQL difficult for people with that level of knowledge, doesn't it? I wonder if there are other major DBMSs which require parameter configuration and a server restart to use distributed transactions.

>> According to the below page, the amount of memory consumed for this is
>> (770 + 270 * max_locks_per_transaction) * max_prepared_transactions bytes.
>> With max_connections=100 and the default max_locks_per_transaction=64, this
>> is only about 1.8MB. So the overhead is negligible.
>
> You are assuming memory is the only overhead. I don't think it is.

Having a quick look at the source code, just setting max_prepared_transactions to non-zero seems to produce almost no processing overhead. If the goal is to make PostgreSQL more friendly and run smoothly without frustration from the start, and not perfect tuning, I think max_prepared_transactions = max_connections is an easy and good item. If the goal is limited to auto-tuning memory sizes, this improvement can be treated separately.

> Frankly, I think we'd help 1000 times more users if we enabled a few wal
> writers by default and bumped the wal level. Mainly so they could run a
> one-off base backup. That's used by orders of magnitude more users than XA.

Agreed. A default of non-zero max_wal_senders and wal_level 'archive' would be beneficial for more users. Likewise, non-zero max_prepared_transactions would improve the impression of PostgreSQL (for a limited number of users, though), and it wouldn't do any harm.

Regards
MauMau
Re: [HACKERS] logical changeset generation v6.2
On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund and...@2ndquadrant.com wrote: Well, I just think relying on specific symbol names in the .so file is kind of unfortunate. It means that, for example, you can't have multiple output plugins provided by a single .so. And in general I think it's something that we've tried to minimize. But that's not really different when you rely on _PG_init doing it's thing, right? Sure, that's true. But in general I think magic symbol names aren't a particularly good design. But there's only so much information available here. Why not just have a format that logs it all? Because we do not know what all is? Also, how would we handle replication sets and such that all of the existing replication solutions have generically? I don't see how you can fail to know what all is. There's only a certain set of facts available. I mean you could log irrelevant crap like a random number that you just picked or the sum of all numeric values in the column, but nobody's likely to want that. What people are going to want is the operation performed (insert, update, or delete), all the values in the new tuple, the key values from the old tuple, the transaction ID, and maybe some meta-information about the transaction (such as the commit timestamp). What I'd probably do is emit the data in CSV format, with the first column of each line being a single character indicating what sort of row this is: H means a header row, defining the format of subsequent rows (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk; a new header row is emitted only when the column list changes); I, U, or D means an insert, update, or delete, with column 2 being the transaction ID, column 3 being the table name, and the remaining columns matching the last header row for emitted for that table, T means meta-information about a transaction, whatever we have (e.g. T,txn_id,commit_time). There's probably some further tweaking of that that could be done, and I might be overlooking some salient details, like maybe we want to indicate the column types as well as their names, but the range of things that someone can want to do here is not unlimited. The point, for me anyway, is that someone can write a crappy Perl script to apply changes from a file like this in a day. My contention is that there are a lot of people who will want to do just that, for one reason or another. The plugin interface has awesome power and flexibility, and really high-performance replication solutions will really benefit from that. But regular people don't want to write C code; they just want to write a crappy Perl script. And I think we can facilitate that without too much work. Oh, yuck. So that means you have to write an extra WAL record for EVERY heap insert, update, or delete to a catalog table? OUCH. Yes. We could integrate it into the main record without too many problems, but it didn't seem like an important optimization and it would have higher chances of slowing down wal_level logical. Hmm. I don't know whether that's an important optimization or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
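To make that proposal concrete, a stream following the format sketched above might look like the following; the table, columns, key, xid, values, and timestamp are all invented for illustration. The H row declares the column layout for the accounts table (new columns, then old key columns), the I/U/D rows carry the transaction ID and table name followed by those columns, and the T row carries the transaction's metadata.

    H,accounts,aid,abalance,aid
    I,1002,accounts,7,100,
    U,1002,accounts,7,350,7
    D,1002,accounts,,,7
    T,1002,2013-10-15 14:23:45+00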
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 2013-10-15 21:41:18 +0900, MauMau wrote:
> Likewise, non-zero max_prepared_transactions would improve the impression of
> PostgreSQL (for a limited number of users, though), and it wouldn't do any
> harm.

I've seen several sites shutting down because of forgotten prepared transactions causing bloat and anti-wraparound shutdowns. A big, big -1 for changing that default.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
From: Dimitri Fontaine <dimi...@2ndquadrant.fr>
> The reason why that parameter default has changed from 5 to 0 is that some
> people would mistakenly use a prepared transaction without a transaction
> manager. Few enough people are actually using a transaction manager that it's
> better to have them have to set up PostgreSQL for it.

I guess this problem is not unique to PostgreSQL. I think PostgreSQL can be more friendly for normal users (who use an external transaction manager), and does not need to be too conservative because of people who do irregular things.

Regards
MauMau
Re: [HACKERS] logical changeset generation v6.2
On Mon, Oct 14, 2013 at 5:07 PM, Andres Freund and...@2ndquadrant.com wrote: So, see the attatched benchmark skript. I've always done using a disk bound and a memory bound (using eatmydata, preventing fsyncs) run. * unpatched run, wal_level = hot_standby, eatmydata * unpatched run, wal_level = hot_standby * patched run, wal_level = hot_standby, eatmydata * patched run, wal_level = hot_standby * patched run, wal_level = logical, eatmydata * patched run, wal_level = logical Based on those results, there's no difference above noise for wal_level=hot_standby, with or without the patch. With wal_level=logical there's a measurable increase in wal traffic (~12-17%), but no performance decrease above noise. From my POV that's ok, those are really crazy catalog workloads. Any increase in WAL traffic will translate into a performance hit once the I/O channel becomes saturated, but I agree those numbers don't sound terrible for that faily-brutal test case. Actually, I was more concerned about the hit on non-catalog workloads. pgbench isn't a good test because the key column is so narrow; but suppose we have a table like (a text, b integer, c text) where (a, c) is the primary key and those strings are typically pretty long - say just short enough that we can still index the column. It'd be worth testing both workloads where the primary key doesn't change (so the only overhead is figuring out that we need not log it) and those where it does (where we're double-logging most of the tuple). I assume the latter has to produce a significant hit to WAL volume, and I don't think there's much we can do about that; but the former had better be nearly free. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Tue, Oct 15, 2013 at 2:47 PM, MauMau <maumau...@gmail.com> wrote:
> From: Dimitri Fontaine <dimi...@2ndquadrant.fr>
>> The reason why that parameter default has changed from 5 to 0 is that some
>> people would mistakenly use a prepared transaction without a transaction
>> manager. Few enough people are actually using a transaction manager that
>> it's better to have them have to set up PostgreSQL for it.
>
> I guess this problem is not unique to PostgreSQL. I think PostgreSQL can be
> more friendly for normal users (who use an external transaction manager), and
> does not need to be too conservative because of people who do irregular
> things.

I would say *using* an external transaction manager *is* the irregular thing. The current default *is* friendly for normal users - for example, see the comments from Andres about what happens if you make a mistake. So I definitely agree with your sentiment that we should be more friendly for normal users - but in this case we are.

If I look through all the customers I've worked with, only a handful have actually used a transaction manager. And of those, at least half of them were using it even though they didn't need it, because they didn't know what it was.

But the argument about being friendly for new users should definitely have us change wal_level and max_wal_senders.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows
On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Well, that sucks.  So it's a Windows bug.
>
> It's not clear to me that we should do anything about this at all, except
> perhaps document that people should avoid long tablespace path names on an
> unknown set of Windows versions.  We should not be in the business of working
> around any and every bug coming out of Redmond.

It's sort of incomprehensible to me that Microsoft has a bug like this and apparently hasn't fixed it. But I think I still favor trying to work around it. When people try to use a long data directory name and it freezes the system, some of them will blame us rather than Microsoft. We've certainly gone to considerable lengths to work around extremely strange bugs in various compiler toolchains, even relatively obscure ones. I don't particularly see why we shouldn't do the same here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows
On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Well, that sucks.  So it's a Windows bug.
>>
>> It's not clear to me that we should do anything about this at all, except
>> perhaps document that people should avoid long tablespace path names on an
>> unknown set of Windows versions.  We should not be in the business of
>> working around any and every bug coming out of Redmond.
>
> It's sort of incomprehensible to me that Microsoft has a bug like this and
> apparently hasn't fixed it.  But I think I still favor trying to work around
> it.  When people try to use a long data directory name and it freezes the
> system, some of them will blame us rather than Microsoft.  We've certainly
> gone to considerable lengths to work around extremely strange bugs in various
> compiler toolchains, even relatively obscure ones.  I don't particularly see
> why we shouldn't do the same here.

I agree we'll probably want to work around it in the end, but I still think it should be put to Microsoft PSS if we can. The usual question - have we actually produced a self-contained example that does just this (and doesn't include the full postgres support) and submitted it to *microsoft* for comments? Not talking about their end-user forums, but the actual Microsoft support services?

(AFAIK at least EDB, and probably other pg companies as well, have agreements with MS that let you get access to their real support. I know I used to have it at my last job, and used it a number of times during the initial porting work. The people backing that one are generally pretty good.)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Compression of full-page-writes
On Tue, Oct 15, 2013 at 03:11:22PM +0900, KONDO Mitsumasa wrote:
> (2013/10/15 13:33), Amit Kapila wrote:
>> Snappy is good mainly for un-compressible data, see the link below:
>> http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4yebmoa...@mail.gmail.com
>
> This result was obtained on an ARM architecture, not a general-purpose CPU.
> Please see this document for the details:
> http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9
>
> I found a compression algorithm test in HBase. I have not read it in detail,
> but it indicates that the snappy algorithm gets the best performance:
> http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of
>
> In fact, most modern NoSQL storages use snappy, because it has good
> performance and a good license (BSD).
>
> I think it is a bit difficult to prove that any one algorithm is best for all
> kinds of loads.
>
> Regards,
> --
> Mitsumasa KONDO
> NTT Open Source Software Center

lz4 is also a very nice algorithm, with 33% better compression performance than snappy and 2X the decompression performance in some benchmarks. It is likewise BSD-licensed:

https://code.google.com/p/lz4/

Regards,
Ken
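For anyone wanting a feel for the API, a minimal sketch of compressing one 8kB block with the classic lz4 entry points (LZ4_compressBound()/LZ4_compress()). This is purely illustrative and is not taken from any of the proposed full-page-write patches; the fallback-to-uncompressed behaviour is an assumption about how a caller might use it.

    #include <string.h>
    #include "lz4.h"

    #define BLCKSZ 8192             /* PostgreSQL's default block size */

    /*
     * Compress one full-page image.  Returns the compressed length, or 0 if
     * compression did not shrink the page, in which case the caller could
     * simply write the page uncompressed.
     */
    static int
    compress_page(const char *page, char *dst, int dstsize)
    {
        int         len;

        if (dstsize < LZ4_compressBound(BLCKSZ))
            return 0;               /* destination buffer too small */

        len = LZ4_compress(page, dst, BLCKSZ);
        if (len <= 0 || len >= BLCKSZ)
            return 0;               /* incompressible; keep the original */
        return len;
    }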
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 08:42:20 -0400, Robert Haas wrote: On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund and...@2ndquadrant.com wrote: Well, I just think relying on specific symbol names in the .so file is kind of unfortunate. It means that, for example, you can't have multiple output plugins provided by a single .so. And in general I think it's something that we've tried to minimize. But that's not really different when you rely on _PG_init doing it's thing, right? Sure, that's true. But in general I think magic symbol names aren't a particularly good design. It allows you to use the shared libary both as a normal extension loaded via shared_preload_library or adhoc and as an output plugin which seems like a sensible goal. We could have a single _PG_init_output_plugin() symbol that fills in such a struct which would then not conflict with using the .so independently. If you prefer that I'll change things around. We can't do something like 'output_plugin_in_progress' before calling _PG_init() because _PG_init() won't be called again if the shared object is already loaded... But there's only so much information available here. Why not just have a format that logs it all? Because we do not know what all is? Also, how would we handle replication sets and such that all of the existing replication solutions have generically? I don't see how you can fail to know what all is. There's only a certain set of facts available. I mean you could log irrelevant crap like a random number that you just picked or the sum of all numeric values in the column, but nobody's likely to want that. What people are going to want is the operation performed (insert, update, or delete), all the values in the new tuple, the key values from the old tuple, the transaction ID, and maybe some meta-information about the transaction (such as the commit timestamp). Some will want all column names included because that makes replication into different schemas/databases easier, others won't because it makes replicating the data more complicated and expensive. Lots will want the primary key as a separate set of columns even for inserts, others not. There's also datatypes of values and null representation. What I'd probably do is emit the data in CSV format, with the first column of each line being a single character indicating what sort of row this is: H means a header row, defining the format of subsequent rows (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk; a new header row is emitted only when the column list changes); I, U, or D means an insert, update, or delete, with column 2 being the transaction ID, column 3 being the table name, and the remaining columns matching the last header row for emitted for that table, T means meta-information about a transaction, whatever we have (e.g. T,txn_id,commit_time). There's two issues I have with this: a) CSV seems like a bad format for this. If a transaction inserts into multiple tables the number of columns will constantly change. Many CSV parsers don't deal with that all too gracefully. E.g. you can't even load the data into another postgres database as an audit log. If we go for CSV I think we should put the entire primary key as one column (containing all the columns) and the entire row another. We also don't have any nice facilities for actually writing CSV - so we'll need to start extracting escaping code from COPY. In the end all that will make the output plugin very hard to use as an example because the code will get more complicated. 
b) Emitting new row descriptors everytime the schema changes will require keeping track of the schema. I think that won't be trivial. It also makes consumption of the data more complicated in comparison to including the description with every row. Both are even more true once we extend the format to support streaming of transactions while they are performed. But regular people don't want to write C code; they just want to write a crappy Perl script. And I think we can facilitate that without too much work. I think the generic output plugin should be a separate one from the example one (which is the one included in the patchset). Oh, yuck. So that means you have to write an extra WAL record for EVERY heap insert, update, or delete to a catalog table? OUCH. Yes. We could integrate it into the main record without too many problems, but it didn't seem like an important optimization and it would have higher chances of slowing down wal_level logical. Hmm. I don't know whether that's an important optimization or not. Based on the benchmark I'd say no. If we discover we need to go there we can do so later. I don't forsee this to be really problematic. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list
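As a sketch of the single-symbol idea mentioned above (_PG_init_output_plugin() filling in a callback struct), as it might appear inside a plugin's .c file: the struct layout and callback signatures below are invented for illustration only and are not the patchset's actual API.

    /* Hypothetical callback table an output plugin would fill in. */
    typedef struct OutputPluginCallbacks
    {
        void    (*begin_txn) (void *private_data, TransactionId xid);
        void    (*change) (void *private_data, TransactionId xid,
                           Relation relation, void *change);
        void    (*commit_txn) (void *private_data, TransactionId xid,
                               XLogRecPtr commit_lsn);
    } OutputPluginCallbacks;

    /* The plugin's own implementations (empty stubs here). */
    static void my_begin_txn(void *private_data, TransactionId xid) { }
    static void my_change(void *private_data, TransactionId xid,
                          Relation relation, void *change) { }
    static void my_commit_txn(void *private_data, TransactionId xid,
                              XLogRecPtr commit_lsn) { }

    /*
     * Single well-known entry point: the core code looks this symbol up in the
     * shared library and calls it, so the library's ordinary _PG_init() stays
     * free for use when it is loaded as a regular extension.
     */
    void
    _PG_init_output_plugin(OutputPluginCallbacks *cb)
    {
        cb->begin_txn = my_begin_txn;
        cb->change = my_change;
        cb->commit_txn = my_commit_txn;
    }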
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 08:49:26 -0400, Robert Haas wrote: On Mon, Oct 14, 2013 at 5:07 PM, Andres Freund and...@2ndquadrant.com wrote: So, see the attatched benchmark skript. I've always done using a disk bound and a memory bound (using eatmydata, preventing fsyncs) run. * unpatched run, wal_level = hot_standby, eatmydata * unpatched run, wal_level = hot_standby * patched run, wal_level = hot_standby, eatmydata * patched run, wal_level = hot_standby * patched run, wal_level = logical, eatmydata * patched run, wal_level = logical Based on those results, there's no difference above noise for wal_level=hot_standby, with or without the patch. With wal_level=logical there's a measurable increase in wal traffic (~12-17%), but no performance decrease above noise. From my POV that's ok, those are really crazy catalog workloads. Any increase in WAL traffic will translate into a performance hit once the I/O channel becomes saturated, but I agree those numbers don't sound terrible for that faily-brutal test case. Well, the parallel workloads were fsync saturated although probably not throughput, that's why I added them. But yes, it's not the same as a throughput saturated IO channel. Probably the worst case real-world workload is one that uses lots and lots of ON COMMIT DROP temporary tables. Actually, I was more concerned about the hit on non-catalog workloads. pgbench isn't a good test because the key column is so narrow; but suppose we have a table like (a text, b integer, c text) where (a, c) is the primary key and those strings are typically pretty long - say just short enough that we can still index the column. It'd be worth testing both workloads where the primary key doesn't change (so the only overhead is figuring out that we need not log it) and those where it does (where we're double-logging most of the tuple). I assume the latter has to produce a significant hit to WAL volume, and I don't think there's much we can do about that; but the former had better be nearly free. Ah, ok. Then I misunderstood you. Is there a specific overhead you are afraid of in the pkey-doesn't-change scenario? The changed wal logging (buffer in a separate rdata entry) or the check whether the primary key has changed? The only way I have been able to measure differences in that scenario was to load a table with a low fillfactor and wide tuples, checkpoint, and then update lots of rows. On wal_level=logical that will result in full-page-images and tuple data being logged which can be noticeable if you have really large tuples, even if the pkey doesn't change. We could optimize that by not actually logging the tuple data in that case but just include the tid so we could extract things from the Bkp block ourselves. But that will complicate the code and doesn't yet seem warranted. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triggers on foreign tables
On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: And, I also want some comments from committers, not only from mine. +1 +1 /me pokes head up. I know I'm going to annoy people with this comment, but I feel like it's going to have to be made at some point by somebody, so here goes: I don't see the point of this feature. If you want a trigger on a table, why not set it on the remote side? A trigger on the foreign table won't be enforced consistently; it'll only work when the update is routed through the foreign table, not when people access the underlying table on the remote side through any other mechanism. The number of useful things you can do this way seems fairly small. Perhaps you could use a row-level trigger for RLS, to allow only certain rows on the foreign side to be updated, but even that seems like a slightly strange design: generally it'd be better to enforce the security as close to the target object as possible. There's another issue that concerns me here also: performance. IIUC, an update of N tuples on the remote side currently requires N+1 server round-trips. That is unspeakably awful, and we really oughta be looking for ways to make that number go down, by pushing the whole update to the remote side. But obviously that won't be possible if there's a per-row trigger that has to be evaluated on the local side. Now, assuming somebody comes up with code that implements that optimization, we can just disable it when there are local-side triggers. But, then you're back to having terrible performance. So even if the use case for this seemed really broad, I tend to think performance concerns would sink most of the possible real-world uses. I could, of course, be all wet -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 15:17:58 +0200, Andres Freund wrote: If we go for CSV I think we should put the entire primary key as one column (containing all the columns) and the entire row another. What about columns like: * action B|I|U|D|C * xid * timestamp * tablename * key name * key column names * key column types * new key column values * column names * column types * column values * candidate_key_changed? * old key column values And have output plugin options * include-column-types * include-column-names * include-primary-key If something isn't included it's simply left out. What still need to be determined is: * how do we separate and escape multiple values in one CSV column * how do we represent NULLs Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triggers on foreign tables
Robert, * Robert Haas (robertmh...@gmail.com) wrote: /me pokes head up. I know I'm going to annoy people with this comment, but I feel like it's going to have to be made at some point Perhaps some folks will be annoyed- I'm not annoyed, but I don't really agree. :) by somebody, so here goes: I don't see the point of this feature. If you want a trigger on a table, why not set it on the remote side? A trigger on the foreign table won't be enforced consistently; it'll only work when the update is routed through the foreign table, not when people access the underlying table on the remote side through any other mechanism. The number of useful things you can do this way seems fairly small. Perhaps you could use a row-level trigger for RLS, to allow only certain rows on the foreign side to be updated, but even that seems like a slightly strange design: generally it'd be better to enforce the security as close to the target object as possible. I can certainly see use-cases for this, a very simple one being a way to keep track of what's been updated/inserted/whatever through this particular foreign table (essentially, an auditing table). The *remote* side might not be ideal for tracking that information and you might want the info locally and remotely anyway. There's another issue that concerns me here also: performance. IIUC, an update of N tuples on the remote side currently requires N+1 server round-trips. That is unspeakably awful, and we really oughta be looking for ways to make that number go down, by pushing the whole update to the remote side. But obviously that won't be possible if there's a per-row trigger that has to be evaluated on the local side. Now, assuming somebody comes up with code that implements that optimization, we can just disable it when there are local-side triggers. But, then you're back to having terrible performance. So even if the use case for this seemed really broad, I tend to think performance concerns would sink most of the possible real-world uses. Performance, while a concern, should probably be secondary when there are valid use-cases for this where the performance wouldn't be a problem for users. Thanks, Stephen signature.asc Description: Digital signature
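As a concrete sketch of that auditing use case, assuming the patch is in place, a local audit table and row-level trigger on a foreign table could look like this (all object names are invented):

    CREATE TABLE ft_audit (
        audited_at timestamptz NOT NULL DEFAULT now(),
        op         text NOT NULL,
        row_data   text
    );

    CREATE FUNCTION ft_audit_fn() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO ft_audit (op, row_data) VALUES (TG_OP, OLD::text);
            RETURN OLD;
        ELSE
            INSERT INTO ft_audit (op, row_data) VALUES (TG_OP, NEW::text);
            RETURN NEW;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER ft_audit_trg
        AFTER INSERT OR UPDATE OR DELETE ON remote_orders
        FOR EACH ROW EXECUTE PROCEDURE ft_audit_fn();

Such a trigger only fires when the write is routed through the foreign table, which is exactly the limitation Robert raises; for auditing what goes through this particular access path, that is the point.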
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund and...@2ndquadrant.com wrote: It allows you to use the shared libary both as a normal extension loaded via shared_preload_library or adhoc and as an output plugin which seems like a sensible goal. We could have a single _PG_init_output_plugin() symbol that fills in such a struct which would then not conflict with using the .so independently. If you prefer that I'll change things around. I think part of the problem may be that you're using the library name to identify the output plugin. I'm not excited about that design. For functions, you give the function a name and that is a pointer to where to actually find the function, which may be a 2-tuple library-name, function-name, or perhaps just a 1-tuple builtin-function-name, or maybe the whole text of a PL/pgsql procedure that should be compiled. Perhaps this ought to work similarly. Create a function in pg_proc which returns the structure containing the function pointers. Then, when that output plugin is selected, it'll automatically trigger loading the correct shared library if that's needed; and the shared library name may (but need not) match the output plugin name. What I'd probably do is emit the data in CSV format, with the first column of each line being a single character indicating what sort of row this is: H means a header row, defining the format of subsequent rows (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk; a new header row is emitted only when the column list changes); I, U, or D means an insert, update, or delete, with column 2 being the transaction ID, column 3 being the table name, and the remaining columns matching the last header row for emitted for that table, T means meta-information about a transaction, whatever we have (e.g. T,txn_id,commit_time). There's two issues I have with this: a) CSV seems like a bad format for this. If a transaction inserts into multiple tables the number of columns will constantly change. Many CSV parsers don't deal with that all too gracefully. E.g. you can't even load the data into another postgres database as an audit log. We can pick some other separator. I don't think ragged CSV is a big problem; I'm actually more worried about having an easy way to handle embedded commas and newlines and so on. But I'd be fine with tab-separated data or something too, if you think that's better. What I want is something that someone can parse with a script that can be written in a reasonable amount of time in their favorite scripting language. I predict that if we provide something like this we'll vastly expand the number of users who can make use of this new functionality. User: So, what's new in PostgreSQL 9.4? Hacker: Well, now we have logical replication! User: Why is that cool? Hacker: Well, streaming replication is awesome for HA, but it has significant limitations. And trigger-based systems are very mature, but the overhead is high and their lack of core integration makes them hard to use. With this technology, you can build systems that will replicate individual tables or even parts of tables, multi-master systems, and lots of other cool stuff. User: Wow, that sounds great. How do I use it? Hacker: Well, first you write an output plugin in C using a special API. User: Hey, do you know whether the MongoDB guys came to this conference? Let's try that again. User: Wow, that sounds great. How do I use it? 
Hacker: Well, currently, the output gets dumped as a series of text files that are designed to be parsed using a scripting language. We have sample parsers written in Perl and Python that you can use as-is or hack up to meet your needs. Now, some users are still going to head for the hills. But at least from where I sit it sounds a hell of a lot better than the first answer. We're not going to solve all of the tooling problems around this technology in one release, for sure. But as far as 95% of our users are concerned, a C API might as well not exist at all. People WILL try to machine parse the output of whatever demo plugins we provide; so I think we should try hard to provide at least one such plugin that is designed to make that as easy as possible. If we go for CSV I think we should put the entire primary key as one column (containing all the columns) and the entire row another. We also don't have any nice facilities for actually writing CSV - so we'll need to start extracting escaping code from COPY. In the end all that will make the output plugin very hard to use as an example because the code will get more complicated. b) Emitting new row descriptors everytime the schema changes will require keeping track of the schema. I think that won't be trivial. It also makes consumption of the data more complicated in comparison to including the description with every row. Both are even more true once we extend the format to support streaming of transactions while they are
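For illustration only, the header-plus-data scheme Robert sketches above would produce output along these lines (table, values, and timestamps are invented):

    H,public.accounts,id,balance,id
    I,10240,public.accounts,7,100.00,
    U,10241,public.accounts,7,150.00,7
    D,10242,public.accounts,,,7
    T,10242,2013-10-15 15:20:01+02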
Re: [HACKERS] logical changeset generation v6.2
On 10/15/2013 01:42 PM, Robert Haas wrote: On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund and...@2ndquadrant.com wrote: Well, I just think relying on specific symbol names in the .so file is kind of unfortunate. It means that, for example, you can't have multiple output plugins provided by a single .so. And in general I think it's something that we've tried to minimize. But that's not really different when you rely on _PG_init doing it's thing, right? Sure, that's true. But in general I think magic symbol names aren't a particularly good design. But there's only so much information available here. Why not just have a format that logs it all? Because we do not know what all is? Also, how would we handle replication sets and such that all of the existing replication solutions have generically? I don't see how you can fail to know what all is. We instinctively know what all is - as in the famous case of buddhist ordering a hamburger - Make me All wit Everything :) - but the requirements of different replications systems vary wildly. ... What people are going to want is the operation performed (insert, update, or delete), all the values in the new tuple, the key values from the old tuple, For multi-master / conflict resolution you may also want all old values to make sure that they have not changed on target. the difference in WAL volume can be really significant, especially in the case of DELETE, where there are no new columns. for some forms of conflict resolution we may even want to know the database user who initiated the operation. and possibly even some session variables like very_important=yes. ... The point, for me anyway, is that someone can write a crappy Perl script to apply changes from a file like this in a day. My contention is that there are a lot of people who will want to do just that, for one reason or another. The plugin interface has awesome power and flexibility, and really high-performance replication solutions will really benefit from that. But regular people don't want to write C code; they just want to write a crappy Perl script. And I think we can facilitate that without too much work. just provide a to-csv or to-json plugin and the crappy perl guys are happy. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 10/15/2013 02:47 PM, Andres Freund wrote: On 2013-10-15 15:17:58 +0200, Andres Freund wrote: If we go for CSV I think we should put the entire primary key as one column (containing all the columns) and the entire row another. just use JSON :) What about columns like: * action B|I|U|D|C * xid * timestamp * tablename * key name * key column names * key column types * new key column values * column names * column types * column values * candidate_key_changed? * old key column values And have output plugin options * include-column-types * include-column-names * include-primary-key If something isn't included it's simply left out. What still need to be determined is: * how do we separate and escape multiple values in one CSV column * how do we represent NULLs or borrow whatever possible from pg_dump as they have needed to solve most of the same problems already and consistency is good in general Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 9:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-15 15:17:58 +0200, Andres Freund wrote: If we go for CSV I think we should put the entire primary key as one column (containing all the columns) and the entire row another. What about columns like: * action B|I|U|D|C BEGIN and COMMIT? * xid * timestamp * tablename * key name * key column names * key column types * new key column values * column names * column types * column values * candidate_key_changed? * old key column values Repeating the column names for every row strikes me as a nonstarter. If the plugin interface isn't rich enough to provide a convenient way to avoid that, then it needs to be fixed so that it is, because it will be a common requirement. Sure, some people may want JSON or XML output that reiterates the labels every time, but for a lot of people that's going to greatly increase the size of the output and be undesirable for that reason. What still need to be determined is: * how do we separate and escape multiple values in one CSV column * how do we represent NULLs I consider the escaping a key design decision. Ideally, it should be something that's easy to reverse from a scripting language; ideally also, it should be something similar to how we handle COPY. These goals may be in conflict; we'll have to pick something. I'm not sure that having multiple values in one column is a good plan, because now you need multiple levels of parsing to unpack the row. I'd rather just have a flat column list with a key somewhere explaining how to interpret the data. But I'm prepared to give in on that point so long as we can demonstrate that the format can be easily parsed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 10:09 AM, Hannu Krosing ha...@krosing.net wrote: I don't see how you can fail to know what all is. We instinctively know what all is - as in the famous case of buddhist ordering a hamburger - Make me All wit Everything :) - but the requirements of different replications systems vary wildly. That's true to some degree, but let's not exaggerate the degree to which it is true. For multi-master / conflict resolution you may also want all old values to make sure that they have not changed on target. The patch as proposed doesn't make that information available. If you want that to be an option, now would be the right time to argue for it. for some forms of conflict resolution we may even want to know the database user who initiated the operation. and possibly even some session variables like very_important=yes. Well, if you have requirements like logging very_important=yes, then you're definitely into the territory where you need your own output plugin. I have no problem telling people who want that sort of thing that they've got to go write C code. What I'm trying to do, as Larry Wall once said, is to make simple things simple and hard things possible. The output plugin interface accomplishes the latter, but, by itself, not the former. And I think we can facilitate that without too much work. just provide a to-csv or to-json plugin and the crappy perl guys are happy. Yep, that's exactly what I'm advocating for. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 10:20:55 -0400, Robert Haas wrote: For multi-master / conflict resolution you may also want all old values to make sure that they have not changed on target. The patch as proposed doesn't make that information available. If you want that to be an option, now would be the right time to argue for it. I don't think you necessarily want it for most MM solutions, but I agree it will be useful for some scenarios. I think the ReorderBufferChange struct needs a better way to distinguish between old-key and old-tuple now, but I'd rather implement the facility for logging the full old tuple in a separate patch. The patchset is big enough as is, let's not tack on more features. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 10:09:05 -0400, Robert Haas wrote: On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund and...@2ndquadrant.com wrote: It allows you to use the shared libary both as a normal extension loaded via shared_preload_library or adhoc and as an output plugin which seems like a sensible goal. We could have a single _PG_init_output_plugin() symbol that fills in such a struct which would then not conflict with using the .so independently. If you prefer that I'll change things around. I think part of the problem may be that you're using the library name to identify the output plugin. I'm not excited about that design. For functions, you give the function a name and that is a pointer to where to actually find the function, which may be a 2-tuple library-name, function-name, or perhaps just a 1-tuple builtin-function-name, or maybe the whole text of a PL/pgsql procedure that should be compiled. That means you allow trivial remote code execution since you could try to load system() or something else that's available in every shared object. Now you can argue that that's OK since we have special checks for replication connections, but I'd rather not go there. Perhaps this ought to work similarly. Create a function in pg_proc which returns the structure containing the function pointers. Then, when that output plugin is selected, it'll automatically trigger loading the correct shared library if that's needed; and the shared library name may (but need not) match the output plugin name. I'd like to avoid relying on inserting stuff into pg_proc because that makes it harder to extract WAL from a HS standby. Requiring to configure that on the primary to extract data on the standby seems confusing to me. But perhaps that's the correct solution :/ Now, some users are still going to head for the hills. But at least from where I sit it sounds a hell of a lot better than the first answer. We're not going to solve all of the tooling problems around this technology in one release, for sure. But as far as 95% of our users are concerned, a C API might as well not exist at all. People WILL try to machine parse the output of whatever demo plugins we provide; so I think we should try hard to provide at least one such plugin that is designed to make that as easy as possible. Well, just providing the C API + an example in a first step didn't work out too badly for FDWs. I am pretty sure that once released there will soon be extensions for it on PGXN or whatever for special usecases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
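If the pg_proc route were taken, it would presumably mirror the handler-function pattern used for foreign data wrappers, roughly like this (the output_plugin_handler type and every name below are hypothetical; nothing like this exists in the patch):

    CREATE FUNCTION my_decoding_plugin() RETURNS output_plugin_handler
        AS '$libdir/my_decoding', 'my_decoding_plugin'
        LANGUAGE C;

    -- Asking for the plugin by that name would then trigger loading the
    -- shared library if needed, as Robert describes above, and the library
    -- name would no longer have to double as the plugin name.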
Re: [HACKERS] Patch for reserved connections for replication users
On Tue, Oct 15, 2013 at 12:13 AM, Amit Kapila amit.kapil...@gmail.com wrote: If we think this way, then may be we should have max_user_connections instead of max_connections and then max_wal_connections. But still there are other's like pg_basebackup who needs connections and tomorrow there can be new such entities which need connection. Also we might need to have different infrastructure in code to make these options available to users. I think having different parameters to configure maximum connections for different entities can complicate both code as well as user's job. Renaming max_connections is far too big a compatibility break to consider without far more benefit than what this patch is aiming at. I'm not prepared to endure the number of beatings I'd have to take if we did that. But I also agree that making max_wal_senders act as both a minimum and a maximum is no good. +1 to everything Josh Berkus said. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for reserved connections for replication users
On 2013-10-15 10:29:58 -0400, Robert Haas wrote: On Tue, Oct 15, 2013 at 12:13 AM, Amit Kapila amit.kapil...@gmail.com wrote: If we think this way, then may be we should have max_user_connections instead of max_connections and then max_wal_connections. But still there are other's like pg_basebackup who needs connections and tomorrow there can be new such entities which need connection. Also we might need to have different infrastructure in code to make these options available to users. I think having different parameters to configure maximum connections for different entities can complicate both code as well as user's job. Renaming max_connections is far too big a compatibility break to consider without far more benefit than what this patch is aiming at. I'm not prepared to endure the number of beatings I'd have to take if we did that. +many But I also agree that making max_wal_senders act as both a minimum and a maximum is no good. +1 to everything Josh Berkus said. Josh said we should treat replication connections in a separate pool from normal database connections, right? So you withdraw your earlier objection to that? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 10:27 AM, Andres Freund and...@2ndquadrant.com wrote: I think part of the problem may be that you're using the library name to identify the output plugin. I'm not excited about that design. For functions, you give the function a name and that is a pointer to where to actually find the function, which may be a 2-tuple library-name, function-name, or perhaps just a 1-tuple builtin-function-name, or maybe the whole text of a PL/pgsql procedure that should be compiled. That means you allow trivial remote code execution since you could try to load system() or something else that's available in every shared object. Now you can argue that that's OK since we have special checks for replication connections, but I'd rather not go there. Well, obviously you can't let somebody load any library they want. But that's pretty much true anyway; LOAD had better be confined to superusers unless there is something (like a pg_proc entry) that provides prior authorization for that specific load. Perhaps this ought to work similarly. Create a function in pg_proc which returns the structure containing the function pointers. Then, when that output plugin is selected, it'll automatically trigger loading the correct shared library if that's needed; and the shared library name may (but need not) match the output plugin name. I'd like to avoid relying on inserting stuff into pg_proc because that makes it harder to extract WAL from a HS standby. Requiring to configure that on the primary to extract data on the standby seems confusing to me. But perhaps that's the correct solution :/ That's a reasonable concern. I don't have another idea at the moment, unless we want to allow replication connections to issue LOAD commands. Then you can LOAD the library, so that the plug-in is registered under the well-known name you expect it to have, and then use that name to start replication. Now, some users are still going to head for the hills. But at least from where I sit it sounds a hell of a lot better than the first answer. We're not going to solve all of the tooling problems around this technology in one release, for sure. But as far as 95% of our users are concerned, a C API might as well not exist at all. People WILL try to machine parse the output of whatever demo plugins we provide; so I think we should try hard to provide at least one such plugin that is designed to make that as easy as possible. Well, just providing the C API + an example in a first step didn't work out too badly for FDWs. I am pretty sure that once released there will soon be extensions for it on PGXN or whatever for special usecases. I suspect so, too. But I also think that if that's the only thing available in the first release, a lot of users will get a poor initial impression. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Release note fix for timeline item
On Tue, Oct 15, 2013 at 02:32:47PM +0900, KONDO Mitsumasa wrote: Sorry for my late reply... (2013/10/08 23:26), Bruce Momjian wrote: First, I want to apologize for not completing the release notes earlier so that others could review them. I started working on the release notes on Friday, but my unfamiliarity with the process and fear of making a mistake caused many delays. I have improved the documentation on the process which will hopefully help next time. There isn't anything in particular that I was dissatisfied about. You are right that there are a lot of details skipped in the release note text. I have developed the attached patch which I think does a better job. Is it OK? Yes, of course! Thanks for your sincere action! Thanks, patch applied back through 9.1. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for reserved connections for replication users
On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote: But I also agree that making max_wal_senders act as both a minimum and a maximum is no good. +1 to everything Josh Berkus said. Josh said we should treat replication connections in a separate pool from normal database connections, right? So you withdraw your earlier objection to that? I don't think that's what he said. Here's what I was referring to: $ Changing max_wal_senders requires a restart. As such, we currently $ advise users to set the setting generously: as many replication $ connections as you think you'll ever need, plus two. If $ max_wal_senders is a reservation which could cause the user to run out $ of other connections sooner than expected, then the user is faced with a $ new hard to set parameter: they don't want to set it too high *or* too $ low. $ $ This would result in a lot of user frustration as they try to get thier $ connection configuration right and have to restart the server multiple $ times. I find few new features worth making it *harder* to configure $ PostgreSQL, and reserved replication connections certainly don't qualify. $ $ If it's worth having reserved replication connections (and I can see $ some reasons to want it), then we need a new GUC for this: $ reserved_walsender_connections -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
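In configuration terms, the proposal Robert quotes would amount to something like this (reserved_walsender_connections is the hypothetical GUC from Josh's mail, not an existing parameter; values are illustrative):

    max_connections = 100                  # regular client backends
    max_wal_senders = 10                   # generous upper bound on walsenders
    reserved_walsender_connections = 3     # slots guaranteed to replication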
Re: [HACKERS] Patch for reserved connections for replication users
On 2013-10-15 10:36:41 -0400, Robert Haas wrote: On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote: But I also agree that making max_wal_senders act as both a minimum and a maximum is no good. +1 to everything Josh Berkus said. Josh said we should treat replication connections in a separate pool from normal database connections, right? So you withdraw your earlier objection to that? I don't think that's what he said. Here's what I was referring to: $ Changing max_wal_senders requires a restart. As such, we currently $ advise users to set the setting generously: as many replication $ connections as you think you'll ever need, plus two. If $ max_wal_senders is a reservation which could cause the user to run out $ of other connections sooner than expected, then the user is faced with a $ new hard to set parameter: they don't want to set it too high *or* too $ low. $ $ This would result in a lot of user frustration as they try to get thier $ connection configuration right and have to restart the server multiple $ times. I find few new features worth making it *harder* to configure $ PostgreSQL, and reserved replication connections certainly don't qualify. $ $ If it's worth having reserved replication connections (and I can see $ some reasons to want it), then we need a new GUC for this: $ reserved_walsender_connections I am referring to http://archives.postgresql.org/message-id/525C31D3.3010006%40agliodbs.com : On 10/14/2013 10:51 AM, Andres Freund wrote: Imo the complications around this prove my (way earlier) point that it'd be much better to treat replication connections as something entirely different to normal SQL connections. There's really not much overlap here and while there's some philosophical point to be made about it all being connections, from a practical POV treating them separately seems better. Given that replication connections don't even appear in pg_stat_activity now, I'd agree with you. I still fail to see what the point is in treating those classes of connections together. It only serves to confuse users and makes considerations way much more complicated. There's no need for reserved replication connections or anything like it if would separate the pools. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doc Patch: Subquery section to say that subqueries can't modify data
On 08/06/2013 11:03 PM, Karl O. Pinc wrote: The attached documentation patch, doc-subqueries-v1.patch, applies against head. I wanted to document that subqueries can't modify data. This is mentioned in the documentation for SELECT and implied elsewhere but I was looking for something more than an 'in-passing' mention. (I wrote a bad query, modifying data in a subquery, couldn't recall where it was documented that you can't do this, and couldn't find the answer from the TOC or the index. Now that there's lots of statements with RETURNING clauses it's natural to want to use them in subqueries.) Hello, I am (finally) reviewing this patch. After reading your reasoning, David's rebuttal, and the patch itself; I'm wondering if this is needed or wanted at all. Supposing it is wanted, it creates more questions than it answers. The two biggies are: * In what other contexts can tabular subqueries be used? * What are other ways of integrating data returned by data modification statements? On a superficial level I find the number of commas a bit clunky, and parentheses is misspelled. The last 2 sentences of the first paragraph are something in the way of helpful hints and may not be appropriate, or even accurate. I've left them in for review. I think the last sentence (of the first paragraph) is a bit much, but the penultimate seems fine. I'm attaching an updated patch that I think is an improvement but it's still at a draft level and needs more copyediting. This new patch does not attempt to answer the two questions above. -- Vik diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index c32c857..b134b66 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -549,7 +549,7 @@ SELECT * FROM my_table AS m WHERE my_table.a gt; 5;-- wrong SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id; /programlisting Additionally, an alias is required if the table reference is a - subquery (see xref linkend=queries-subqueries). + subquery (see xref linkend=queries-subquery-derived-tables). /para para @@ -590,10 +590,10 @@ SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c /para /sect3 - sect3 id=queries-subqueries -titleSubqueries/title + sect3 id=queries-subquery-derived-tables +titleSubquery Derived Tables/title -indexterm zone=queries-subqueries +indexterm zone=queries-subquery-derived-tables primarysubquery/primary /indexterm @@ -1315,6 +1315,44 @@ SELECT DISTINCT ON (replaceableexpression/replaceable optional, replaceab /sect1 + sect1 id=queries-subqueries + titleSubqueries/title + + indexterm zone=queries-subqueries + primarysubquery/primary + /indexterm + + indexterm zone=queries-subqueries + primarysub-select/primary + /indexterm + + para + Subqueries, also called sub-selects, are queries written within + parentheses in the text of larger queries. The values produced by + subqueries may be scalar, or tabular. Scalar subqueries are used within expressions as described + in xref linkend=sql-syntax-scalar-subqueries. + Tabular subqueries may substitute for tables as described + in xref linkend=queries-subquery-derived-tables, generate array + content as described + in xref linkend=sql-syntax-array-constructors, have their + result content tested within expressions as described + in xref linkend=functions-subquery, or be used in other + contexts. Often either joins or subqueries can be used to produce + different query plans yielding identical output. + /para + + para + Subqueries may not modify database + content. 
link linkend=queries-withCommon Table + Expressions/link are one way to integrate data returned by data + modification statements, + i.e. commandINSERT/command/commandUPDATE/command/commandDELETE/command + statements with literalRETURNING/literal clauses, into larger + queries. + /para + /sect1 + + sect1 id=queries-union titleCombining Queries/title -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
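The pattern the new paragraph points readers to is the one below (a minimal, self-contained illustration; table names are invented):

    -- A RETURNING clause cannot be used in an ordinary subquery, but a
    -- data-modifying statement in a WITH clause can feed the larger query:
    WITH moved AS (
        DELETE FROM pending_orders
        WHERE created < now() - interval '30 days'
        RETURNING *
    )
    INSERT INTO archived_orders
    SELECT * FROM moved;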
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 10:15:14 -0400, Robert Haas wrote: On Tue, Oct 15, 2013 at 9:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-15 15:17:58 +0200, Andres Freund wrote: If we go for CSV I think we should put the entire primary key as one column (containing all the columns) and the entire row another. What about columns like: * action B|I|U|D|C BEGIN and COMMIT? That's B and C, yes. You'd rather not have them? When would you replay the commit without an explicit message telling you to? Repeating the column names for every row strikes me as a nonstarter. [...] Sure, some people may want JSON or XML output that reiterates the labels every time, but for a lot of people that's going to greatly increase the size of the output and be undesirable for that reason. But I argue that most simpler users - which are exactly the ones a generic output plugin is aimed at - will want all column names since it makes replay far easier. If the plugin interface isn't rich enough to provide a convenient way to avoid that, then it needs to be fixed so that it is, because it will be a common requirement. Oh, it surely is possibly to avoid repeating it. The output plugin interface simply gives you a relcache entry, that contains everything necessary. The output plugin would need to keep track of whether it has output data for a specific relation and it would need to check whether the table definition has changed, but I don't see how we could avoid that? What still need to be determined is: * how do we separate and escape multiple values in one CSV column * how do we represent NULLs I consider the escaping a key design decision. Ideally, it should be something that's easy to reverse from a scripting language; ideally also, it should be something similar to how we handle COPY. These goals may be in conflict; we'll have to pick something. Note that parsing COPYs is a major PITA from most languages... Perhaps we should make the default output json instead? With every action terminated by a nullbyte? That's probably easier to parse from various scripting languages than anything else. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
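As a rough illustration of that direction, a single change record in JSON might look like this (field names and values are invented; the real format was still undecided), with each record terminated by a NUL byte as suggested:

    {"action":"U","xid":10241,"table":"public.accounts",
     "key":{"id":7},"new":{"id":7,"balance":150.00}}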
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 10:34:53 -0400, Robert Haas wrote: On Tue, Oct 15, 2013 at 10:27 AM, Andres Freund and...@2ndquadrant.com wrote: I think part of the problem may be that you're using the library name to identify the output plugin. I'm not excited about that design. For functions, you give the function a name and that is a pointer to where to actually find the function, which may be a 2-tuple library-name, function-name, or perhaps just a 1-tuple builtin-function-name, or maybe the whole text of a PL/pgsql procedure that should be compiled. That means you allow trivial remote code execution since you could try to load system() or something else that's available in every shared object. Now you can argue that that's OK since we have special checks for replication connections, but I'd rather not go there. Well, obviously you can't let somebody load any library they want. But that's pretty much true anyway; LOAD had better be confined to superusers unless there is something (like a pg_proc entry) that provides prior authorization for that specific load. Currently you can create users that have permissions for replication but which are not superusers. I think we should strive to providing that capability for changeset extraction as well. Perhaps this ought to work similarly. Create a function in pg_proc which returns the structure containing the function pointers. Then, when that output plugin is selected, it'll automatically trigger loading the correct shared library if that's needed; and the shared library name may (but need not) match the output plugin name. I'd like to avoid relying on inserting stuff into pg_proc because that makes it harder to extract WAL from a HS standby. Requiring to configure that on the primary to extract data on the standby seems confusing to me. But perhaps that's the correct solution :/ That's a reasonable concern. I don't have another idea at the moment, unless we want to allow replication connections to issue LOAD commands. Then you can LOAD the library, so that the plug-in is registered under the well-known name you expect it to have, and then use that name to start replication. But what's the advantage of that over the current situation or one where PG_load_output_plugin() is called? The current and related implementations allow you to only load libraries in some designated postgres directories and it doesn't allow you to call any arbitrary functions in there. Would you be content with a symbol PG_load_output_plugin being called that fills out the actual callbacks? Now, some users are still going to head for the hills. But at least from where I sit it sounds a hell of a lot better than the first answer. We're not going to solve all of the tooling problems around this technology in one release, for sure. But as far as 95% of our users are concerned, a C API might as well not exist at all. People WILL try to machine parse the output of whatever demo plugins we provide; so I think we should try hard to provide at least one such plugin that is designed to make that as easy as possible. Well, just providing the C API + an example in a first step didn't work out too badly for FDWs. I am pretty sure that once released there will soon be extensions for it on PGXN or whatever for special usecases. I suspect so, too. But I also think that if that's the only thing available in the first release, a lot of users will get a poor initial impression. I think lots of people will expect a builtin logical replication solution :/. Which seems a tad unlikely to arrive in 9.4. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 5:15 AM, Robert Haas robertmh...@gmail.com wrote: Well, the SQL standard way of doing this type of operation is MERGE. The alternative we know exists in other databases is REPLACE; there's also INSERT .. ON DUPLICATE KEY update. In all of those cases, whatever weirdness exists around MVCC is confined to that one command. I tend to think we should do similarly, with the goal that HeapTupleSatisfiesMVCC need not change at all. I don't think that it's very pragmatic to define success in terms of not modifying a single visibility function. I feel it would be more useful to define it as providing acceptable, non-surprising semantics, while not regressing performance in other areas. The fact remains that you're going to have a create a new snapshot type even for this special case, so I don't see any win as regards managing invasiveness here. Quite the contrary, in fact. I don't have the only vote here, of course, but my feeling is that that's more likely to be a good route. Naturally we all want MERGE. It seems self-defeating to insist on something significantly harder that there is significant less demand for, though. I thought that there was at least informal agreement that this sort of approach was preferable to MERGE in its full generality, based on feedback at the 2012 developer meeting. I really don't think that what I've done here is any worse than INSERT...ON DUPLICATE KEY UPDATE in any of the areas you express concern about here. REPLACE has some serious problems, and I just don't see it as a viable alternative at all - just ask any MySQL user. MERGE is of course more flexible to what I have here in some ways, but actually less flexible in other ways. I think that the real point of MERGE is that it's defined in a way that serves data warehousing use cases very well: the semantics constrain things such that the executor only has to execute a single ModifyTable node that does inserts, updates and deletes in a single scan. That's great, but what if it's useful to do that CRUD (yes, this can include selects) to entirely different tables? Or what if the relevant DML will only come in a later statement in the same transaction? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
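For contrast, the SQL-standard statement Robert mentions has roughly this shape (not implemented in PostgreSQL at the time of this exchange; table and column names are invented):

    MERGE INTO counters AS c
    USING (VALUES ('hits', 1)) AS v(key, delta)
    ON c.key = v.key
    WHEN MATCHED THEN
        UPDATE SET value = c.value + v.delta
    WHEN NOT MATCHED THEN
        INSERT (key, value) VALUES (v.key, v.delta);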
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 10:56 AM, Andres Freund and...@2ndquadrant.com wrote: That means you allow trivial remote code execution since you could try to load system() or something else that's available in every shared object. Now you can argue that that's OK since we have special checks for replication connections, but I'd rather not go there. Well, obviously you can't let somebody load any library they want. But that's pretty much true anyway; LOAD had better be confined to superusers unless there is something (like a pg_proc entry) that provides prior authorization for that specific load. Currently you can create users that have permissions for replication but which are not superusers. I think we should strive to providing that capability for changeset extraction as well. I agree. Perhaps this ought to work similarly. Create a function in pg_proc which returns the structure containing the function pointers. Then, when that output plugin is selected, it'll automatically trigger loading the correct shared library if that's needed; and the shared library name may (but need not) match the output plugin name. I'd like to avoid relying on inserting stuff into pg_proc because that makes it harder to extract WAL from a HS standby. Requiring to configure that on the primary to extract data on the standby seems confusing to me. But perhaps that's the correct solution :/ That's a reasonable concern. I don't have another idea at the moment, unless we want to allow replication connections to issue LOAD commands. Then you can LOAD the library, so that the plug-in is registered under the well-known name you expect it to have, and then use that name to start replication. But what's the advantage of that over the current situation or one where PG_load_output_plugin() is called? The current and related implementations allow you to only load libraries in some designated postgres directories and it doesn't allow you to call any arbitrary functions in there. Well, I've already said why I don't like conflating the library name and the plugin name. It rules out core plugins and libraries that provide multiple plugins. I don't have anything to add to that. Would you be content with a symbol PG_load_output_plugin being called that fills out the actual callbacks? Well, it doesn't fix the muddling of library names with output plugin names, but I suppose I'd find it a modest improvement. Well, just providing the C API + an example in a first step didn't work out too badly for FDWs. I am pretty sure that once released there will soon be extensions for it on PGXN or whatever for special usecases. I suspect so, too. But I also think that if that's the only thing available in the first release, a lot of users will get a poor initial impression. I think lots of people will expect a builtin logical replication solution :/. Which seems a tad unlikely to arrive in 9.4. Yep. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 11:02:39AM -0400, Robert Haas wrote: goals may be in conflict; we'll have to pick something. Note that parsing COPYs is a major PITA from most languages... Perhaps we should make the default output json instead? With every action terminated by a nullbyte? That's probably easier to parse from various scripting languages than anything else. I could go for that. It's not quite as compact as I might hope, but JSON does seem to make people awfully happy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Feeding such a JSON stream into a compression algorithm like lz4 or snappy should result in a pretty compact stream. The latest lz4 updates also have ability to use a pre-existing dictionary which would really help remove the redundant pieces. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-10-15 11:11:24 -0400, Robert Haas wrote: I'm not saying go implement MERGE. I'm saying, make the insert-or-update operation a single statement, using some syntax TBD, instead of requiring the use of a new insert statement that makes invisible rows visible as a side effect, so that you can wrap that in a CTE and feed it to an update statement. That's complex and, AFAICS, unlike how any other database product handles this. I think we most definitely should provide a single statement variant. That's the one users yearn for. I also would like a variant where I can lock a row on conflict, for multimaster scenarios, but that doesn't necessarily have to be exposed to SQL. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CF 2013-09 Wrap Up
On Tue, Oct 15, 2013 at 1:39 AM, Noah Misch n...@leadboat.com wrote: On Mon, Oct 14, 2013 at 01:56:42PM -0500, Mike Blackwell wrote: Any patches marked Needs Review will be automatically moved to the next CF. We will try to make sure that all patches in the current CF have received at least one review. The combined effect of those two statements is not clear to me. Does that mean you'll retain never-reviewed patches and automatically move patches that have received at least one review? Yes on the latter part. We will try to get a quick review for not-yet-reviewed patches and move or return them based on the result of that review. If we fail to find a reviewer, the patches will get moved to the next CF. For those following along, here are the patches still needing a first look. They are for the most part performance or internals patches and could use the eye of someone more experienced. Please consider a quick review of one of them if you fit that description. We'd like everyone to get a fair shake here. ^_^
* HStore Gin Speedup: https://commitfest.postgresql.org/action/patch_view?id=1203
* Performance Improvement by reducing WAL for Update Operation: https://commitfest.postgresql.org/action/patch_view?id=1209
* [PoC] pgstattuple2: block sampling to reduce physical read: https://commitfest.postgresql.org/action/patch_view?id=1226
* ECPG cursor readahead: https://commitfest.postgresql.org/action/patch_view?id=1195
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-15 11:02:39 -0400, Robert Haas wrote: If the plugin interface isn't rich enough to provide a convenient way to avoid that, then it needs to be fixed so that it is, because it will be a common requirement. Oh, it surely is possibly to avoid repeating it. The output plugin interface simply gives you a relcache entry, that contains everything necessary. The output plugin would need to keep track of whether it has output data for a specific relation and it would need to check whether the table definition has changed, but I don't see how we could avoid that? Well, it might be nice if there were a callback for, hey, schema has changed! Seems like a lot of plugins will want to know that for one reason or another, and rechecking for every tuple sounds expensive. I don't really see how we could provide that in any useful manner. We could provide a callback that is called whenever another transaction has changed the schema, but there's nothing easily to be done about schema changes by the replayed transaction itself. And those are the only ones where meaningful schema changes can happen since the locks the source transaction has held will prevent most other schema changes. As much as I hate such code, I guess checking (and possibly storing) the ctid||xmin of the pg_class row is the easiest thing we could do :(. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 8:07 AM, Peter Geoghegan p...@heroku.com wrote: Naturally we all want MERGE. It seems self-defeating to insist on something significantly harder that there is significant less demand for, though. I hasten to add: which is not to imply that you're insisting rather than expressing a sentiment. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 10:48 AM, Andres Freund and...@2ndquadrant.com wrote: What about columns like: * action B|I|U|D|C BEGIN and COMMIT? That's B and C, yes. You'd rather not have them? When would you replay the commit without an explicit message telling you to? No, BEGIN and COMMIT sounds good, actually. Just wanted to make sure I understood. Repeating the column names for every row strikes me as a nonstarter. [...] Sure, some people may want JSON or XML output that reiterates the labels every time, but for a lot of people that's going to greatly increase the size of the output and be undesirable for that reason. But I argue that most simpler users - which are exactly the ones a generic output plugin is aimed at - will want all column names since it makes replay far easier. Meh, maybe. If the plugin interface isn't rich enough to provide a convenient way to avoid that, then it needs to be fixed so that it is, because it will be a common requirement. Oh, it surely is possibly to avoid repeating it. The output plugin interface simply gives you a relcache entry, that contains everything necessary. The output plugin would need to keep track of whether it has output data for a specific relation and it would need to check whether the table definition has changed, but I don't see how we could avoid that? Well, it might be nice if there were a callback for, hey, schema has changed! Seems like a lot of plugins will want to know that for one reason or another, and rechecking for every tuple sounds expensive. What still need to be determined is: * how do we separate and escape multiple values in one CSV column * how do we represent NULLs I consider the escaping a key design decision. Ideally, it should be something that's easy to reverse from a scripting language; ideally also, it should be something similar to how we handle COPY. These goals may be in conflict; we'll have to pick something. Note that parsing COPYs is a major PITA from most languages... Perhaps we should make the default output json instead? With every action terminated by a nullbyte? That's probably easier to parse from various scripting languages than anything else. I could go for that. It's not quite as compact as I might hope, but JSON does seem to make people awfully happy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 11:07 AM, Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 15, 2013 at 5:15 AM, Robert Haas robertmh...@gmail.com wrote: Well, the SQL standard way of doing this type of operation is MERGE. The alternative we know exists in other databases is REPLACE; there's also INSERT .. ON DUPLICATE KEY update. In all of those cases, whatever weirdness exists around MVCC is confined to that one command. I tend to think we should do similarly, with the goal that HeapTupleSatisfiesMVCC need not change at all. I don't think that it's very pragmatic to define success in terms of not modifying a single visibility function. I feel it would be more useful to define it as providing acceptable, non-surprising semantics, while not regressing performance in other areas. The fact remains that you're going to have a create a new snapshot type even for this special case, so I don't see any win as regards managing invasiveness here. Quite the contrary, in fact. Well, we might have to agree to disagree. I don't have the only vote here, of course, but my feeling is that that's more likely to be a good route. Naturally we all want MERGE. It seems self-defeating to insist on something significantly harder that there is significant less demand for, though. I thought that there was at least informal agreement that this sort of approach was preferable to MERGE in its full generality, based on feedback at the 2012 developer meeting. I really don't think that what I've done here is any worse than INSERT...ON DUPLICATE KEY UPDATE in any of the areas you express concern about here. REPLACE has some serious problems, and I just don't see it as a viable alternative at all - just ask any MySQL user. MERGE is of course more flexible to what I have here in some ways, but actually less flexible in other ways. I think that the real point of MERGE is that it's defined in a way that serves data warehousing use cases very well: the semantics constrain things such that the executor only has to execute a single ModifyTable node that does inserts, updates and deletes in a single scan. That's great, but what if it's useful to do that CRUD (yes, this can include selects) to entirely different tables? Or what if the relevant DML will only come in a later statement in the same transaction? I'm not saying go implement MERGE. I'm saying, make the insert-or-update operation a single statement, using some syntax TBD, instead of requiring the use of a new insert statement that makes invisible rows visible as a side effect, so that you can wrap that in a CTE and feed it to an update statement. That's complex and, AFAICS, unlike how any other database product handles this. Again, other people can have different opinions on this, and that's fine. I'm just giving you mine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
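To make the two shapes being debated concrete, here is a rough sketch of each; the syntax is purely illustrative, reconstructed from the descriptions in this thread rather than taken from the patch or any committed grammar:

    -- Proposed-patch style: an INSERT that locks conflicting rows and
    -- exposes them through RETURNING, wrapped in a writable CTE that
    -- feeds an UPDATE.
    WITH conflicted AS (
        INSERT INTO counters (key, value)
        VALUES ('hits', 1)
        ON DUPLICATE KEY LOCK FOR UPDATE
        RETURNING key
    )
    UPDATE counters c
    SET value = c.value + 1
    FROM conflicted
    WHERE c.key = conflicted.key;

    -- Single-statement style Robert argues for, in the spirit of other
    -- systems' INSERT ... ON DUPLICATE KEY UPDATE (exact grammar TBD).
    INSERT INTO counters (key, value)
    VALUES ('hits', 1)
    ON DUPLICATE KEY UPDATE value = counters.value + 1;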
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/15/2013 05:52 AM, Magnus Hagander wrote: But the argument about being friendly for new users should definitely have us change wal_level and max_wal_senders. +1 for having replication supported out-of-the-box aside from pg_hba.conf. To put it another way: users are more likely to care about replication than they are about IO overhead on a non-replicated server. And for the users who care about IO overhead, they are more likely to muck about in pg.conf *anyway* in order to set a slew of performance-tuning settings. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 2013-10-15 10:19:06 -0700, Josh Berkus wrote: On 10/15/2013 05:52 AM, Magnus Hagander wrote: But the argument about being friendly for new users should definitely have us change wal_level and max_wal_senders. +1 for having replication supported out-of-the-box aside from pg_hba.conf. To put it another way: users are more likely to care about replication than they are about IO overhead on a non-replicated server. And for the users who care about IO overhead, they are more likely to much about in pg.conf *anyway* in order to set a slew of performance-tuning settings. But it will hurt people restoring backups using pg_restore -j. I think people might be rather dissapointed if that slows down by a factor of three. I think we really need to get to the point where we increase the wal level ondemand... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-15 10:19:06 -0700, Josh Berkus wrote: On 10/15/2013 05:52 AM, Magnus Hagander wrote: But the argument about being friendly for new users should definitely have us change wal_level and max_wal_senders. +1 for having replication supported out-of-the-box aside from pg_hba.conf. To put it another way: users are more likely to care about replication than they are about IO overhead on a non-replicated server. And for the users who care about IO overhead, they are more likely to much about in pg.conf *anyway* in order to set a slew of performance-tuning settings. But it will hurt people restoring backups using pg_restore -j. I think people might be rather dissapointed if that slows down by a factor of three. I think we really need to get to the point where we increase the wal level ondemand... Yeha, there are really two things. If we can increase wal_level on demand, that would solve one of them. Turning that into a SIGHUP parameter would be great. I have no idea how hard it would be. In theory, couldn't we let it be sighup and then just have do_pg_start_backup() block until all backends have acknowledged that they are on the new WAL level somehow? (Yes, I realize this might be a big simplification, but I'm allowed to hope, no?) The other problem is max_wal_senders. I think that's a much smaller problem - setting that one to 5 or so by default shouldn't have a big impact. But without the wal_level changes, it would also be mostly pointless... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-10-15 10:19:17 -0700, Peter Geoghegan wrote: On Tue, Oct 15, 2013 at 9:56 AM, Robert Haas robertmh...@gmail.com wrote: Well, I don't know that any of us can claim to have a lock on what the syntax should look like. Sure. But it's not just syntax. We're talking about functional differences too, since you're talking about mandating an update, which is a not the same as an update locked row only conditionally, or a delete. I think anything that only works by breaking visibility rules that way is a nonstarter. Doing that from the C level is one thing, exposing it this way seems a bad idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 2013-10-15 19:29:50 +0200, Magnus Hagander wrote: On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-15 10:19:06 -0700, Josh Berkus wrote: On 10/15/2013 05:52 AM, Magnus Hagander wrote: But the argument about being friendly for new users should definitely have us change wal_level and max_wal_senders. +1 for having replication supported out-of-the-box aside from pg_hba.conf. To put it another way: users are more likely to care about replication than they are about IO overhead on a non-replicated server. And for the users who care about IO overhead, they are more likely to much about in pg.conf *anyway* in order to set a slew of performance-tuning settings. But it will hurt people restoring backups using pg_restore -j. I think people might be rather dissapointed if that slows down by a factor of three. I think we really need to get to the point where we increase the wal level ondemand... Yeha, there are really two things. If we can increase wal_level on demand, that would solve one of them. Turning that into a SIGHUP parameter would be great. I have no idea how hard it would be. In theory, couldn't we let it be sighup and then just have do_pg_start_backup() block until all backends have acknowledged that they are on the new WAL level somehow? (Yes, I realize this might be a big simplification, but I'm allowed to hope, no?) Depends on what you want to support. For basebackups, that should be doable with some pullups. It's unfortunately more complex than that for streaming rep - we really need persistent standby registration there. Otherwise the wal_level will fall back to minimal when the standby disconnects which will obviously break the standby. The other problem is max_wal_senders. I think that's a much smaller problem - setting that one to 5 or so by default shouldn't have a big impact. But without the wal_level changes, it would also be mostly pointless... Well, you currently cannot even set it when the wal_level isn't set appropriately, but that that should be easy enough to change. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for reserved connections for replication users
On 10/15/2013 07:36 AM, Robert Haas wrote: On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote: Josh said we should treat replication connections in a separate pool from normal database connections, right? So you withdraw your earlier objection to that? I don't think that's what he said. Here's what I was referring to: To clarify: I do, indeed, support the idea of treating replication connections as a pool outside of max_connections. Here's why: FATAL: connection limit exceeded for non-superusers SHOW max_connections; 100 SELECT COUNT(*) FROM pg_stat_activity; 94 SHOW superuser_reserved_connections; 3 ... search around quite a bit, eventually figure out that you have three replication connections open. We've already set up an illogical and hard-to-troubleshoot situation where replication connections do not appear in pg_stat_activity, yet they are counted against max_connections. You could argue that the same is true of superuser_reserved_connections, but there's a couple reasons why it isn't: 1) if superusers are actually connected, that shows up in pg_stat_activity (and given how many of our users run their apps as superuser, they get to max_connections out anyway). 2) the error message spells out that there may be superuser connections available. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Tue, Oct 15, 2013 at 7:32 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-15 19:29:50 +0200, Magnus Hagander wrote: On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-15 10:19:06 -0700, Josh Berkus wrote: On 10/15/2013 05:52 AM, Magnus Hagander wrote: But the argument about being friendly for new users should definitely have us change wal_level and max_wal_senders. +1 for having replication supported out-of-the-box aside from pg_hba.conf. To put it another way: users are more likely to care about replication than they are about IO overhead on a non-replicated server. And for the users who care about IO overhead, they are more likely to much about in pg.conf *anyway* in order to set a slew of performance-tuning settings. But it will hurt people restoring backups using pg_restore -j. I think people might be rather dissapointed if that slows down by a factor of three. I think we really need to get to the point where we increase the wal level ondemand... Yeha, there are really two things. If we can increase wal_level on demand, that would solve one of them. Turning that into a SIGHUP parameter would be great. I have no idea how hard it would be. In theory, couldn't we let it be sighup and then just have do_pg_start_backup() block until all backends have acknowledged that they are on the new WAL level somehow? (Yes, I realize this might be a big simplification, but I'm allowed to hope, no?) Depends on what you want to support. For basebackups, that should be doable with some pullups. It's unfortunately more complex than that for streaming rep - we really need persistent standby registration there. Otherwise the wal_level will fall back to minimal when the standby disconnects which will obviously break the standby. I was actually thinking the easier step might not be to do it dynamically as the standby registers - just allow it to be a SIGHUP parameter. So you'd still change it in postgresql.conf, but it would be ok with a reload rather than a restart. Yes, fully dynamic would be better, so if we could combined those two, that would make us require nothing for pg_basebackup, and just a reload for replication slaves. The point being that we wouldn't need a *restart* at any point - and that alond would be a big improvement. The other problem is max_wal_senders. I think that's a much smaller problem - setting that one to 5 or so by default shouldn't have a big impact. But without the wal_level changes, it would also be mostly pointless... Well, you currently cannot even set it when the wal_level isn't set appropriately, but that that should be easy enough to change. Yes, it would be a trivial change to allow that parametre to be set and then just give an error if you try to initiate streaming in that case. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund and...@2ndquadrant.com wrote: I think anything that only works by breaking visibility rules that way is a nonstarter. Doing that from the C level is one thing, exposing it this way seems a bad idea. What visibility rule is that? Upsert *has* to do effectively the same thing as what I've proposed - there is no getting away from it. So maybe the visibility rulebook (which as far as I can tell is the way things work today) needs to be updated. If we did, say, INSERT...ON DUPLICATE KEY UPDATE, we'd have to update a row with potentially no visible-to-snapshot version *at all*, and make a new version of that visible. That's just what it takes. What's the difference between that and just locking? If the only difference is that it isn't necessary to modify tqual.c because you're passing a tid directly, that isn't a user-visible difference - the rule has been broken just the same. Arguably, it's even more of a hack, since it's a special, out-of-band visibility exception. I'm happy to have total scrutiny of changes to tqual.c, but I'm surprised that the mere fact of it having been modified is being weighed so heavily. Another thing that I'm not clear on is how an update can be backed out of if the row is modified by another xact. As I think I've already illustrated, the row locking that takes place has to be kind of opportunistic. I'm sure you could do it, but it would probably be quite invasive. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 10/15/2013 08:11 AM, Robert Haas wrote: I'm not saying go implement MERGE. I'm saying, make the insert-or-update operation a single statement, using some syntax TBD, instead of requiring the use of a new insert statement that makes invisible rows visible as a side effect, so that you can wrap that in a CTE and feed it to an update statement. That's complex and, AFAICS, unlike how any other database product handles this. Hmmm. Is the plan NOT to eventually get to a single-statement upsert? If not, then I'm not that keen on this feature. I can't say that anybody I know who's migrating from MySQL would use a 2-statement version of upsert; if they were prepared for that, then they'd be prepared to just rewrite their stuff as proper insert/updates anyway. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 10:58 AM, Josh Berkus j...@agliodbs.com wrote: Hmmm. Is the plan NOT to eventually get to a single-statement upsert? If not, then I'm not that keen on this feature. See the original e-mail in the thread for what I imagine idiomatic usage will look like. http://www.postgresql.org/message-id/cam3swzthwrktvurf1awaih8qthgnmzafydcnw8qju7pqhk5...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 11:05 AM, Peter Geoghegan p...@heroku.com wrote: See the original e-mail in the thread for what I imagine idiomatic usage will look like. http://www.postgresql.org/message-id/cam3swzthwrktvurf1awaih8qthgnmzafydcnw8qju7pqhk5...@mail.gmail.com Note also that this doesn't preclude a variant with a more direct update part (not that I think that's all that compelling). Doing things this way was motivated by: 1) Serving the needs of logical changeset generation plugins, even if Andres doesn't think that needs to be exposed through SQL. He and I both want something that does this with low overhead (in particular, no subtransactions). 2) Getting something effective into the next release. MERGE-like flexibility seems like a very desirable thing. And the implementation's infrastructure can be used by an eventual MERGE implementation. 3) Being simple enough that huge bike shedding over syntax might not be necessary. Making insert statements grow an update tumor is likely to get messy fast. I know because I tried it myself. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Peter, Note also that this doesn't preclude a variant with a more direct update part (not that I think that's all that compelling). Doing things this way was motivated by: I can see the value in the CTE format for this for existing PostgreSQL users. (although, AFAICT it doesn't allow for the implementation of one of my personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases where updates are expected to occur 95% of the time, but that's another topic. Unless rejects for an Update could be the leftover rows, but then we're getting into full MERGE.). I'm just pointing out that this doesn't do much for the MySQL migration case; the rewrite is too complex to automate. I'd been assuming that we had some plans to implement a MySQL-friendly syntax for 9.5, and this version was a stepping stone to that. Does this version make a distinction between PRIMARY KEY constraints and UNIQUE indexes? If not, how does it pick among keys? If so, what about tables with no PRIMARY KEY for various reasons (like unique GiST indexes?) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 11:23 AM, Josh Berkus j...@agliodbs.com wrote: (although, AFAICT it doesn't allow for the implementation of one of my personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases where updates are expected to occur 95% of the time, but that's another topic. Unless rejects for an Update could be the leftover rows, but then we're getting into full MERGE.). This isn't really all that inefficient for that case. Certainly, the balance in cost between mostly-insert cases and mostly-update cases is a strength of my basic approach over others. Does this version make a distinction between PRIMARY KEY constraints and UNIQUE indexes? If not, how does it pick among keys? If so, what about tables with no PRIMARY KEY for various reasons (like unique GiST indexes?) We thought about prioritizing where to look (mostly as a performance optimization), but right now no. It works with amcanunique methods, which in practice means btrees. There is no such thing as a GiST unique index, so I guess you're referring to an exclusion constraint on an equality operator. That doesn't work with this, but why would you want it to? As for generalizing this to work with exclusion constraints, which I guess you might have also meant, that's a much more difficult and much less compelling proposition, in my opinion. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 10/15/2013 11:38 AM, Peter Geoghegan wrote: We thought about prioritizing where to look (mostly as a performance optimization), but right now no. It works with amcanunique methods, which in practice means btrees. There is no such thing as a GiST unique index, so I guess you're referring to an exclusion constraint on an equality operator. That doesn't work with this, but why would you want it to? As for generalizing this to work with exclusion constraints, which I guess you might have also meant, that's a much more difficult and much less compelling proposition, in my opinion. Yeah, that was one thing I was thinking of. Also, because you can't INDEX CONCURRENTLY a PK, I've been building a lot of databases which have no PKs, only UNIQUE indexes. Historically, this hasn't been an issue because aside from wonky annoyances (like the CONCURRENTLY case), Postgres doesn't distinguish between UNIQUE indexes and PRIMARY KEYs -- as, indeed, it shouldn't, since they're both keys, adn the whole concept of a primary key is a legacy of index-organized databases, which PostgreSQL is not. However, it does seem like the new syntax could be extended with and optional USING unqiue_index_name in the future (9.5), no? I'm just checking that we're not painting ourselves into a corner with this particular implementation. It's OK if it doesn't implement most things now; it's bad if it is impossible to build on and we have to support it forever. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 11:55 AM, Josh Berkus j...@agliodbs.com wrote: However, it does seem like the new syntax could be extended with and optional USING unqiue_index_name in the future (9.5), no? There is no reason why we couldn't do that and just consider that one unique index. Whether we should is another question - I certainly think that mandating it would be very bad. I'm just checking that we're not painting ourselves into a corner with this particular implementation. It's OK if it doesn't implement most things now; it's bad if it is impossible to build on and we have to support it forever. I don't believe it does. In essence this just simply inserts a row, and rather than throwing a unique constraint violation, locks the row that prevented insertion from proceeding in respect of any tuple proposed for insertion where it does not. That's all. You can build lots of things with it that you can't today. Or you can not use it at all. So that covers semantics, I'd say. As for implementation: I believe that the implementation is by far the most forward thinking (in terms of building infrastructure for a proper MERGE) of any proposal to date. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 10/15/2013 12:03 PM, Peter Geoghegan wrote: On Tue, Oct 15, 2013 at 11:55 AM, Josh Berkus j...@agliodbs.com wrote: However, it does seem like the new syntax could be extended with and optional USING unqiue_index_name in the future (9.5), no? There is no reason why we couldn't do that and just consider that one unique index. Whether we should is another question - What's the shouldn't argument, if any? I certainly think that mandating it would be very bad. Agreed. If there is a PK, we should allow the user to use it implicitly. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] background workers, round three
2013/10/14 Robert Haas robertmh...@gmail.com: * ephemeral-precious-v1.patch AtEOXact_BackgroundWorker() is located around other AtEOXact_* routines. Doesn't it makes resource management complicated? In case when main process goes into error handler but worker process is still running in health, it may continue to calculate something and put its results on shared memory segment, even though main process suggest postmaster to kill it. Since I wrote this patch set, I've been thinking a lot more about error recovery. Obviously, one of the big problems as we think about parallel query is that you've now got multiple backends floating around, and if the transaction aborts (in any backend), the other backends don't automatically know that; they need some way to know that they, too, short abort processing. There are a lot of details to get right here, and the time I've spent on it so far convinces me that the problem is anything but easy. Having said that, I'm not too concerned about the particular issue that you raise here. The resources that need to be cleaned up during transaction abort are backend-private resources. If, for example, the user backend detaches a dynamic shared memory segment that is being used for a parallel computation, they're not actually *destroying* the segment; they are just detaching it *from their address space*. The last process to detach it will also destroy it. So the ordering in which the various processes detach it doesn't matter much. One of the things I do this is necessary is a set of on_dsm_detach callbacks that work pretty much the way that on_shmem_exit callbacks work today. Just as we can't detach from the main shared memory segment without releasing locks and buffer pins and lwlocks and our PGXACT, we can't release from a dynamic shared memory segment without performing any similar cleanup that is needed. I'm currently working on a patch for that. Hmm. It probably allows to clean-up smaller fraction of data structure constructed on dynamic shared memory segment, if we map / unmap for each transactions. All the ResourceOwnerRelease() callbacks are located prior to AtEOXact_BackgroundWorker(), it is hard to release resources being in use by background worker, because they are healthy running until it receives termination signal, but sent later. In addition, it makes implementation complicated if we need to design background workers to release resources if and when it is terminated. I don't think it is a good coding style, if we need to release resources in different location depending on context. Which specific resources are you concerned about? I assumed smaller chunks allocated on static or dynamic shared memory segment to be used for communicate between main process and worker processes because of my motivation. When we move a chunk of data to co-processor using asynchronous DMA transfer, API requires the source buffer is mlock()'ed to avoid unintentional swap out during DMA transfer. On the other hand, cost of mlock() operation is not ignorable, so it may be a reasonable design to lock a shared memory segment on start-up time then continue to use it, without unmapping. So, I wondered how to handle the situation when extension tries to manage a resource with smaller granularity than the one managed by PostgreSQL core. So, I'd like to propose to add a new invocation point of ResourceOwnerRelease() after all AtEOXact_* jobs, with new label something like RESOURCE_RELEASE_FINAL. 
In addition, AtEOXact_BackgroundWorker() does not synchronize termination of background worker processes being killed. Of course it depends on situation, I think it is good idea to wait for completion of worker processes to be terminated, to ensure resource to be released is backed to the main process if above ResourceOwnerRelease() do the job. Again, which resources are we talking about here? I tend to think it's an essential property of the system that we *shouldn't* have to care about the order in which processes are terminated. First, that will be difficult to control; if an ERROR or FATAL condition has occurred and we need to terminate, then there are real limits to what guarantees we can provide after that point. Second, it's also *expensive*. The point of parallelism is to make things faster; any steps we add that involve waiting for other processes to do things will eat away at the available gains. For a query that'll run for an hour that hardly matters, but for short queries it's important to avoid unnecessary overhead. Indeed, you are right. Error path has to be terminated soon. Probably, ResourceOwnerRelease() callback needs to inform healthy performing worker process the transaction got aborted thus no need to return its calculation result, using some way, if I implement it. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes
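For readers following the on_dsm_detach discussion above, here is a rough sketch of what such a hook could look like if it is modelled directly on on_shmem_exit(), as described. The callback typedef, the on_dsm_detach() signature, and the WorkerSharedState type are all hypothetical, since that patch is still being written; only dsm_segment and dsm_segment_address() come from the already-committed dynamic shared memory work.

/* hypothetical API, mirroring on_shmem_exit(); the eventual patch may differ */
typedef void (*on_dsm_detach_callback) (dsm_segment *seg, Datum arg);

extern void on_dsm_detach(dsm_segment *seg,
                          on_dsm_detach_callback function, Datum arg);

/* usage sketch in a worker: undo per-segment state before the mapping goes away */
static void
release_segment_resources(dsm_segment *seg, Datum arg)
{
    WorkerSharedState *state = (WorkerSharedState *) DatumGetPointer(arg);

    /*
     * Release any pins, locks, or pointers that reference memory inside the
     * segment; once this returns, the mapping may disappear.
     */
    state->attached_workers--;
}

    ...
    state = (WorkerSharedState *) dsm_segment_address(seg);
    on_dsm_detach(seg, release_segment_resources, PointerGetDatum(state));

The reason to attach cleanup to the detach event rather than to an ordering of process shutdown is the one made above: any process may detach (or die) at any time, and the last process to detach also destroys the segment, so per-segment cleanup cannot depend on who exits first.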
Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows
On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, that sucks. So it's a Windows bug. It's not clear to me that we should do anything about this at all, except perhaps document that people should avoid long tablespace path names on an unknown set of Windows versions. We should not be in the business of working around any and every bug coming out of Redmond. It's sort of incomprehensible to me that Microsoft has a bug like this and apparently hasn't fixed it. But I think I still favor trying to work around it. When people try to use a long data directory name and it freezes the system, some of them will blame us rather than Microsoft. We've certainly gone to considerable lengths to work around extremely strange bugs in various compiler toolchains, even relatively obscure ones. I don't particularly see why we shouldn't do the same here. I agree we'll probably want to work around it in the end, but I still think it should be put to Microsoft PSS if we can. The usual - have we actually produced a self-contained example that does just this (and doesn't include the full postgres support) and submitted it to *microsoft* for comments? I have written a self contained win32 console application with which the issue can be reproduced. The application project is attached with this mail. Here is brief description of the project: This project is created using MSVC 2010, but even if somebody doesn't have this version of VC, functions in file long_path.cpp can be copied and used in new project. In project settings, I have changed Character Set to Use Multi-Byte Character Set which is what Postgres uses. It takes 3 parameters as input: existingpath - path for which link will be created. this path should be an already existing path with one level less than actual path. For example, if we want to create a link for path E:/PG_Patch/Long_Path/path_dir/version_dir, then this should be E:/PG_Patch/Long_Path/path_dir. newpath - path where link needs to be created. it should be non-absolute path of format linked_path_dir/test_version curpath - path to set as current working directory path, it should be the location to prepend to newpath Currently I have used input parameters as E:/PG_Patch/Long_Path/path_dir linked_path_dir/test_version E:/PG_Patch/Long_Path/ Long path is much less than 260 char limit on windows, I have observed this problem with path length 130 (approx.) With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com long_path.rar Description: application/rar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
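For anyone who does not want to open the attached project, the repro described above boils down to roughly the following console program. This is a reconstruction from the description, not the attached code: in particular, the link is created here with CreateSymbolicLinkA purely to keep the sketch short, and the exact pattern passed to FindFirstFileA is an assumption.

#include <windows.h>
#include <string.h>

int
main(int argc, char **argv)
{
    const char *existingpath;   /* e.g. E:/PG_Patch/Long_Path/path_dir */
    const char *newpath;        /* e.g. linked_path_dir/test_version */
    const char *curpath;        /* long current-directory prefix */
    char        pattern[1024];
    WIN32_FIND_DATAA fd;
    HANDLE      h;

    if (argc < 4)
        return 1;
    existingpath = argv[1];
    newpath = argv[2];
    curpath = argv[3];

    /* make the long prefix the current directory, then create a relative link */
    SetCurrentDirectoryA(curpath);
    CreateSymbolicLinkA(newpath, existingpath, SYMBOLIC_LINK_FLAG_DIRECTORY);

    /* list the directory through the link, as ReadDir/FindFirstFile does */
    strcpy(pattern, newpath);
    strcat(pattern, "\\*");
    h = FindFirstFileA(pattern, &fd);   /* reportedly never returns once the
                                         * combined path passes ~130 chars */
    if (h != INVALID_HANDLE_VALUE)
        FindClose(h);
    return 0;
}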
Re: [HACKERS] Triggers on foreign tables
2013/10/15 Robert Haas robertmh...@gmail.com: On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: And, I also want some comments from committers, not only from mine. +1 +1 /me pokes head up. I know I'm going to annoy people with this comment, but I feel like it's going to have to be made at some point by somebody, so here goes: I don't see the point of this feature. If you want a trigger on a table, why not set it on the remote side? A trigger on the foreign table won't be enforced consistently; it'll only work when the update is routed through the foreign table, not when people access the underlying table on the remote side through any other mechanism. The number of useful things you can do this way seems fairly small. Perhaps you could use a row-level trigger for RLS, to allow only certain rows on the foreign side to be updated, but even that seems like a slightly strange design: generally it'd be better to enforce the security as close to the target object as possible. One reason we should support local triggers is that not all the data source of FDW support remote trigger. It required FDW drivers to have RDBMS as its backend, but no realistic assumption. For example, file_fdw is unavailable to implement remote triggers. One thing I'd like to know is, where is the goal of FDW feature. It seems to me, FDW goes into a feature to manage external data set as if regular tables. If it is right understanding, things we try to support on foreign table is things we're supporting on regular tables, such as triggers. There's another issue that concerns me here also: performance. IIUC, an update of N tuples on the remote side currently requires N+1 server round-trips. That is unspeakably awful, and we really oughta be looking for ways to make that number go down, by pushing the whole update to the remote side. But obviously that won't be possible if there's a per-row trigger that has to be evaluated on the local side. Now, assuming somebody comes up with code that implements that optimization, we can just disable it when there are local-side triggers. But, then you're back to having terrible performance. So even if the use case for this seemed really broad, I tend to think performance concerns would sink most of the possible real-world uses. We often have some case that we cannot apply fully optimized path because of some reasons, like view has security-barrier, qualifier contained volatile functions, and so on... Trigger may be a factor to prevent fully optimized path, however, it depends on the situation which one shall be prioritized; performance or functionality. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows
On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, that sucks. So it's a Windows bug. It's not clear to me that we should do anything about this at all, except perhaps document that people should avoid long tablespace path names on an unknown set of Windows versions. We should not be in the business of working around any and every bug coming out of Redmond. It's sort of incomprehensible to me that Microsoft has a bug like this and apparently hasn't fixed it. But I think I still favor trying to work around it. When people try to use a long data directory name and it freezes the system, some of them will blame us rather than Microsoft. We've certainly gone to considerable lengths to work around extremely strange bugs in various compiler toolchains, even relatively obscure ones. I don't particularly see why we shouldn't do the same here. I agree we'll probably want to work around it in the end, but I still think it should be put to Microsoft PSS if we can. The usual - have we actually produced a self-contained example that does just this (and doesn't include the full postgres support) and submitted it to *microsoft* for comments? I have written a self contained win32 console application with which the issue can be reproduced. The application project is attached with this mail. Here is brief description of the project: This project is created using MSVC 2010, but even if somebody doesn't have this version of VC, functions in file long_path.cpp can be copied and used in new project. In project settings, I have changed Character Set to Use Multi-Byte Character Set which is what Postgres uses. It takes 3 parameters as input: existingpath - path for which link will be created. this path should be an already existing path with one level less than actual path. For example, if we want to create a link for path E:/PG_Patch/Long_Path/path_dir/version_dir, then this should be E:/PG_Patch/Long_Path/path_dir. newpath - path where link needs to be created. it should be non-absolute path of format linked_path_dir/test_version curpath - path to set as current working directory path, it should be the location to prepend to newpath Currently I have used input parameters as E:/PG_Patch/Long_Path/path_dir linked_path_dir/test_version E:/PG_Patch/Long_Path/ Long path is much less than 260 char limit on windows, I have observed this problem with path length 130 (approx.) And this reliably reproduces the hang? On which Windows version(s)? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-10-15 10:53:35 -0700, Peter Geoghegan wrote: On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund and...@2ndquadrant.com wrote: I think anything that only works by breaking visibility rules that way is a nonstarter. Doing that from the C level is one thing, exposing it this way seems a bad idea. What visibility rule is that? The early return you added to HTSMVCC. At the very least it opens you to lots of halloween problem like scenarios. Upsert *has* to do effectively the same thing as what I've proposed - there is no getting away from it. So maybe the visibility rulebook (which as far as I can tell is the way things work today) needs to be updated. If we did, say, INSERT...ON DUPLICATE KEY UPDATE, we'd have to update a row with potentially no visible-to-snapshot version *at all*, and make a new version of that visible. That's just what it takes. What's the difference between that and just locking? If the only difference is that it isn't necessary to modify tqual.c because you're passing a tid directly, that isn't a user-visible difference - the rule has been broken just the same. Arguably, it's even more of a hack, since it's a special, out-of-band visibility exception. No, doing it in special case code is fundamentally different since those locations deal only with one row at a time. There's no scans that can pass over that row. That's why I think exposing the on conflict lock logic to anything but C isn't going to fly btw. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-10-15 11:23:44 -0700, Josh Berkus wrote: (although, AFAICT it doesn't allow for the implementation of one of my personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases where updates are expected to occur 95% of the time, but that's another topic. Unless rejects for an Update could be the leftover rows, but then we're getting into full MERGE.). FWIW I can't see the above syntax as something working very well - you fundamentally have to SET every column and it only makes sense in UPDATEs that provably affect only one row. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-10-15 11:55:06 -0700, Josh Berkus wrote: Also, because you can't INDEX CONCURRENTLY a PK, I've been building a lot of databases which have no PKs, only UNIQUE indexes. You know that you can add prebuilt primary keys using ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY (...) USING indexname? Postgres doesn't distinguish between UNIQUE indexes and PRIMARY KEYs -- as, indeed, it shouldn't, since they're both keys, adn the whole concept of a primary key is a legacy of index-organized databases, which PostgreSQL is not. There's some other differences, fro one primary keys are automatically picked up by foreign keys if the referenced columns aren't specified, for another we do not yet automatically recognize NOT NULL UNIQUE columns in GROUP BY. However, it does seem like the new syntax could be extended with and optional USING unqiue_index_name in the future (9.5), no? Yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 10/15/2013 07:56 AM, Andres Freund wrote: Well, just providing the C API + an example in a first step didn't work out too badly for FDWs. I am pretty sure that once released there will soon be extensions for it on PGXN or whatever for special usecases. I suspect so, too. But I also think that if that's the only thing available in the first release, a lot of users will get a poor initial impression. I think lots of people will expect a builtin logical replication solution :/. Which seems a tad unlikely to arrive in 9.4. Well, last I checked the Slony team is hard at work on building something which will be based on logical changesets. So there will likely be at least one tool available shortly after 9.4 is released. A good and flexible API is, IMHO, more important than having any finished solution. The whole reason why logical replication was outside core PG for so long is that replication systems have differing and mutually incompatible goals. A good API can support all of those goals; a user-level tool, no matter how good, can't. And, frankly, once the API is built, how hard will it be to write a script which does the simplest replication approach (replay all statements on slave)? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 10/15/2013 02:31 PM, Andres Freund wrote: On 2013-10-15 11:55:06 -0700, Josh Berkus wrote: Also, because you can't INDEX CONCURRENTLY a PK, I've been building a lot of databases which have no PKs, only UNIQUE indexes. You know that you can add prebuilt primary keys using ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY (...) USING indexname? That still requires an ACCESS EXCLUSIVE lock, and then can't be dropped using DROP INDEX CONCURRENTLY. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 11/10/13 17:49, Mark Kirkwood wrote: On 11/10/13 17:08, Satoshi Nagayasu wrote: (2013/10/11 7:32), Mark Kirkwood wrote: On 11/10/13 11:09, Mark Kirkwood wrote: On 16/09/13 16:20, Satoshi Nagayasu wrote: (2013/09/15 11:07), Peter Eisentraut wrote: On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote: I'm looking forward to seeing more feedback on this approach, in terms of design and performance improvement. So, I have submitted this for the next CF. Your patch fails to build: pgstattuple.c: In function ‘pgstat_heap_sample’: pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in this function) pgstattuple.c:737:13: note: each undeclared identifier is reported only once for each function it appears in Thanks for checking. Fixed to eliminate SnapshotNow. This seems like a cool idea! I took a quick look, and initally replicated the sort of improvement you saw: bench=# explain analyze select * from pgstattuple('pgbench_accounts'); QUERY PLAN Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual time=786.368..786.369 rows=1 loops=1) Total runtime: 786.384 ms (2 rows) bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 QUERY PLAN Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual time=12.004..12.005 rows=1 loops=1) Total runtime: 12.019 ms (2 rows) I wondered what sort of difference eliminating caching would make: $ sudo sysctl -w vm.drop_caches=3 Repeating the above queries: bench=# explain analyze select * from pgstattuple('pgbench_accounts'); QUERY PLAN Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual time=9503.774..9503.776 rows=1 loops=1) Total runtime: 9504.523 ms (2 rows) bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 QUERY PLAN Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual time=12330.630..12330.631 rows=1 loops=1) Total runtime: 12331.353 ms (2 rows) So the sampling code seems *slower* when the cache is completely cold - is that expected? (I have not looked at how the code works yet - I'll dive in later if I get a chance)! Thanks for testing that. It would be very helpful to improve the performance. Quietly replying to myself - looking at the code the sampler does 3000 random page reads... I guess this is slower than 163935 (number of pages in pgbench_accounts) sequential page reads thanks to os readahead on my type of disk (WD Velociraptor). Tweaking the number of random reads (i.e the sample size) down helps - but obviously that can impact estimation accuracy. Thinking about this a bit more, I guess the elapsed runtime is not the *only* theng to consider - the sampling code will cause way less disruption to the os page cache (3000 pages vs possibly lots more than 3000 for reading an entire ralation). Thoughts? I think it could be improved by sorting sample block numbers *before* physical block reads in order to eliminate random access on the disk. pseudo code: -- for (i=0 ; iSAMPLE_SIZE ; i++) { sample_block[i] = random(); } qsort(sample_block); for (i=0 ; iSAMPLE_SIZE ; i++) { buf = ReadBuffer(rel, sample_block[i]); do_some_stats_stuff(buf); } -- I guess it would be helpful for reducing random access thing. Any comments? Ah yes - that's a good idea (rough patch to your patch attached)! 
bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 QUERY PLAN Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual time=9968.318..9968.319 rows=1 loops=1) Total runtime: 9968.443 ms (2 rows) Actually - correcting my compare function to sort the blocks in *increasing* order (doh), gets a better result: bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 QUERY PLAN -- Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1
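Since the comparison operators in the quoted pseudo-code were lost somewhere in transit ("iSAMPLE_SIZE" is of course "i < SAMPLE_SIZE"), here is the same idea written out as a compilable sketch; the function name and the guard against empty relations are additions, and do_some_stats_stuff() stands in for the real per-page accounting.

#include "postgres.h"
#include "storage/bufmgr.h"

#define SAMPLE_SIZE 3000        /* matches the 3000 random page reads mentioned above */

static int
blkno_cmp(const void *a, const void *b)
{
    BlockNumber ba = *(const BlockNumber *) a;
    BlockNumber bb = *(const BlockNumber *) b;

    return (ba < bb) ? -1 : ((ba > bb) ? 1 : 0);
}

static void
sample_relation_blocks(Relation rel)
{
    BlockNumber sample_block[SAMPLE_SIZE];
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
    int         i;

    if (nblocks == 0)
        return;

    for (i = 0; i < SAMPLE_SIZE; i++)
        sample_block[i] = random() % nblocks;

    /* sort ascending so the physical reads happen in on-disk order */
    qsort(sample_block, SAMPLE_SIZE, sizeof(BlockNumber), blkno_cmp);

    for (i = 0; i < SAMPLE_SIZE; i++)
    {
        Buffer      buf = ReadBuffer(rel, sample_block[i]);

        /* do_some_stats_stuff(buf); */
        ReleaseBuffer(buf);
    }
}

Sorting only helps to the extent the kernel and the disk can exploit the ordering; as the numbers above suggest, it narrows the cold-cache gap but does not automatically beat a fully sequential scan that benefits from OS readahead.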
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 11/10/13 17:33, Jaime Casanova wrote: also the name pgstattuple2, doesn't convince me... maybe you can use pgstattuple() if you use a second argument (percentage of the sample) to overload the function +1, that seems much nicer. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for reserved connections for replication users
On Mon, 14 Oct 2013 11:52:57 +0530 Amit Kapila amit.kapil...@gmail.com wrote: On Sun, Oct 13, 2013 at 2:08 PM, Gibheer gibh...@zero-knowledge.org wrote: On Sun, 13 Oct 2013 11:38:17 +0530 Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Oct 10, 2013 at 3:17 AM, Gibheer gibh...@zero-knowledge.org wrote: On Mon, 7 Oct 2013 11:39:55 +0530 Amit Kapila amit.kapil...@gmail.com wrote: Robert Haas wrote: On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund andres(at)2ndquadrant(dot)com wrote: Hmm. It seems like this match is making MaxConnections no longer mean the maximum number of connections, but rather the maximum number of non-replication connections. I don't think I support that definitional change, and I'm kinda surprised if this is sufficient to implement it anyway (e.g. see InitProcGlobal()). I don't think the implementation is correct, but why don't you like the definitional change? The set of things you can do from replication connections are completely different from a normal connection. So using separate pools for them seems to make sense. That they end up allocating similar internal data seems to be an implementation detail to me. Because replication connections are still connections. If I tell the system I want to allow 100 connections to the server, it should allow 100 connections, not 110 or 95 or any other number. I think that to reserve connections for replication, mechanism similar to superuser_reserved_connections be used rather than auto vacuum workers or background workers. This won't change the definition of MaxConnections. Another thing is that rather than introducing new parameter for replication reserved connections, it is better to use max_wal_senders as it can serve the purpose. Review for replication_reserved_connections-v2.patch, considering we are going to use mechanism similar to superuser_reserved_connections and won't allow definition of MaxConnections to change. Hi, I took the time and reworked the patch with the feedback till now. Thank you very much Amit! So this patch uses max_wal_senders together with the idea of the first patch I sent. The error messages are also adjusted to make it obvious, how it is supposed to be and all checks work, as far as I could tell. If I understand correctly, now the patch has implementation such that a. if the number of connections left are (ReservedBackends + max_wal_senders), then only superusers or replication connection's will be allowed b. if the number of connections left are ReservedBackend, then only superuser connections will be allowed. That is correct. So it will ensure that max_wal_senders is used for reserving connection slots from being used by non-super user connections. I find new usage of max_wal_senders acceptable, if anyone else thinks otherwise, please let us know. 1. +varnamesuperuser_reserved_connections/varname +varnamemax_wal_senders/varname only superuser and WAL connections +are allowed. Here minus seems to be missing before max_wal_senders and I think it will be better to use replication connections rather than WAL connections. This is fixed. 2. -new replication connections will be accepted. +new WAL or other connections will be accepted. I think as per new implementation, we don't need to change this line. I reverted that change. 3. + * reserved slots from max_connections for wal senders. If the number of free + * slots (max_connections - max_wal_senders) is depleted. Above calculation (max_connections - max_wal_senders) needs to include super user reserved connections. 
My first thought was, that I would not add it here. When superuser reserved connections are not set, then only max_wal_senders would count. But you are right, it has to be set, as 3 connections are reserved by default for superusers. + * slots (max_connections - superuser_reserved_connections - max_wal_senders) here it should be ReservedBackends rather than superuser_reserved_connections. fixed 4. + /* + * Although replication connections currently require superuser privileges, we + * don't allow them to consume the superuser reserved slots, which are + * intended for interactive use. */ if ((!am_superuser || am_walsender) ReservedBackends 0 !HaveNFreeProcs(ReservedBackends)) ereport(FATAL, (errcode(ERRCODE_TOO_MANY_CONNECTIONS), - errmsg(remaining connection slots are reserved for non-replication superuser connections))); + errmsg(remaining connection slots are reserved for superuser connections))); Will there be any problem if we do the above check before the check for wal senders and reserved replication connections (+ !HaveNFreeProcs(max_wal_senders + ReservedBackends))) and
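Since the operators in the quoted hunks were stripped in transit, the two-tier check being reviewed works out to roughly the following; this is a paraphrase for readability, not the patch text, and the exact message wording in the first ereport is assumed.

/* Outer tier: keep ReservedBackends + max_wal_senders slots free for
 * superusers and replication (walsender) connections. */
if (!am_superuser && !am_walsender &&
    (ReservedBackends + max_wal_senders) > 0 &&
    !HaveNFreeProcs(ReservedBackends + max_wal_senders))
    ereport(FATAL,
            (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
             errmsg("remaining connection slots are reserved for replication and superuser connections")));

/* Inner tier: the last ReservedBackends slots are for superusers only,
 * and replication connections may not consume them. */
if ((!am_superuser || am_walsender) &&
    ReservedBackends > 0 &&
    !HaveNFreeProcs(ReservedBackends))
    ereport(FATAL,
            (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
             errmsg("remaining connection slots are reserved for superuser connections")));

That matches the behaviour summarised earlier in the thread: with only ReservedBackends + max_wal_senders slots left, ordinary connections are refused while superusers and walsenders still get in, and with only ReservedBackends slots left, even replication connections are turned away.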
Re: [HACKERS] PSQL return coder
I was avoiding ON_ERROR_STOP because I was using ON_ERROR_ROLLBACK, but have just realised that if I encase my SQL in a transaction then rollback will still happen. Perfect! James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 * **W* www.lisasoft.com *F *(+61) 3 8370 8099 On Fri, Oct 11, 2013 at 12:25 AM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Oct 10, 2013 at 1:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: James Sewell james.sew...@lisasoft.com writes: My question is in a rollback scenario is it possible to get PSQL to return a non 0 exit status? Maybe you could use -c instead of -f? $ psql -c 'select 1; select 1/0' regression ERROR: division by zero $ echo $? 1 You won't need explicit BEGIN/END because this is already a single transaction. According to the man page, EXIT STATUS psql returns 0 to the shell if it finished normally, 1 if a fatal error of its own (out of memory, file not found) occurs, 2 if the connection to the server went bad and the session was not interactive, and 3 if an error occurred in a script and the variable ON_ERROR_STOP was set. So for a longer script ON_ERROR_STOP might be the ticket (which is usually a good idea anyways). merlin -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 10:09:05AM -0400, Robert Haas wrote: On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund and...@2ndquadrant.com wrote: User: So, what's new in PostgreSQL 9.4? Hacker: Well, now we have logical replication! User: Why is that cool? Hacker: Well, streaming replication is awesome for HA, but it has significant limitations. And trigger-based systems are very mature, but the overhead is high and their lack of core integration makes them hard to use. With this technology, you can build systems that will replicate individual tables or even parts of tables, multi-master systems, and lots of other cool stuff. User: Wow, that sounds great. How do I use it? Hacker: Well, first you write an output plugin in C using a special API. User: Hey, do you know whether the MongoDB guys came to this conference? Let's try that again. User: Wow, that sounds great. How do I use it? Hacker: Well, currently, the output gets dumped as a series of text files that are designed to be parsed using a scripting language. We have sample parsers written in Perl and Python that you can use as-is or hack up to meet your needs. My version: Hacker: the output gets dumped as a series of JSON files. We have docs for this rev of the format and examples of consumers in Perl and Python you can use as-is or hack up to meet your needs. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 7:09 AM, Robert Haas robertmh...@gmail.com wrote: Let's try that again. User: Wow, that sounds great. How do I use it? Hacker: Well, currently, the output gets dumped as a series of text files that are designed to be parsed using a scripting language. We have sample parsers written in Perl and Python that you can use as-is or hack up to meet your needs. Have you heard of multicorn? Plugin authors can write a wrapper that spits out JSON or whatever other thing they like, which can be consumed by non C-hackers. Now, some users are still going to head for the hills. But at least from where I sit it sounds a hell of a lot better than the first answer. We're not going to solve all of the tooling problems around this technology in one release, for sure. But as far as 95% of our users are concerned, a C API might as well not exist at all. People WILL try to machine parse the output of whatever demo plugins we provide; so I think we should try hard to provide at least one such plugin that is designed to make that as easy as possible. I agree that this is important, but I wouldn't like to weigh it too heavily. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows
On Wed, Oct 16, 2013 at 2:04 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, that sucks. So it's a Windows bug. It's not clear to me that we should do anything about this at all, except perhaps document that people should avoid long tablespace path names on an unknown set of Windows versions. We should not be in the business of working around any and every bug coming out of Redmond. It's sort of incomprehensible to me that Microsoft has a bug like this and apparently hasn't fixed it. But I think I still favor trying to work around it. When people try to use a long data directory name and it freezes the system, some of them will blame us rather than Microsoft. We've certainly gone to considerable lengths to work around extremely strange bugs in various compiler toolchains, even relatively obscure ones. I don't particularly see why we shouldn't do the same here. I agree we'll probably want to work around it in the end, but I still think it should be put to Microsoft PSS if we can. The usual - have we actually produced a self-contained example that does just this (and doesn't include the full postgres support) and submitted it to *microsoft* for comments? I have written a self-contained Win32 console application with which the issue can be reproduced. The application project is attached to this mail. Here is a brief description of the project: This project is created using MSVC 2010, but even if somebody doesn't have this version of VC, the functions in file long_path.cpp can be copied and used in a new project. In the project settings, I have changed Character Set to Use Multi-Byte Character Set, which is what Postgres uses. It takes 3 parameters as input: existingpath - the path for which the link will be created. This path should be an already existing path with one level less than the actual path. For example, if we want to create a link for the path E:/PG_Patch/Long_Path/path_dir/version_dir, then this should be E:/PG_Patch/Long_Path/path_dir. newpath - the path where the link needs to be created. It should be a non-absolute path of the form linked_path_dir/test_version. curpath - the path to set as the current working directory; it should be the location to prepend to newpath. Currently I have used the input parameters E:/PG_Patch/Long_Path/path_dir linked_path_dir/test_version E:/PG_Patch/Long_Path/ The long path is much less than the 260-character limit on Windows; I have observed this problem with path lengths of about 130. And this reliably reproduces the hang? Yes, it produces the hang whenever the length of the 'curpath' parameter is greater than about 130. In the above example, I used a curpath of length 159. On which Windows version(s)? I used Windows 7 64-bit to reproduce it. However, the original user reported this issue on Windows 2008 64-bit, so this application should hang on Windows 2008 64-bit as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve setup for documentation building with FOP
On 9/16/13 12:19 PM, Alvaro Herrera wrote: The FOP-based build works fine for me. I gave the output a look. I like that text formatted with fixed-width font wraps at the right margin, instead of continuing beyond it; there are some strange artifacts about it (such as addition of hyphens in some places, say in the middle of a configure option); but I think it's nicer than the output of the other toolchain nevertheless. This willingness to break in the middle of preformatted elements looks weird in some places; for example table 17.2 SSL Server File Usage; old output puts the option name on first line, file path on second line; new output puts option name on top, file name is split in half. The wrapping and hyphenation is a complex issue. I think we agree that it's generally better to do suboptimal wrapping than to have the content run off the page. Everything between that is, AFAICT, a matter of fine-tuning various parameters and attributes. In the US size PDF, look at page 386; below the gmake install-world there's the bottom-of-page horizontal line, but the paragraph continues below that. Bug? Yeah, no idea how that can happen. Section 17.8 Encryption Options is formatted differently; previous method used indentation for the paragraph after each item, new one uses a table. I like the old method better. This is notoriously bad in the superuser_reserved_connections entry in 18.3.1. Connection Settings, and others. Also, the synopsis in PQconnectStartParams entry in 31.1 Database Connection Control Functions looks a lot worse. This manifests in many places, and while it's an improvement in a minority of them, I think it's a net loss overall. Yep. Fixed by setting an option that turns it back to the old way. I like that the new output has horizontal lines at top and bottom of the text area of the page. In the old method, the page heading (above that line) contains the chapter number and title, while in FOP it only has the title (no number). I find that number useful. Also, limiting the space for the title and wrapping if the title is too long seems pointless; see example of this in chapter High Availability, Load Balancing and Replication, where even the word Bal-ancing has been hyphenated. Made a note to fix that later. Formatting of note, tip and such seems a lot better in FOP. For warning, the old method used a surrounding box; the new one just has a Warning title and indented paragraph, just like for note et al. Doesn't look like a problem to me. It'd be nice to have pretty icons for these areas. Could be fine-tuned later. sidebar renders as a grey-background box in FOP, white-background in the old tool. Not sure about this; it looks like the only place with grey background in the whole manual. We only have one sidebar in the entire SGML source. Hmm, maybe it's there to keep the printers honest. ;-) It would be easy to change this if desired. Example 19.1 Example pg_hba.conf Entries is completely broken (no page break); renders normally in old method. There are other cases of this, including libpq sample programs and ECPG. I can't reproduce that. What version of FOP are you using? I have 1.1. url renders differently: it used to produce a footnote. FOP instead puts the link text inline. Not really sure about this. Changed back to footnote. The table 25.1 High Availability, Load Balancing, and Replication Feature Matrix contains a lot of [bull], which look odd. Presumably an image of a bull head would be better? This is an artifact of the SGML to XML conversion. Already fixed in a separate commit.
Tables 27-13 and 27-14 are misformatted in both implementations. Surely we can do better than overlaying the contents of cells ... This also needs to be addressed by adjusting the wrapping and hyphenation rules. Or we could manually tweak it by adding some zero-width space characters. It would need to be checked how that would affect other output formats, however. The START_REPLICATION stuff in the Frontend/Backend Protocol chapter is completely broken. Maybe wrong markup? Also in StartupMessage. Same issue as with Encryption Options etc. above. It seems the author element resulted in nothing in the old toolchain, but now it does print the name of the author. There are only two authors mentioned, in NLS and GEQO, though. In the GEQO case it's a bit funny because the author is now mentioned twice. I propose to remove those two author sections. In the GEQO chapter, the information is already present, and in the NLS case, well, I don't care. Speaking of GEQO: the links in its 53.4 Further Reading section don't look good in the FOP output. And the bibliography looks completely different. I see. That is customizable, but maybe not the way it looks in the old output. We might need to reassess the entire bibliography at some point anyway. It's not really well-maintained. Oh, the index at the end is not output. Fixed in a separate commit.
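For anyone who wants to reproduce these observations, a rough sketch of the FOP-based build under discussion (the make target names and fop invocation are from memory and may differ in your tree):

    cd doc/src/sgml
    make postgres-US.fo                            # DocBook -> XSL-FO via the XSLT stylesheets
    fop -fo postgres-US.fo -pdf postgres-US.pdf    # render the PDF with Apache FOP 1.1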
Re: [HACKERS] buildfarm failures on smew and anole
On Mon, 2013-10-14 at 18:14 -0400, Robert Haas wrote: I cleaned the semaphores on smew, but they came back. Whatever is crashing is leaving the semaphores lying around. Ugh. When did you do that exactly? I thought I fixed the problem that was causing that days ago, and the last 4 days worth of runs all show the too many clients error. I did it a few times over the weekend. At least twice less than 4 days ago. There are currently no semaphores left around, so whatever happened in the last run cleaned it up. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
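For reference, a sketch of inspecting and removing leftover SysV semaphore sets by hand on a Linux buildfarm member (the owner name "pgbuild" is an assumption; check the ipcs output before removing anything):

    ipcs -s                                              # lists key, semid, owner, perms, nsems
    ipcs -s | awk '$3 == "pgbuild" {print $2}' \
            | xargs -r -n1 ipcrm -s                      # remove each set owned by pgbuild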
Re: [HACKERS] [PATCH] pg_sleep(interval)
On 09/30/2013 01:47 PM, Vik Fearing wrote: Yes, I understand you are trying to help, and I appreciate it! My opinion, and that of others as well from the original thread, is that this patch should either go in as is and break that one case, or not go in at all. I'm fine with either (although clearly I would prefer it went in otherwise I wouldn't have written the patch). I see this is marked as rejected in the commitfest app, but I don't see any note about who did it or why. I don't believe there is consensus for rejection on this list. In fact I think the opposite is true. May we have an explanation please from the person who rejected this without comment? -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
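For context, a sketch of what the patch adds and of the single case the original thread said it would break (behaviour inferred from that discussion, not re-verified against the final patch):

    psql -c "SELECT pg_sleep(interval '5 minutes');"   # new overload added by the patch
    psql -c "SELECT pg_sleep(300);"                    # still resolves to pg_sleep(double precision)
    psql -c "SELECT pg_sleep('300');"                  # the one broken case: an untyped string
                                                       # literal becomes ambiguous between the two
                                                       # overloads and raises an error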
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Oct 15, 2013 at 2:25 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-15 10:53:35 -0700, Peter Geoghegan wrote: On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund and...@2ndquadrant.com wrote: I think anything that only works by breaking visibility rules that way is a nonstarter. Doing that from the C level is one thing, exposing it this way seems a bad idea. What visibility rule is that? The early return you added to HTSMVCC. At the very least it opens you to lots of halloween problem like scenarios. The term visibility rule as you've used it here is suggestive of some authoritative rule that should obviously never even be bent. I'd suggest that what Postgres does isn't very useful as an authority on this matter, because Postgres doesn't have upsert. Besides, today Postgres doesn't just bend the rules (that is, some kind of classic notion of MVCC as described in Concurrency Control in Distributed Database Systems or something), it totally breaks them, at least in READ COMMITTED mode (and what I've proposed here just occurs in RC mode). It is not actually in evidence that this approach introduces Halloween problems. In order for HTSMVCC to controversially indicate visibility under my scheme, it is not sufficient for the row version to just be exclusive locked by our xact without otherwise being visible - it must also *not be updated*. Now, I'll freely admit that this could still be problematic - there might have been a subtlety I missed. But since an actual example of where this is problematic hasn't been forthcoming, I take it that it isn't obvious to either yourself or Robert that it actually is. Any scheme that involves playing cute tricks with visibility (which is to say, any credible upsert implementation) needs very careful thought. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
(2013/10/15 22:01), k...@rice.edu wrote: Google's lz4 is also a very nice algorithm with 33% better compression performance than snappy and 2X the decompression performance in some benchmarks also with a bsd license: https://code.google.com/p/lz4/ If we judged only by performance, we would select lz4. However, we should also consider other important factors, such as software robustness, track record, bug-fix history, and so on. If we hit unknown bugs, can we fix them or improve the algorithm? That seems very difficult, because we would only be using the library without understanding its algorithms. Therefore, I think we had better select software that is robust and has a larger user base. Regards, -- Mitsumasa KONDO NTT Open Source Software -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standby catch up state change
On Tue, Oct 15, 2013 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote: I think you're over-interpreting it. I think you are right. Someone who understands the replication code very well advised us to use that log message as a way to measure how much time it takes to send all the missing WAL to a remote standby on a slow WAN link. While it worked well for all measurements, when we used a middleware which caches a lot of traffic on the sender side, this log message was very counter-intuitive. It took several more minutes for the standby to actually receive all the WAL files and catch up after the message was displayed on the master side. But then, as you said, maybe relying on the message was not the best way to measure the time. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee