Re: [HACKERS] patch to slightly improve clarity of a comment in postgresql.conf.sample
Sure, I like that more. On Thu, Aug 25, 2011 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Aug 24, 2011 at 7:33 AM, Dougal Sutherland dou...@gmail.com wrote: The attached change to postgresql.conf.sample makes it more clear at a glance that the default value of listen_addresses is 'localhost', not 'localhost, *'. This would have saved a friend an hour or two of fiddling tonight. How about: # defaults to 'localhost'; use '*' for all -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removal of useless include references
* Bruce Momjian wrote: The attached patch removes unneeded include references, and marks some includes as needing to be skipped by pgrminclude. There are several unrelated changes to pg_upgrade in that patch, too. -- Christian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/25/2011 04:57 PM, Tomas Vondra wrote: (b) sends bgwriter stats (so that the buffers_checkpoint is updated) The idea behind only updating the stats in one chunk, at the end, is that it makes one specific thing easier to do. Let's say you're running a monitoring system that is grabbing snapshots of pg_stat_bgwriter periodically. If you want to figure out how much work a checkpoint did, you only need two points of data to compute that right now. Whenever you see either of the checkpoint count numbers increase, you just subtract off the previous sample; now you've got a delta for how many buffers that checkpoint wrote out. You can derive the information about the buffer counts involved that appears in the logs quite easily this way. The intent was to make that possible to do, so that people can figure this out without needing to parse the log data. Spreading out the updates defeats that idea. It also makes it possible to see the buffer writes more in real-time, as they happen. You can make a case for both approaches having their use cases; the above is just summarizing the logic behind why it's done the way it is right now. I don't think many people are actually doing things with this to the level where their tool will care. The most popular consumer of pg_stat_bgwriter data I see is Munin graphing changes, and I don't think it will care either way. Giving people the option of doing it the other way is a reasonable idea, but I'm not sure there's enough use case there to justify adding a GUC just for that. My next goal here is to eliminate checkpoint_segments, not to add yet another tunable extremely few users would ever touch. As for throwing more log data out, I'm not sure what new analysis you're thinking of that it allows. I/O gets increasingly spiky as you zoom in on it; averaging over a shorter period can easily end up providing less insight about trends. 
If anything, I spend more time summarizing the data that's already there, rather than wanting to break it down. It's already providing way too much detail for most people. Customers tell me they don't care to see checkpoint stats unless they're across a day or more of sampling, so even the current once every ~5 minutes is way more info than they want. I have all this log parsing code and things that look at pg_stat_bgwriter to collect that data and produce higher level reports. And lots of it would break if any of this patch is added and people turn it on. I imagine other log/stat parsing programs might suffer issues too. That's your other hurdle for change here: the new analysis techniques have to be useful enough to justify that some downstream tool disruption is inevitable. If you have an idea for how to use this extra data for something useful, let's talk about what that is and see if it's possible to build it in instead. This problem is harder than it looks, mainly because the way the OS caches writes here makes trying to derive hard numbers from what the background writer is doing impossible. When the database writes things out, and when they actually get written to disk, they are not the same event. The actual write is often during the sync phase, and not being able to track that beast is where I see the most problems. The write phase, the easier part to instrument in the database, is pretty boring. That's why the last extra logging I added here focused on adding visibility to the sync activity instead. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
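For illustration, the two-snapshot delta technique Greg describes boils down to a few lines. This is a hedged sketch in Python (the dict keys mirror pg_stat_bgwriter column names; no particular monitoring tool implied):

```python
def checkpoint_delta(prev, curr):
    """Given two snapshots of pg_stat_bgwriter (as dicts), return how many
    buffers the checkpoint(s) between the snapshots wrote, or None if no
    checkpoint completed in the interval."""
    completed = ((curr["checkpoints_timed"] + curr["checkpoints_req"])
                 - (prev["checkpoints_timed"] + prev["checkpoints_req"]))
    if completed == 0:
        return None  # no checkpoint finished; buffers_checkpoint is unchanged
    return curr["buffers_checkpoint"] - prev["buffers_checkpoint"]

prev = {"checkpoints_timed": 10, "checkpoints_req": 2, "buffers_checkpoint": 50000}
curr = {"checkpoints_timed": 11, "checkpoints_req": 2, "buffers_checkpoint": 53072}
print(checkpoint_delta(prev, curr))  # 3072 buffers written by that checkpoint
```

This only works because buffers_checkpoint moves in one chunk at checkpoint end; spreading the updates out, as the patch proposes, breaks the "two data points suffice" property.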
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Thu, Aug 25, 2011 at 22:57, Tomas Vondra t...@fuzzy.cz wrote: Hello, I'd like to propose a small patch that allows better checkpoint progress monitoring. The patch is quite simple - it adds a new integer GUC checkpoint_update_limit and every time checkpoint writes this number of buffers, it does two things: (a) logs a checkpoint status message into the server log, with info about total number of buffers to write, number of already written buffers, current and average write speed and estimate of remaining time (b) sends bgwriter stats (so that the buffers_checkpoint is updated) I believe this will make checkpoint tuning easier, especially with large shared buffers and when there's other write activity (so that it's difficult to see checkpoint I/O). The default value (0) means this continuous logging is disabled. This seems like the wrong thing to write to the log. It's really only useful *during* the checkpoint run, isn't it? If so, I think it should go in a pg_stat view. In theory, this could be the progress view or progress field talked about around Greg's previous patch - or it could just be modifying the command string in pg_stat_activity. Either way, it should be updated in shared memory in that case (like the current query is), and not sent with a message to the collector. IMHO, of course ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
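The status message in (a) needs the progress arithmetic; a back-of-the-envelope version of that estimate (purely illustrative Python, not the patch's actual C code):

```python
def checkpoint_progress(total, written, elapsed_secs):
    """Estimate checkpoint progress: percent done, average write speed
    (buffers/s), and remaining seconds assuming that speed holds."""
    pct = 100.0 * written / total
    speed = written / elapsed_secs
    remaining = (total - written) / speed
    return pct, speed, remaining

pct, speed, eta = checkpoint_progress(total=8192, written=2048, elapsed_secs=16.0)
print(f"{pct:.0f}% done, {speed:.0f} buffers/s, ~{eta:.0f}s remaining")
# 25% done, 128 buffers/s, ~48s remaining
```

Whether the numbers end up in the server log or in a pg_stat view, the computation is the same; the open question in the thread is only where to publish it.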
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 9:35, Greg Smith wrote: On 08/25/2011 04:57 PM, Tomas Vondra wrote: (b) sends bgwriter stats (so that the buffers_checkpoint is updated) As for throwing more log data out, I'm not sure what new analysis you're thinking of that it allows. I/O gets increasingly spiky as you zoom in on it; averaging over a shorter period can easily end up providing less insight about trends. If anything, I spend more time summarizing the data that's already there, rather than wanting to break it down. It's already providing way too much detail for most people. Customers tell me they don't care to see checkpoint stats unless they're across a day or more of sampling, so even the current once every ~5 minutes is way more info than they want. I have all this log parsing code and things that look at pg_stat_bgwriter to collect that data and produce higher level reports. And lots of it would break if any of this patch is added and people turn it on. I imagine other log/stat parsing programs might suffer issues too. That's your other hurdle for change here: the new analysis techniques have to be useful enough to justify that some downstream tool disruption is inevitable. I was aware that by continuously updating pg_stat_bgwriter, the data won't be synchronized (i.e. the buffers_checkpoint counters will change while the number of requested/timed checkpoints remains the same). But does that really break the tools that process the data? When you're working with summarized data, the result should be more or less the same, as the difference will be smoothed out by averaging etc. You can always see just one in-progress checkpoint, so if you get 24 checkpoints a day, the difference will be 1/24 of a checkpoint. Yes, it's a difference. A really crazy workaround would be to change checkpoints_requested / checkpoints_timed to double, and use that to indicate the current progress of the checkpoint.
So for example 10.54 would mean 10 checkpoints completed, one checkpoint in progress, already written 54% of blocks. But yes, that's a bit crazy. If you have an idea for how to use this extra data for something useful, let's talk about what that is and see if it's possible to build it in instead. This problem is harder than it looks, mainly because the way the OS caches writes here makes trying to derive hard numbers from what the background writer is doing impossible. When the database writes things out, and when they actually get written to disk, they are not the same event. The actual write is often during the sync phase, and not being able to track that beast is where I see the most problems. The write phase, the easier part to instrument in the database, is pretty boring. That's why the last extra logging I added here focused on adding visibility to the sync activity instead. Hmmm, let me explain what led me to this patch - right now I'm doing a comparison of filesystems with various block sizes (both fs and db blocks). I've realized that the db block size significantly influences the frequency of checkpoints and the amount of data to write, so I'm collecting data from pg_stat_bgwriter too. The benchmark goes like this:

1. collect pg_stat_bgwriter stats
2. run pgbench for 10 minutes
3. collect pg_stat_bgwriter stats (to compute difference with (1))
4. kill the postmaster

The problem is that when checkpoint stats are collected, there might be a checkpoint in progress and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent. I can't infer this from other data (e.g. iostat), because that doesn't tell me what I/O is caused by the checkpoint. Yes, this does not consider sync timing, but in my case that's not a big issue (the page cache is rather small so the data are actually forced to the drive soon).
I could probably live with keeping the current pg_stat_bgwriter logic (i.e. updating just once) and writing checkpoint status just to the log. I don't think that should break any current tools that parse logs, because the message is completely different (prefixed with 'checkpoint status') so any reasonably written tool should be OK. Tomas
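For what it's worth, the fractional-counter workaround floated earlier in the thread (a value like 10.54 meaning 10 completed checkpoints plus one that is 54% written) is trivial to encode and decode. A toy sketch, not a serious proposal:

```python
def encode(completed, progress):
    """Pack a completed-checkpoint count and the in-progress fraction
    (0 <= progress < 1) into a single double, e.g. 10.54."""
    return completed + progress

def decode(value):
    """Split the packed value back into (completed, progress)."""
    completed = int(value)
    return completed, round(value - completed, 2)

print(decode(encode(10, 0.54)))  # (10, 0.54)
```

The obvious downside, as noted in the thread, is that it overloads one counter with two meanings and quietly changes its type, which is exactly the sort of thing that breaks downstream parsing tools.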
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
Robert, Thanks for your review. For me, the line you removed from dml.out causes the regression tests to fail. Fixed. Why did I remove this line?? I don't understand what this is going for:

+	/*
+	 * To boost up trusted procedure checks on db_procedure object
+	 * class, we also confirm the decision when user calls a procedure
+	 * labeled as 'tcontext'.
+	 */

Can you explain? Yes. It also caches the expected security label when a client labeled as scontext tries to execute a procedure labeled as tcontext, to reduce the number of system call invocations in fmgr_hook and needs_fmgr_hook. If the expected security label is not the same as scontext, it means the procedure performs as a trusted procedure that switches the security label of the client during its execution; like a security invoker function. The pair of security labels is the only factor that determines whether the procedure is a trusted procedure or not. Thus, it is suitable to cache in the userspace avc. As an aside, the reason why we don't cache the default security label assigned to newly created named objects (such as tables, ...) is that selinux allows setting up an exceptional default security label for a particular name, so it is not suitable for the avc structure. (I'm waiting for this interface to be included in libselinux.) sepgsql_avc_check_perms_label has a formatting error on the line that says result = false. It's not indented correctly. OK, I fixed it. Several functions do this:

sepgsql_avc_check_valid();
do {
    ...
} while (!sepgsql_avc_check_valid());

I don't understand why we need a loop there. It prevents inconsistent access control decisions caused by concurrent security policy reloading. I want the following steps to be executed atomically: 1) Look up the object class number on the kernel side 2) Look up the permission bits on the kernel side 3) Ask the kernel side for its access control decision.
selinux_status_update returns 1 if any SELinux status on the kernel side (that requires flushing userspace caches) has changed since the last invocation. In this case, we retry the whole process from the beginning to ensure the whole access control decision is made by either the old or the new policy. Thus, I enclosed these blocks in a do {...} while() loop. The comment for sepgsql_avc_check_perms_label uses the word elsewhere when it really means otherwise. OK, I fixed it. Changing the calling sequence of sepgsql_get_label() would perhaps be better separated out into its own patch. OK, I reverted it. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp

 configure.in               |   4 +-
 contrib/sepgsql/Makefile   |   2 +-
 contrib/sepgsql/dml.c      |  59 +++---
 contrib/sepgsql/hooks.c    |  64 +++---
 contrib/sepgsql/proc.c     |  68 ++-
 contrib/sepgsql/relation.c |  69 +++
 contrib/sepgsql/schema.c   |  39 ++--
 contrib/sepgsql/selinux.c  |   2 +-
 contrib/sepgsql/sepgsql.h  |  18 ++-
 contrib/sepgsql/uavc.c     | 511 
 doc/src/sgml/sepgsql.sgml  |  12 +-
 11 files changed, 649 insertions(+), 199 deletions(-)

diff --git a/configure.in b/configure.in
index a844afc..b444358 100644
--- a/configure.in
+++ b/configure.in
@@ -964,8 +964,8 @@ fi
 # for contrib/sepgsql
 if test "$with_selinux" = yes; then
-  AC_CHECK_LIB(selinux, selinux_sepgsql_context_path, [],
-               [AC_MSG_ERROR([library 'libselinux', version 2.0.93 or newer, is required for SELinux support])])
+  AC_CHECK_LIB(selinux, selinux_status_open, [],
+               [AC_MSG_ERROR([library 'libselinux', version 2.0.99 or newer, is required for SELinux support])])
 fi

 # for contrib/uuid-ossp
diff --git a/contrib/sepgsql/Makefile b/contrib/sepgsql/Makefile
index 1978ccf..e273d8f 100644
--- a/contrib/sepgsql/Makefile
+++ b/contrib/sepgsql/Makefile
@@ -1,7 +1,7 @@
 # contrib/sepgsql/Makefile

 MODULE_big = sepgsql
-OBJS = hooks.o selinux.o label.o dml.o \
+OBJS = hooks.o selinux.o uavc.o label.o dml.o \
 	schema.o relation.o proc.o
 DATA_built = sepgsql.sql
 REGRESS = label dml misc
diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index 22666b7..3199337 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -150,12 +150,11 @@
 check_relation_privileges(Oid relOid, uint32 required, bool abort)
 {
-	char	relkind = get_rel_relkind(relOid);
-	char   *scontext = sepgsql_get_client_label();
-	char   *tcontext;
+	ObjectAddress object;
 	char   *audit_name;
 	Bitmapset  *columns;
 	int		index;
+	char	relkind = get_rel_relkind(relOid);
 	bool	result = true;

 	/*
@@ -184,45 +183,43 @@ check_relation_privileges(Oid relOid,
 	/*
 	 * Check permissions on the relation
 	 */
-	tcontext = sepgsql_get_label(RelationRelationId, relOid, 0);
-	audit_name = getObjectDescriptionOids(RelationRelationId, relOid);
+	object.classId = RelationRelationId;
+	object.objectId = relOid;
+	object.objectSubId = 0;
+	audit_name =
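The atomic-retry idea behind the do {...} while() loop is easier to see stripped of the SELinux specifics. A hedged Python sketch of the same pattern (every function name here is a hypothetical stand-in, not the sepgsql or libselinux API): keep redoing the lookup-and-decide sequence until no policy reload happened while we were working, so the decision always reflects one consistent policy.

```python
def check_perms(lookup_class, lookup_perms, decide, status_changed):
    """Optimistic retry loop: if the policy was reloaded mid-sequence
    (status_changed() returns True, analogous to selinux_status_update()
    returning 1), discard the partial result and start over."""
    while True:
        tclass = lookup_class()           # 1) object class number
        perms = lookup_perms(tclass)      # 2) permission bits
        decision = decide(tclass, perms)  # 3) access control decision
        if not status_changed():
            return decision               # all three steps saw one policy

# Demo: the first pass races with a policy reload, the second succeeds.
calls = {"n": 0}
def status_changed():
    calls["n"] += 1
    return calls["n"] == 1

result = check_perms(lambda: "db_procedure",
                     lambda c: {"execute"},
                     lambda c, p: "execute" in p,
                     status_changed)
print(result, calls["n"])  # True 2
```

The retry guarantees the decision is made entirely under either the old or the new policy, never a mixture, without taking a lock around the three steps.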
Re: [HACKERS] tsvector concatenation - backend crash
Hi The attached SQL file gives (at least in my hands) a reliable backend crash with this stacktrace .. reproduced on both 9.0.4 and HEAD. I'm sorry I cannot provide a more trimmed-down set of vectors that reproduces the bug, hence the obfuscated dataset. But even deleting single terms in the vectors makes the bug go away. Ok, I found 8.3.0 to be good so I ran a git bisect on it.. it gave me this commit:

e6dbcb72fafa4031c73cc914e829a6dec96ab6b6 is the first bad commit
commit e6dbcb72fafa4031c73cc914e829a6dec96ab6b6
Author: Tom Lane t...@sss.pgh.pa.us
Date: Fri May 16 16:31:02 2008 +

    Extend GIN to support partial-match searches, and extend tsquery to support prefix matching using this facility. Teodor Sigaev and Oleg Bartunov

:040000 040000 febf59ba02bcd4ce3863e880c6bbd989e0b7b1d2 5e96383e628dd27b5c68b0186af18f80fb7ef129 M doc
:040000 040000 b920deca6f074b83dd5d2bd0446785a23019d11a 3f10e54cdeac63129f34865adcadf34ff74ff9a8 M src

bisect run success

Which means that 8.3 releases are OK, but 8.4 and forward have the problem. Which at least touches the same area.. although the patch is over 3K lines, and my C skills are not that good. Attached is the git bisect script.. just for the archives. Jesper git-bisect-script Description: Binary data
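What `git bisect run` automates is a binary search for the first bad commit, assuming badness is monotone (once a commit is bad, all later ones are too). A small self-contained sketch of that convergence (illustrative only, not Jesper's attached script):

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit. Requires commits[0] good,
    commits[-1] bad, and badness monotone along the history, the same
    invariant git bisect relies on. Runs is_bad ~log2(n) times."""
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid   # first bad commit is at mid or earlier
        else:
            lo = mid   # first bad commit is after mid
    return commits[hi]

history = [f"commit{i}" for i in range(16)]
bad = first_bad(history, lambda c: int(c[6:]) >= 9)
print(bad)  # commit9, found in about log2(16) = 4 test runs
```

With `git bisect run`, the is_bad predicate is a script whose exit code (0 = good, non-zero = bad) drives the same search over the real commit graph.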
Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper
I wrote: I wrote a FDW for Oracle to a) learn some server coding and b) see how well the FDW API works for me. I have released the software on PgFoundry: http://oracle-fdw.projects.postgresql.org/ Would it make sense to mention that in chapter 5.10 of the documentation? Yours, Laurenz Albe
Re: [HACKERS] Removal of useless include references
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: It has been years since I ran src/tools/pginclude/pgrminclude to remove unnecessary include files. (I have already fixed things so include files can be compiled on their own.) The attached patch removes unneeded include references, and marks some includes as needing to be skipped by pgrminclude. I am sure applying this patch will break builds on some platforms and some option combinations so I will monitor the buildfarm when I apply it and make adjustments. The last time you did this was in July 2006. It took us two weeks to mostly recover, but we were still dealing with some fallout in December, cf http://archives.postgresql.org/pgsql-hackers/2006-12/msg00491.php We had the buildfarm then, had had it for a couple years. The notion that watching the buildfarm is enough is fully disproven by history. Unless you have a better test plan than last time (which this isn't), I don't think this should be done at all. The benefits are microscopic and the pain real. I don't have a better plan. There are #ifdef code blocks that often don't get processed and therefore this can't be done better. I will abandon the idea. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] tsvector concatenation - backend crash
Jesper Krogh jes...@krogh.cc writes: On 2011-08-26 05:28, Tom Lane wrote: Hm ... I can reproduce this on one of my usual machines, but not another. What platform are you on exactly? 64 bit Ubuntu Lucid (amd64). Huh, weird ... because the platform it's not failing for me on is Fedora 14 x86_64. Which is annoying, because that machine has better tools for looking for memory stomps than the 32-bit HP box where I do see the problem. Anyway, will see what I can find. regards, tom lane
Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper
On 08/26/2011 07:27 AM, Albe Laurenz wrote: I wrote: I wrote a FDW for Oracle to a) learn some server coding and b) see how well the FDW API works for me. I have released the software on PgFoundry: http://oracle-fdw.projects.postgresql.org/ Would it make sense to mention that in chapter 5.10 of the documentation? I don't think so, any more than any other external module should be mentioned in the docs. There are known FDWs for several well known external databases, several file formats, and more exotic data sources such as twitter. I don't think we want to maintain a list of these in the docs. cheers andrew
Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper
Andrew Dunstan and...@dunslane.net writes: On 08/26/2011 07:27 AM, Albe Laurenz wrote: http://oracle-fdw.projects.postgresql.org/ Would it make sense to mention that in chapter 5.10 of the documentation? I don't think so, any more than any other external module should be mentioned in the docs. There are known FDWs for several well known external databases, several file formats, and more exotic data sources such as twitter. I don't think we want to maintain a list of these in the docs. Wiki page, maybe? regards, tom lane
Re: [HACKERS] WIP: Fast GiST index build
On Thu, Aug 25, 2011 at 11:08 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Could you share the test scripts, patches and data sets etc. needed to reproduce the tests you've been running? I'd like to try them out on a test server. 1) I've updated links to the datasets on the wiki page. 2) The script for index quality testing, fastbuild_test.php, is in the attachment. In order to run it you need PHP with the pdo and pdo_pgsql modules. Also the plantuner module is required (it is used to force the planner to use a specific index). After running that script, the following query returns the relative score of index quality:

select indexname, avg(count::real/(select count from test_result a2 where a2.indexname = 'usnoa2_idx3' and a2.predicate = a1.predicate and a2.tablename = a1.tablename)::real) from test_result a1 where a1.tablename = 'usnoa2' group by indexname;

where 'usnoa2' is the table name and 'usnoa2_idx3' is the name of the index whose quality was assumed to be 1. 3) A patch which makes plantuner work with HEAD is also in the attachment. 4) A patch with my split algorithm implementation is attached. Its current form is appropriate only for testing purposes. 5) For index creation I use a simple script which is attached as 'indexes.sql'. Also, a similar script with different index names is what I run with my split patch. Feel free to ask questions about all this stuff. -- With best regards, Alexander Korotkov.
fastbuild_test.php.gz Description: GNU Zip compressed data plantuner.patch.gz Description: GNU Zip compressed data my_split.patch.gz Description: GNU Zip compressed data

select pg_stat_statements_reset();
set log_statement_stats = on;
set synchronize_seqscans = off;
create index uniform_idx1 on uniform using gist(point) with (buffering=on);
create index uniform_idx2 on uniform using gist(point) with (buffering=auto);
create index uniform_idx3 on uniform using gist(point) with (buffering=off);
create index usnoa2_idx1 on usnoa2 using gist(point) with (buffering=on);
create index usnoa2_idx2 on usnoa2 using gist(point) with (buffering=auto);
create index usnoa2_idx3 on usnoa2 using gist(point) with (buffering=off);
create index usnoa2_shuffled_idx1 on usnoa2_shuffled using gist(point) with (buffering=on);
create index usnoa2_shuffled_idx2 on usnoa2_shuffled using gist(point) with (buffering=auto);
create index usnoa2_shuffled_idx3 on usnoa2_shuffled using gist(point) with (buffering=off);
Re: [HACKERS] WIP: Fast GiST index build
On Thu, Aug 25, 2011 at 10:53 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: In the tests on the first version of the patch I found the index quality of regular build much better than that of buffering build (without neighbor relocation). Now it's similar, though it's because the index quality of regular index build became worse. Thereby, in the current tests regular index build is faster than in the previous ones. I see the following possible causes: 1) I didn't save the source random data, so now it's new random data. 2) Some environment parameters of my test setup may have changed, though I doubt it. Despite these possible explanations it seems quite strange to me. That's pretty surprising. Assuming the data is truly random, I wouldn't expect a big difference in the index quality of one random data set over another. If the index quality depends so much on, say, the distribution of the few first tuples that are inserted to it, that's a quite interesting find on its own, and merits some further research. Yeah, it's pretty strange. Using the same random datasets in different tests can help to exclude one possible cause of the difference. In order to compare index build methods on more qualitative indexes, I've tried to build indexes with my double sorting split method (see: http://syrcose.ispras.ru/2011/files/SYRCoSE2011_Proceedings.pdf#page=36). So on the uniform dataset search is about 10 times faster! And, as expected, regular index build becomes much slower. It has run for more than 60 hours and only 50% of the index is complete (estimated by file sizes). Also, automatic switching to buffering build shows better index quality results in all the tests, though it's hard for me to explain that. Hmm, makes me a bit uneasy that we're testing with a modified page splitting algorithm. But if the new algorithm is that good, could you post that as a separate patch, please?
I've posted it in another message and I will try to get it into a more appropriate form. Let me clarify this a little. I don't think my split algorithm is 10 times better than state-of-the-art algorithms. I think that the currently used new linear split shows unreasonably bad results in many cases. For example, uniformly distributed data is a pretty easy case, and with almost any splitting algorithm we can get an index with almost zero overlap. But new linear split produces huge overlaps in this case. That's why I decided to make some experiments with another split algorithm. -- With best regards, Alexander Korotkov.
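The overlap Alexander refers to can be quantified directly: after a page split, the intersection area of the two groups' bounding boxes measures how often a search must descend into both pages. A toy Python illustration of that metric (not the double sorting algorithm itself):

```python
def bbox(points):
    """Minimum bounding box (x1, y1, x2, y2) of a set of 2-D points."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def overlap_area(a, b):
    """Intersection area of two (x1, y1, x2, y2) boxes; 0 if disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

pts = [(0, 0), (1, 2), (2, 1), (8, 8), (9, 10), (10, 9)]
good = overlap_area(bbox(pts[:3]), bbox(pts[3:]))    # split by locality
bad = overlap_area(bbox(pts[::2]), bbox(pts[1::2]))  # interleaved split
print(good, bad)  # 0 56
```

A locality-respecting split gives disjoint boxes (zero overlap), while an interleaved split forces both subtrees to cover nearly the same region, which is the failure mode being attributed to the new linear split on easy data.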
Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper
On Fri, Aug 26, 2011 at 3:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 08/26/2011 07:27 AM, Albe Laurenz wrote: http://oracle-fdw.projects.postgresql.org/ Would it make sense to mention that in chapter 5.10 of the documentation? I don't think so, any more than any other external module should be mentioned in the docs. There are known FDWs for several well known external databases, several file formats, and more exotic data sources such as twitter. I don't think we want to maintain a list of these in the docs. Wiki page, maybe? For example this one: http://wiki.postgresql.org/wiki/Foreign_data_wrappers -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
On 08/25/2011 06:15 PM, Andrew Dunstan wrote: But we could also add these switches to pg_dump too if people feel it's worthwhile. I haven't looked but the logic should not be terribly hard. Something like the attached, in fact, which seems pretty simple. cheers andrew

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f6cd7eb..e9b4cc6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -140,6 +140,8 @@ static int column_inserts = 0;
 static int no_security_labels = 0;
 static int no_unlogged_table_data = 0;
 static int serializable_deferrable = 0;
+static int exclude_post_data = 0;
+static int post_data_only = 0;

 static void help(const char *progname);
@@ -334,6 +336,8 @@ main(int argc, char **argv)
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 		{"no-unlogged-table-data", no_argument, &no_unlogged_table_data, 1},
+		{"no-post-data", no_argument, &exclude_post_data, 1},
+		{"post-data-only", no_argument, &post_data_only, 1},
 		{NULL, 0, NULL, 0}
 	};
@@ -790,7 +794,7 @@ main(int argc, char **argv)
 	dumpStdStrings(g_fout);

 	/* The database item is always next, unless we don't want it at all */
-	if (include_everything && !dataOnly)
+	if (include_everything && !dataOnly && !post_data_only)
 		dumpDatabase(g_fout);

 	/* Now the rearrangeable objects.
 */
@@ -876,6 +880,8 @@ help(const char *progname)
 	printf(_("  --no-unlogged-table-data    do not dump unlogged table data\n"));
 	printf(_("  --quote-all-identifiers     quote all identifiers, even if not key words\n"));
 	printf(_("  --serializable-deferrable   wait until the dump can run without anomalies\n"));
+	printf(_("  --no-post-data              do not dump constraints, indexes, rules, triggers\n"));
+	printf(_("  --post-data-only            only dump constraints, indexes, rules, triggers\n"));
 	printf(_("  --use-set-session-authorization\n"
 	         "                              use SET SESSION AUTHORIZATION commands instead of\n"
 	         "                              ALTER OWNER commands to set ownership\n"));
@@ -7023,6 +7029,25 @@ collectComments(Archive *fout, CommentItem **items)
 static void
 dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 {
+	int		skip = 0;
+
+	switch (dobj->objType)
+	{
+		case DO_INDEX:
+		case DO_TRIGGER:
+		case DO_CONSTRAINT:
+		case DO_FK_CONSTRAINT:
+		case DO_RULE:
+			skip = exclude_post_data;
+			break;
+		default:
+			skip = post_data_only;
+	}
+
+	if (skip)
+		return;
+
 	switch (dobj->objType)
 	{
 		case DO_NAMESPACE:
Re: [HACKERS] [GENERAL] pg_upgrade problem
On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote: OK, this was very helpful. I found out that there is a bug in current 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp tables. (The bug is not in any released version of pg_upgrade.) The attached, applied patches should fix it for you. I assume you are running 9.0.X, and not 9.0.4. pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az. will keep you posted. Best regards, depesz
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
On Fri, Aug 26, 2011 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote: But we could also add these switches to pg_dump too if people feel it's worthwhile. I haven't looked but the logic should not be terribly hard. Something like the attached, in fact, which seems pretty simple. It seems like there are three sets of things you might want here: pre-data, data, post-data. So in the end we could end up with:

--pre-data-only
--post-data-only
--data-only
--no-pre-data
--no-post-data
--no-data

And then maybe someone will want just the create index commands and not the constraint commands. It seems like it might be more elegant to come up with a single switch where you can list which things you want:

--sections='predata data'
--sections='postdata'
--sections='index'

Just thinking out loud -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
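A --sections-style switch like the one sketched here would reduce to set membership when deciding whether to dump an object. A hypothetical illustration in Python (not pg_dump's actual option handling; the section names are just the ones floated in this thread):

```python
VALID = {"predata", "data", "postdata", "index"}

def parse_sections(value):
    """Parse a --sections='predata data' style value into a set,
    rejecting unknown section names."""
    sections = set(value.split())
    unknown = sections - VALID
    if unknown:
        raise ValueError(f"unrecognized section(s): {sorted(unknown)}")
    return sections

wanted = parse_sections("predata data")
print("postdata" in wanted)  # False -> skip indexes, constraints, triggers
```

One switch taking a list avoids the combinatorial growth of paired --X-only / --no-X flags, at the cost of having to document which object types fall in which section.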
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
On Fri, 2011-08-26 at 12:46 -0400, Robert Haas wrote: --sections='predata data' --sections='postdata' --sections='index' Agreed. After command line options reach a certain level of complexity, I think it's worth looking for a more general way to express them. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
On 08/26/2011 12:46 PM, Robert Haas wrote: On Fri, Aug 26, 2011 at 11:22 AM, Andrew Dunstanand...@dunslane.net wrote: But we could also add these switches to pg_dump too if people feel it's worthwhile. I haven't looked but the logic should not be terribly hard. Something like the attached, in fact, which seems pretty simple. It seems like there are three sets of things you might want here: pre-data, data, post-data. So in the end we could end up with: --pre-data-only --post-data-only --data-only --no-pre-data --no-post-data --no-data And then maybe someone will want just the create index commands and not the constraint commands. It seems like it might be more elegant to come up with a single switch where you can list which things you want: --sections='predata data' --sections='postdata' --sections='index' Just thinking out loud I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. All the possibilities you specify except for the indexes section can be done by using these switches in combination with -s and -a. For anything more fine-grained, I'm inclined to say that people need to roll their own. pg_restore's --list and --use-list give you extremely fine-grained control. I have working scripts which use these for example to filter out londiste and pgq objects, certain large tables, audit objects and more. As an example of the complexity I think we should avoid, which section would UNIQUE and PRIMARY KEY constraints belong in? constraints because that's what they are, or indexes because that's what they create? No matter which answer you choose someone will claim you have violated POLA. Chopping things into pre-data, data and post-data would get us around 99% of the cases we could reasonably provide for in my experience. That seems enough :-) I don't have anything in principle against your '--sections=foo bar' suggestion, but it would be more work to program. 
Simpler, and probably more consistent with how we do other things, would be allowing multiple --section options, if we don't want to have named options such as I have provided. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/26/2011 03:54 AM, Magnus Hagander wrote: In theory, this could be the progress view or progress field talked about around Greg's previous patch - or it could just be modifying the commandstring in pg_stat_activity. Right. The whole progress indicator idea is hard to do for queries in general. But there's enough of these other progress indicator ideas around now that it may be worth putting a standard way to handle them in here. It sounds like that would be sufficient to address the area Tomas is trying to instrument better. I badly want a progress indicator on CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the build process it's on. That's turned into a major headache recently. If we run with the idea of just allowing backends to publish a progress text string, I think this one maps into a similar space as the autovacuum one. Publishing how many seconds the operation has been running for may be reasonable too. Whether the overhead of the timing calls necessary to compute that will be high or not depends on the refresh rate of the progress info. My suggestion before was to name these as key=value pairs for easy parsing; here's three examples now:

autovacuum: pgbench_accounts h=182701 m=301515 d=321345 s=62.231 (cache hits, cache misses, dirty writes, seconds)
background writer: checkpoint b=511 t=3072 s=5.321 (buffers written, total, seconds)
create index concurrently: pgbench_accounts p=1 b=62 t=6213 s=81.232 (phase, blocks processed, total block estimate, seconds)

I think that the idea of making this easily human readable is optimistic, because it will make all these strings big enough to start mattering. Given that, we almost have to assume the only consumers of this data will be able to interpret it using the documentation. I'd be happy with just the minimal data set in each case, not including any statistics you can easily derive from the values given (like the MB/s readings).
Adding that figure in particular to more of the log messages would be nice though. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
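[Editor's example] Greg's proposed key=value strings are designed to be machine-parsed; as a sanity check on the format above, a minimal parser (purely hypothetical, not part of any patch) could look like this:

```python
# Parse the key=value progress strings proposed in the message above, e.g.
# "checkpoint b=511 t=3072 s=5.321". The field names (b, t, s, ...) come
# straight from the examples in the email; the parser itself is illustrative.
def parse_progress(line):
    """Split a progress string into its label and a dict of typed fields."""
    label, *pairs = line.split()
    fields = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        # values with a decimal point (like seconds) become floats
        fields[key] = float(value) if "." in value else int(value)
    return label, fields

label, fields = parse_progress("checkpoint b=511 t=3072 s=5.321")
# label == "checkpoint"; fields == {"b": 511, "t": 3072, "s": 5.321}
```

This supports Greg's point that the documentation, not the string itself, would carry the meaning of each key.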
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 9:54, Magnus Hagander wrote: This seems like the wrong thing to write to the log. It's really only useful *during* the checkpoint run, isn't it? If so, I think it should go in a pg_stat view. In theory, this could be the progress view or progress field talked about around Greg's previous patch - or it could just be modifying the commandstring in pg_stat_activity. Either way, it should be updated in shared memory in that case (like the current query is), and not sent with a message to the collector. I personally find it handy, for example, when I need to find out why performance degraded at a certain point in the past. Now I can see there was a checkpoint (thanks to log_checkpoints=on), but I don't know any details about its progress. I already collect info from pg_stat_bgwriter; that's why I thought I could update it more often. The log file is a natural destination for such information, IMHO. I see it as an extension to the current checkpoint messages that are written to the log. The commandstring approach is probably fine for interactive work, but not very handy when you need to analyze something that already happened. Unless you collected the data, of course. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 19:17, Greg Smith wrote: On 08/26/2011 03:54 AM, Magnus Hagander wrote: In theory, this could be the progress view or progress field talked about around Gregs previous patch - or it could just be modifying the commandstring in pg_stat_activity. Right. The whole progress indicator idea is hard to do for queries in general. But there's enough of these other progress indicator ideas around now that it may be worth putting a standard way to handle them in here. It sounds like that would be sufficient to address the area Tomas is trying to instrument better. I badly want a progress indicator on CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the build process it's on. That's turned into a major headache recently. If we run with the idea of just allowing backends to publish a progress text string, I think this one maps into a similar space as the autovacuum one. Publishing how many seconds the operation has been running for may be reasonable too. Whether the overhead of the timing calls necessary to compute that will be high or not depends on the refresh rate of the progress info. My suggestion before was to name these as key=value pairs for easy parsing; here's three examples now: autovacumm: pgbench_accounts h=182701 m=301515 d=321345 s=62.231 (cache hits, cache misses, dirty writes, seconds) background writer: checkpoint b=511 t=3072 s=5.321 (buffers written, total, seconds) create index concurrently: pgbench_accounts p=1 b=62 t=6213 s=81.232 (phase, blocks processed, total block estimate, seconds) I think that the idea of making this easily human readable is optimistic, because it will make all these strings big enough to start mattering. Given that, we almost have to assume the only consumers of this data will be able to interpret it using the documentation. I'd be happy with just the minimal data set in each case, not including any statistics you can easily derive from the values given (like the MB/s readings). 
Adding that figure in particular to more of the log messages would be nice though. I'm a bit confused - are you talking about updating process title or about writing the info to log? The process title is probably fine for watching the progress interactively, but it really does not solve what I need. I need to be able to infer the progress for past events, so I'd have to sample the 'ps ax' output regularly. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
On Fri, Aug 26, 2011 at 1:15 PM, Andrew Dunstan and...@dunslane.net wrote: I don't have anything in principle against your '--sections=foo bar' suggestion, but it would be more work to program. Simpler, and probably more consistent with how we do other things, would be allowing multiple --section options, if we don't want to have named options such as I have provided. I wouldn't object to that, but more work to program probably means about an extra 10 lines of code in this particular case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
Excerpts from Robert Haas's message of vie ago 26 15:36:36 -0300 2011: On Fri, Aug 26, 2011 at 1:15 PM, Andrew Dunstan and...@dunslane.net wrote: I don't have anything in principle against your '--sections=foo bar' suggestion, but it would be more work to program. Simpler, and probably more consistent with how we do other things, would be allowing multiple --section options, if we don't want to have named options such as I have provided. I wouldn't object to that, but more work to program probably means about an extra 10 lines of code in this particular case. The --section=data --section=indexes proposal seems very reasonable to me -- more so than --sections='data indexes'. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
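[Editor's example] The repeatable-option style Alvaro favors is a well-established pattern. The sketch below models it with Python's argparse purely for illustration; pg_dump itself parses options in C via getopt_long, so none of this is PostgreSQL code:

```python
# Demonstrate the difference between the two proposals: a repeatable
# --section flag (favored here) versus a single space-separated --sections
# string. Names mirror the thread; the implementation is illustrative only.
import argparse

parser = argparse.ArgumentParser()
# action="append" lets --section appear multiple times, accumulating a list
parser.add_argument("--section", action="append", default=[])

args = parser.parse_args(["--section", "data", "--section", "indexes"])
# args.section == ["data", "indexes"]
```

The repeated-flag form needs no ad-hoc splitting or quoting rules, which is one reason it is "less squishily defined" than a space-separated list.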
Re: [HACKERS] Inputting relative datetimes
On Aug 25, 2011, at 5:08 AM, Dean Rasheed wrote: Funny you should mention intervals... timestamptz 'today' - interval '5 days' timestamptz 'now' + interval '2 hours' Yes, but what I am trying to achieve is a way of entering such relative timestamps using a single input value, so that absolute and relative timestamps can both be bound to a SQL query using just one variable. Even if the community doesn't want to add this to core, I think it would be a great add-on to put on PGXN. If you don't feel up to writing it yourself, perhaps you could pay one of the consulting companies to do it for you? -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote: I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. While we're debating what shade of yellow to paint the shed... My actual use case is to be able to inject SQL into a SQL-formatted dump either pre- or post-data (I'm on 8.3, so I don't actually dump any data; I'm *mostly* emulating the ability to dump data on just certain tables). So for what I'm doing, the ideal interface would be a way to tell pg_dump: When you're done dumping all table structures but before you get to any constraints, please run $COMMAND and inject its output into the dump output. For some of the data obfuscation we're doing it would be easiest if $COMMAND were a perl script instead of SQL, but we could probably convert it. Of course, many other folks actually need the ability to just spit out specific portions of the dump; I'm hoping we can come up with something that supports both concepts. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Cryptic error message in low-memory conditions
Hello list, This is something that I've only recently somewhat pinned down to a cause... Some Postgres servers will error out for a while with the following error message: expected authentication request from server, but received c

If one uses Their Favorite Search Engine, this message is scattered around the internet, all in reference to Postgres, I think, but none of the top results seem to have any lucid responses or cause listed. We've seen this reproduce -- sometimes for minutes at a time -- and after catching one in the act I am reasonably confident that one common cause of this is systems that are low on memory, which I confirmed by looking at postgres logs and matching them up against our monitoring system. Critical statistics first: the systems run Linux with overcommit off, so malloc returns NULL now and again. There is no OOM killer activity. SSL is the transport, and SQL role password authentication is in use. There is no swap. Here's an example of the various kinds of failure one can get from connecting to a system that is low on memory:

2011-08-26 16:03:06 | INFO psql? failed with exception #PGError: FATAL: out of memory DETAIL: Failed on request of size 488.
2011-08-26 16:02:27 | INFO psql? failed with exception #PGError: expected authentication request from server, but received c
2011-08-26 16:01:51 | INFO psql? failed with exception #PGError: expected authentication request from server, but received c
2011-08-26 16:01:15 | INFO psql? failed with exception #PGError: expected authentication request from server, but received c
2011-08-26 16:00:39 | INFO psql? failed with exception #PGError: expected authentication request from server, but received c
2011-08-26 16:00:01 | INFO psql? failed with exception #PGError: expected authentication request from server, but received c
2011-08-26 15:59:25 | INFO psql? failed with exception #PGError: expected authentication request from server, but received c
2011-08-26 15:58:48 | INFO psql?
failed with exception #PGError: expected authentication request from server, but received c
2011-08-26 15:58:12 | INFO psql? failed with exception #PGError: FATAL: out of memory

On the backend side, one can see that often there is a failure to fork, which is basically expected in this condition. Various statements will be reporting OOM also. The common case, though, is an error message that says nothing about being out of memory: nominally one gets no express indication that the system is out of memory but otherwise responsive. This puts someone doing monitoring (like us) in a tricky position: the user of the database is free to use their memory -- that's what it's for -- but the problem is we cannot determine that the server is basically online, just fully utilized. This defeats the ever-common "authenticate and run SELECT 1;" monitoring style frequently used to determine the most basic levels of uptime. If the 'out of memory' error were delivered most of the time, we could act differently, but for now we basically have to assume that postgres is offline and poke around. It's also interesting to note that the systems are basically responsive (ssh can always seem to fork, and as I'm poking around, tools like 'ls' et al seem to be fine), and sometimes the load average isn't even extreme -- a leaky application with too many connections can cause this, so it's not like every last scrap of memory has been consumed. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tsvector concatenation - backend crash
jes...@krogh.cc writes: Attached SQL files give (at least in my hands) a reliable backend crash with this stacktrace .. reproduced on both 9.0.4 and HEAD. I'm sorry I cannot provide a more trimmed-down set of vectors that reproduces the bug, thus the obfuscated dataset. But even deleting single terms in the vectors makes the bug go away. I found it. tsvector_concat does this to compute the worst-case output size needed:

	/* conservative estimate of space needed */
	out = (TSVector) palloc0(VARSIZE(in1) + VARSIZE(in2));

Unfortunately, that's not really worst case: it could be that the output will require more alignment padding bytes than the inputs did, if there is a mix of lexemes with and without position data. For example, if in1 contains one lexeme of odd length without position data, and in2 contains one lexeme of even length with position data (and no pad byte), and in1's lexeme sorts before in2's, then we will need a pad byte in the second lexeme where there was none before. The core of the fix is to suppose that we might need a newly-added pad byte for each lexeme:

	out = (TSVector) palloc0(VARSIZE(in1) + VARSIZE(in2) + i1 + i2);

which really is an overestimate, but I don't feel a need to be tenser about it. What I actually committed is a bit longer because I added some comments and some Asserts ...

Ok, I found 8.3.0 to be good so i ran a git bisect on it.. it gave me this commit:

e6dbcb72fafa4031c73cc914e829a6dec96ab6b6 is the first bad commit
commit e6dbcb72fafa4031c73cc914e829a6dec96ab6b6
Author: Tom Lane t...@sss.pgh.pa.us
Date: Fri May 16 16:31:02 2008 +

    Extend GIN to support partial-match searches, and extend tsquery to support prefix matching using this facility.

AFAICT this is a red herring: the bug exists all the way back to where tsvector_concat was added, in 8.3. I think the reason that your test case happens to not crash before this commit is that it changed the sort ordering rules for lexemes.
As you can see from my minimal example above, we might need different numbers of pad bytes depending on how the lexemes sort relative to each other. Anyway, patch is committed; thanks for the report! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
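[Editor's example] The padding arithmetic Tom describes can be checked with a few lines of throwaway code. This sketch is an illustration of the alignment rule he explains, not the actual tsvector layout from the PostgreSQL source:

```python
# Model the point of the fix: a lexeme with position data must be padded to
# 2-byte alignment before its position array, so an odd-length lexeme needs
# one pad byte that the same lexeme without positions did not. Sizes here
# are a simplified model, not the real on-disk tsvector format.
def lexeme_size(length, has_positions, npos=1):
    size = length
    if has_positions:
        size += size % 2          # alignment pad byte, only if length is odd
        size += 2 + 2 * npos      # uint16 position count + uint16 positions
    return size

# odd-length lexeme without positions: no pad byte, 5 bytes of payload
a = lexeme_size(5, has_positions=False)   # 5
# the same lexeme once positions are attached: a pad byte appears
b = lexeme_size(5, has_positions=True)    # 5 + 1 pad + 2 count + 2 pos = 10
```

This is why the conservative palloc0(VARSIZE(in1) + VARSIZE(in2)) could undercount, and why adding one byte per lexeme (i1 + i2) restores a safe upper bound.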
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
Alvaro Herrera alvhe...@commandprompt.com writes: The --section=data --section=indexes proposal seems very reasonable to me -- more so than --sections='data indexes'. +1 ... not only easier to code and less squishily defined, but more like the existing precedent for other pg_dump switches, such as --table. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore --no-post-data and --post-data-only
On 08/26/2011 04:46 PM, Jim Nasby wrote: On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote: I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. While we're debating what shade of yellow to paint the shed... My actual use case is to be able to be able to inject SQL into a SQL-formatted dump either pre- or post-data (I'm on 8.3, so I don't actually dump any data; I'm *mostly* emulating the ability to dump data on just certain tables). So for what I'm doing, the ideal interface would be a way to tell pg_dump When you're done dumping all table structures but before you get to any constraints, please run $COMMAND and inject it's output into the dump output. For some of the data obfuscation we're doing it would be easiest if $COMMAND was a perl script instead of SQL, but we could probably convert it. Of course, many other folks actually need the ability to just spit out specific portions of the dump; I'm hoping we can come up with something that supports both concepts. Well, the Unix approach is to use tools that do one thing well to build up more complex tools. Making pg_dump run some external command to inject things into the stream seems like the wrong thing given this philosophy. Use pg_dump to get the bits you want (pre-data, post-data) and sandwich them around whatever else you want. As for getting data from just certain tables, I just posted a patch for pg_dump to exclude data for certain tables, and we could look at providing a positive as well as a negative filter if there is sufficient demand. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removal of useless include references
Excerpts from Bruce Momjian's message of vie ago 26 01:35:45 -0300 2011: It has been years since I ran src/tools/pginclude/pgrminclude to remove unnecessary include files. (I have already fixed things so include files can be compiled on their own.) The attached patch removes unneeded include references, and marks some includes as needing to be skipped by pgrminclude. In btree_gist I think you should remove #include "postgres.h" from the .h file and put it in the .c files instead, as is customary. I think that would make the other changes incorrect. ltree.h and pg_upgrade.h also get this wrong. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cryptic error message in low-memory conditions
Daniel Farina dan...@heroku.com writes: Some Postgres servers will error out for a while with the following error message: expected authentication request from server, but received c [ and this seems to be triggered by fork failures in the server ]

<spock>Fascinating.</spock>

I poked at this for awhile by the expedient of hot-wiring postmaster.c to always fail the fork request:

*** src/backend/postmaster/postmaster.c.orig	Tue Aug 23 17:13:09 2011
--- src/backend/postmaster/postmaster.c	Fri Aug 26 19:08:25 2011
***************
*** 3182,3188 ****
  #ifdef EXEC_BACKEND
  	pid = backend_forkexec(port);
  #else	/* !EXEC_BACKEND */
! 	pid = fork_process();
  	if (pid == 0)	/* child */
  	{
  		free(bn);
--- 3182,3189 ----
  #ifdef EXEC_BACKEND
  	pid = backend_forkexec(port);
  #else	/* !EXEC_BACKEND */
! 	pid = -1;	// fork_process();
! 	errno = ENOMEM;
  	if (pid == 0)	/* child */
  	{
  		free(bn);

The basic case did what I was expecting:

$ psql dbname=regression
psql: could not fork new process for connection: Cannot allocate memory

which was unsurprising since I'm quite sure that code path got tested in basically this fashion, back in the dark ages. But sure enough, over a TCP connection with SSL support turned on, it fails as described. strace shows the postmaster is sending what it's supposed to. I eventually figured out the problem: libpq sees the E response, thinks it must be talking to a postmaster too old to understand NEGOTIATE_SSL_CODE, and closes the socket and tries again. But *it forgets to clear out its input buffer*, so the body of the error message is still there waiting to be read, and the next connection sees the c as the first byte of the first postmaster response.
The fix for this is

*** src/interfaces/libpq/fe-connect.c.orig	Thu Jul 28 10:39:57 2011
--- src/interfaces/libpq/fe-connect.c	Fri Aug 26 18:52:18 2011
***************
*** 2064,2069 ****
--- 2064,2072 ----
  	conn->allow_ssl_try = false;
  	/* Assume it ain't gonna handle protocol 3, either */
  	conn->pversion = PG_PROTOCOL(2, 0);
+ 	/* Discard any unread/unsent data */
+ 	conn->inStart = conn->inCursor = conn->inEnd = 0;
+ 	conn->outCount = 0;
  	/* Must drop the old connection */
  	closesocket(conn->sock);
  	conn->sock = -1;

and similarly in the other places in fe-connect.c where we're abandoning an open connection (not sure every one of those has to have it, but it seems a good safety practice). That gets us to the right place in the default SSL mode:

$ psql host=localhost sslmode=prefer dbname=regression
psql: could not fork new process for connection: Cannot allocate memory

but we're still not there if you're trying to force SSL:

$ psql host=localhost sslmode=require dbname=regression
psql: server does not support SSL, but SSL was required

The reason for this is that that same bit of code supposes that any E response must mean that the postmaster didn't recognize NEGOTIATE_SSL_CODE. It doesn't (and of course shouldn't) pay any attention to the actual textual error message. Now, seeing as how NEGOTIATE_SSL_CODE has been understood by every build since PG 7.0, I believe that this is dead code and we could remove it; it seems exceedingly unlikely that any modern build of libpq will ever be used to talk to a server that responds to that with E. In fact it's worse than just delivering a misleading error message in the require case, because if you're not doing require then what happens next is that the code forces the protocol level down to 2 and tries again, supposing that it must be dealing with an ancient server. In the normal case where you are talking to a server under load rather than a deliberately-broken one, it's entirely possible that the second connection attempt succeeds.
And what you've got then is that the connection is operating in protocol 2 rather than what the user probably expected, disabling assorted functionality that he may well be depending on. I don't recall having seen reports that could match that syndrome, but then again Daniel's complaint is a new one on me too. So I'm thinking we'd be well advised to eliminate the assumption that an E response could be fixed by downgrading to protocol 2, and instead just make this code report the error message it got from the postmaster. That's more than a one-liner so I don't have a patch for it yet. Lastly, I noticed that if I tried this repeatedly on a Unix socket, I sometimes got

psql: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
could not send startup packet: Broken pipe

rather than the expected results.
Re: [HACKERS] Removal of useless include references
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: It has been years since I ran src/tools/pginclude/pgrminclude to remove unnecessary include files. (I have already fixed things so include files can be compiled on their own.) The attached patch removes unneeded include references, and marks some includes as needing to be skipped by pgrminclude. I am sure applying this patch will break builds on some platforms and some option combinations so I will monitor the buildfarm when I apply it and make adjustments. The last time you did this was in July 2006. It took us two weeks to mostly recover, but we were still dealing with some fallout in December, cf http://archives.postgresql.org/pgsql-hackers/2006-12/msg00491.php We had the buildfarm then, had had it for a couple years. The notion that watching the buildfarm is enough is fully disproven by history. Unless you have a better test plan than last time (which this isn't), I don't think this should be done at all. The benefits are microscopic and the pain real. I don't have a better plan. There are #ifdef code blocks that often don't get processed and therefore this can't be done better. I will abandon the idea. OK, try #2. I already had code that removed #if/#else/#endif code in *.h files for better testing, so I extended that to all *.c files. This reduces the size of the diff from 6.6k lines to 4.7k lines but it makes it much less likely that there will be problems from running pgrminclude. The current patch is here: http://momjian.us/expire/pgrminclude.diff I tested the patch on BSD and Linux. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removal of useless include references
Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of vie ago 26 01:35:45 -0300 2011: It has been years since I ran src/tools/pginclude/pgrminclude to remove unnecessary include files. (I have already fixed things so include files can be compiled on their own.) The attached patch removes unneeded include references, and marks some includes as needing to be skipped by pgrminclude. In btree_gist I think you should remove #include postgres.h from the .h file and put it in the .c files instead, as is customary. I think that would make the other changes incorrect. ltree.h and pg_upgrade.h also get this wrong. Thanks, done. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inputting relative datetimes
On Fri, Aug 26, 2011 at 4:32 PM, Jim Nasby j...@nasby.net wrote: On Aug 25, 2011, at 5:08 AM, Dean Rasheed wrote: Funny you should mention intervals... timestamptz 'today' - interval '5 days' timestamptz 'now' + interval '2 hours' Yes, but what I am trying to achieve is a way of entering such relative timestamps using a single input value, so that absolute and relative timestamps can both be bound to a SQL query using just one variable. Even if the community doesn't want to add this to core, I think it would be a great add-on to put on PGXN. If you don't feel up to writing it themselves, perhaps you would pay one of the consulting companies to do it for them? Frankly, our current date parsing code is pretty darn strange and flaky. If nobody's found the energy to rationalize that, what are the chances that we can add a whole bunch more functionality without also adding a whole bunch more bugs? For examples of the sorts of things we haven't gotten around to fixing, see: http://archives.postgresql.org/pgsql-hackers/2011-03/msg01295.php Another problem here is that it seems possible, even likely, that everyone will have their own particular flavor of what they'd like to see accepted: two weeks ago, a week ago Tuesday, next Saturday, last Friday, two weeks from Saturday, Christmas plus three fortnights... if it weren't already spaghetti code... give it time. I'm not necessarily opposed to the idea (especially as a contrib module), but I'm a little nervous that we might be overestimating the extent to which Dean's needs are universal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
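[Editor's example] For anyone who does want the PGXN-style add-on Jim suggests, the core of it is a parser that maps a single input string to an absolute timestamp, exactly the single-variable binding Dean is after. A toy sketch, where the accepted grammar (just "now +/- N units") is entirely hypothetical and far smaller than the forms Robert lists:

```python
# Parse one relative-timestamp string like "now + 2 hours" into a datetime.
# The grammar, function name, and unit table are all illustrative; a real
# add-on would need to handle vastly more forms (see Robert's examples).
from datetime import datetime, timedelta
import re

UNITS = {"hour": "hours", "hours": "hours",
         "day": "days", "days": "days",
         "minute": "minutes", "minutes": "minutes"}

def parse_relative(text, now=None):
    now = now or datetime.now()
    m = re.fullmatch(r"now\s*([+-])\s*(\d+)\s*(\w+)", text.strip())
    if not m:
        raise ValueError("unsupported input: %r" % text)
    sign, amount, unit = m.groups()
    delta = timedelta(**{UNITS[unit]: int(amount)})
    return now + delta if sign == "+" else now - delta

base = datetime(2011, 8, 26, 12, 0)
parse_relative("now + 2 hours", now=base)   # datetime(2011, 8, 26, 14, 0)
```

Even this trivial grammar shows Robert's point: every user will want their own set of accepted forms, which is a good argument for doing it as an extension rather than in core.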
Re: [HACKERS] dropdb and dropuser: IF EXISTS
On Fri, Aug 26, 2011 at 12:08 AM, Josh Kupershmidt schmi...@gmail.com wrote:

I noticed a few places where it would be handy if dropdb took a flag like --if-exists which would basically just add the 'IF EXISTS' clause to the DROP DATABASE statement. For example, scripts like find_static or mbregress.sh use dropdb and createdb, but they generate noisy errors from dropdb when run for the first time, since there's no --if-exists flag. (They could just pipe 'DROP DATABASE IF EXISTS ...' to psql, but what's the point of having dropdb if it's not used?) Attached is a very quick patch implementing the --if-exists or -X option for dropdb and dropuser. I didn't bother adding a check to make sure the server version was 8.2+ since we're not even supporting 8.1 nowadays, though that'd be easy enough to add.

+1 for --if-exists, but -X isn't doing a lot for me, especially since we've used -X for other purposes in other commands. I'd just skip having a short form for this one.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
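For illustration, the statement such a flag would send to the server can be mocked up as follows. This is a hypothetical Python sketch of the intended behavior, not the actual dropdb source (which quotes identifiers more carefully via fmtId):

```python
# Sketch of the SQL that "dropdb --if-exists dbname" versus plain
# "dropdb dbname" would issue. Hypothetical mock-up of the patch's intent.
def drop_database_sql(dbname, if_exists=False):
    clause = "IF EXISTS " if if_exists else ""
    return f'DROP DATABASE {clause}"{dbname}";'
```

With the flag, a first run against a nonexistent database produces a notice instead of an error, which is exactly what the noisy regression-test scripts want.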
Re: [HACKERS] cheaper snapshots redux
On Thu, Aug 25, 2011 at 6:24 PM, Jim Nasby j...@nasby.net wrote:

On Aug 25, 2011, at 8:24 AM, Robert Haas wrote:

My hope (and it might turn out that I'm an optimist) is that even with a reasonably small buffer it will be very rare for a backend to experience a wraparound condition. For example, consider a buffer with ~6500 entries, approximately 64 * MaxBackends, the approximate size of the current subxip arrays taken in aggregate. I hypothesize that a typical snapshot on a running system is going to be very small - a handful of XIDs at most - because, on the average, transactions are going to commit in *approximately* increasing XID order and, if you take the regression tests as representative of a real workload, only a small fraction of transactions will have more than one XID.

So BTW, there's a way to actually gather some data on this by using PgQ (part of Skytools and used by Londiste). PgQ works by creating ticks at regular intervals, where a tick is basically just a snapshot of committed XIDs. Presumably Slony does something similar. I can provide you with sample data from our production systems if you're interested.

Yeah, that would be great.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] cheaper snapshots redux
On Thu, Aug 25, 2011 at 6:29 PM, Jim Nasby j...@nasby.net wrote:

Actually, I wasn't thinking about the system dynamically sizing shared memory on its own... I was only thinking of providing the ability for a user to change something like shared_buffers and allow that change to take effect with a SIGHUP instead of requiring a full restart.

I agree. That would be awesome. Sadly, I don't have time to work on it. :-(

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote:

Hmmm, let me explain what led me to this patch - right now I'm doing a comparison of filesystems with various block sizes (both fs and db blocks). I've realized that the db block size significantly influences the frequency of checkpoints and the amount of data to write, so I'm collecting data from pg_stat_bgwriter too. The benchmark goes like this:

1. collect pg_stat_bgwriter stats
2. run pgbench for 10 minutes
3. collect pg_stat_bgwriter stats (to compute difference with (1))
4. kill the postmaster

The problem is that when the checkpoint stats are collected, there might be a checkpoint in progress, and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has a significant impact because the checkpoints are less frequent.

Could you remove this hazard by adding a step 2a. psql -c CHECKPOINT?

-- Noah Misch  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
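The two-snapshot delta that both this benchmark and the monitoring use case rely on is plain subtraction. A minimal sketch (the column names match pg_stat_bgwriter; the row values here are made up for illustration):

```python
# Subtract two samples of pg_stat_bgwriter, taken before and after a run,
# to get the work done in between -- e.g. buffers written by checkpoints.
def bgwriter_delta(before, after):
    return {col: after[col] - before[col] for col in before}

before = {"checkpoints_timed": 10, "checkpoints_req": 2, "buffers_checkpoint": 5000}
after  = {"checkpoints_timed": 12, "checkpoints_req": 2, "buffers_checkpoint": 9200}
delta = bgwriter_delta(before, after)
```

If a checkpoint is still in progress at step 3 and its stats are only published when it finishes, the delta undercounts that checkpoint, which is the hazard the suggested extra CHECKPOINT step is meant to remove.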
Re: [HACKERS] cheaper snapshots redux
On Tue, Aug 23, 2011 at 5:25 AM, Robert Haas robertmh...@gmail.com wrote:

I've been giving this quite a bit more thought, and have decided to abandon the scheme described above, at least for now. It has the advantage of avoiding virtually all locking, but it's extremely inefficient in its use of memory in the presence of long-running transactions. For example, if there's an open transaction that's been sitting around for 10 million transactions or so and has an XID assigned, any new snapshot is going to need to probe into the big array for any XID in that range. At 8 bytes per entry, that means we're randomly accessing about ~80MB of memory-mapped data. That seems problematic both in terms of blowing out the cache and (on small machines) possibly even blowing out RAM. Nor is that the worst case scenario: a transaction could sit open for 100 million transactions.

First, I respectfully disagree with you on the point of 80MB. I would say that it's very rare for a small system (with 1 GB RAM) to have a long-running transaction sitting idle while 10 million other transactions go by. Should an optimization aimed at high enterprise workloads be left out for the sake of a very small system?

Second, if we make use of memory-mapped files, why should we think that all 80MB of data will always reside in memory? Won't it get paged out by the operating system when it is in need of memory? Or do you have some specific OS in mind?

Thanks, Gokul.
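The figures being debated can be checked with quick arithmetic (MaxBackends = 100 is an assumed setting here, chosen only because it reproduces the numbers quoted in the thread):

```python
# 10 million XIDs at 8 bytes per entry: the span a snapshot might probe
# when one transaction has sat open that long.
span_bytes = 10_000_000 * 8            # 80,000,000 bytes, the "~80MB" above
span_mib = span_bytes / (1024 * 1024)  # about 76 MiB

# The proposed ring buffer, by contrast: 64 * MaxBackends entries.
MAX_BACKENDS = 100                     # assumed setting, not from the thread
ring_entries = 64 * MAX_BACKENDS       # 6400, the "~6500 entries" quoted earlier
ring_bytes = ring_entries * 8          # ~50 KB -- tiny next to the 80MB span
```

The contrast between the two sizes is the whole argument: the ring buffer stays cache-friendly, while the long-open-transaction case forces sparse probes across a span three orders of magnitude larger.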