Re: [HACKERS] limiting hint bit I/O
On 01/18/2011 06:44 PM, Robert Haas wrote: On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncuremmonc...@gmail.com wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). I think it's not very responsible to allege that hint bits aren't providing a benefit without providing the patch that you used and the tests that you ran. maybe I'm wrong but it seems it did post an experimental patch and also a tests used, see: http://archives.postgresql.org/pgsql-hackers/2010-12/msg01897.php This is a topic that needs careful analysis, and I think that saying hint bits don't provide a benefit... maybe... doesn't do anything but confuse the issue. How about doing some tests with the patch from my OP and posting the results? If removing hint bits entirely doesn't degrade performance, then surely the less-drastic approach I've taken here ought to be OK too. But in my testing, it didn't look too good. Andrea -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On 01/19/2011 09:03 AM, Andrea Suisani wrote: On 01/18/2011 06:44 PM, Robert Haas wrote: On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). I think it's not very responsible to allege that hint bits aren't providing a benefit without providing the patch that you used and the tests that you ran.

maybe I'm wrong but it seems it did post an experimental patch and also
                                       ^^ he
a tests used, see:
  ^^ the

sorry for the typos (not enough caffeine I suppose :) Andrea
Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
On Tue, Jan 18, 2011 at 10:03:01AM +0200, Heikki Linnakangas wrote: That isn't ever going to happen, unless you'd like to give up hash joins and hash aggregation on text values. You could canonicalize the string first in the hash function. I'm not sure if we have all the necessary information at hand there, but at least with some encoding/locale-specific support functions it'd be possible. This is what strxfrm() was created for. strcoll(a,b) == strcmp(strxfrm(a),strxfrm(b)) Sure there's a cost, the question is only how much and whether it makes hash join unfeasible. I doubt it, since by definition it must be faster than strcoll(). I suppose a test would be interesting. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle
Re: [HACKERS] Extending opfamilies for GIN indexes
Tom Lane t...@sss.pgh.pa.us writes: Oh, wait a minute: there's a bad restriction there, namely that a contrib module could only add loose operators that had different declared input types from the ones known to the core opclass. Otherwise there'd be a conflict with the contrib module and core needing to insert similarly-keyed support functions. This would actually be enough for contrib/intarray (because the core operator entries are for anyarray not for integer[]) but it is easy to foresee cases where that wouldn't be good enough. Seems like we'd need an additional key column in pg_amproc to really make this cover all cases. I would have thought that such a contrib module would then need to offer its own opfamily and opclasses, and users would have to use the specific opclass manually like they do e.g. for text_pattern_ops. Can't it work that way? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] pl/python refactoring
On 18/01/11 23:22, Peter Eisentraut wrote: On mån, 2011-01-17 at 21:49 +0200, Peter Eisentraut wrote: On sön, 2011-01-02 at 12:41 +0100, Jan Urbański wrote: Here they are. There are 16 patches in total. They amount to what's currently in my refactor branch (and almost to what I've sent as the complete refactor patch, while doing the splitting I discovered a few useless hunks that I've discarded). Committed 0001, rest later. Today committed: 3, 5, 10, 11, 12, 13 Thanks, #2: It looks like this loses some information/formatting in the error message. Should we keep the pointing arrow there?

--- a/src/pl/plpython/expected/plpython_error.out
+++ b/src/pl/plpython/expected/plpython_error.out
@@ -10,10 +10,7 @@ CREATE FUNCTION sql_syntax_error() RETURNS text
 SELECT sql_syntax_error();
 WARNING: PL/Python: plpy.SPIError: unrecognized error in PLy_spi_execute_query
 CONTEXT: PL/Python function sql_syntax_error
-ERROR: syntax error at or near syntax
-LINE 1: syntax error
-^
-QUERY: syntax error
+ERROR: PL/Python: plpy.SPIError: syntax error at or near syntax
 CONTEXT: PL/Python function sql_syntax_error
 /* check the handling of uncaught python exceptions */

Yes, the message is less informative, because the error is reported by PL/Python, by wrapping the SPI message. I guess I could try to extract more info from the caught ErrorData and put it in the new error that PL/Python throws. But I'd like to do that as a refinement of the spi-in-subxacts patch, because the current behaviour is broken anyway - to catch the SPI error safely you need to wrap the whole thing in a subtransaction. This patch only gets rid of the global error state, and to do that I needed to raise some exception from the try/catch block around SPI calls. #7: This is unnecessary because the remaining fields are automatically initialized with NULL. The Python documentation uses initializations like that. 
The way I understand it, newer Python versions might add more fields at the end, and they will rely on the fact that they get automatically initialized even if the source code was for an older version. So I would rather not touch this, as it doesn't buy anything. OK. #16: This is probably pointless because pgindent will reformat this. If it does, then it's a shame. Maybe the comment should be moved above the if() then. Cheers, Jan
Re: [HACKERS] Confusing comment in TransactionIdIsInProgress
On Tue, 2011-01-18 at 10:51 +0200, Heikki Linnakangas wrote: On 18.01.2011 07:15, Jim Nasby wrote: Shouldn't the comment read If first time through?

/*
 * If not first time through, get workspace to remember main XIDs in. We
 * malloc it permanently to avoid repeated palloc/pfree overhead.
 */
if (xids == NULL)
{
    ...
    xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));

Huh, yes, I'm amazed that no-one has noticed. I must've read that piece of code dozens of times in the last couple of years myself, and that sentence was even copy-pasted to GetConflictingVirtualXIDs() later in that file, including that thinko. Yes, I laughed when I saw that, having camped out in that code many nights. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] pl/python refactoring
On 19/01/11 02:06, Hitoshi Harada wrote: 2011/1/19 Peter Eisentraut pete...@gmx.net: On mån, 2011-01-17 at 21:49 +0200, Peter Eisentraut wrote: On sön, 2011-01-02 at 12:41 +0100, Jan Urbański wrote: Here they are. There are 16 patches in total. They amount to what's currently in my refactor branch (and almost to what I've sent as the complete refactor patch, while doing the splitting I discovered a few useless hunks that I've discarded). Committed 0001, rest later. Today committed: 3, 5, 10, 11, 12, 13 I have reviewed this original patches for myself as the fundamental of subsequent work, and have comments. - This is not in the patch, but around line 184 vis versa in comment seems like typo. Right, should certainly be vice versa. - -1 is used as the initial value of PLyTypeInfo.is_rowtype. Why not 0? See the comments in struct PLyTypeInfo: is_rowtype can be: -1 = not known yet (initial state); 0 = scalar datatype; 1 = rowtype; 2 = rowtype, but I/O functions not set up yet - PLy_(input|output)_tuple_funcs() in PLy_trigger_handler() is added. The comment says it should check for the possibility that the relation's tupdesc changed since last call. Is it true that tupdesc may change even in one statement? And it also says the two functions are responsible for not doing repetitive work, but ISTM they are not doing something to stop redundant work. The function doesn't seem like lightweight enough to be called on each row. Hm, you may be right. I haven't touched that part of the code, but now it seems to me that for triggers we do the I/O funcs lookup for every row. I could try to check that and fix it, but not sure if I'll have the time, and it's been that way before. Also, the CF is already closed in theory... - There are elog() and PLy_elog() overall, but it looks like to me that the usage is inconsistent. Moreover, elog(ERROR, PLyTypeInfo struct is initialized for a Datum); in PLy_(input|output)_tuple_funcs() should be Assert() not elog()? 
Well in theory PLy_elog should be used if there's a current Python exception set that you'd like to forward to Postgres, making it an elog(ERROR). It's true that the usage is sometimes a bit inconsistent; some of these inconsistencies are cleaned up in the next patches, but probably not all of them. As for the Assert/elog, I would prefer an elog, because if there are bugs in that code, using the wrong I/O functions could lead to unexpected results in production (where an Assert would not be present). - A line break should be added before PLy_add_exception() after static void Oops, you're right. - This is also not in the patch, but the comment /* these should only be called once at the first call * of plpython_call_handler. initialize the python interpreter * and global data. */ is bogus. PLy_init_interp() is called in _PG_init(). Yep, that comment looks misguided. That's all for now. Some of them must be my misunderstanding or trivial, but I post them for the record. Thanks, your feedback is very valuable! Cheers, Jan
Re: [HACKERS] Replication logging
On Tue, 2011-01-18 at 10:57 -0500, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: Is there *any* usecase for setting them differently though? I can't believe we're still engaged in painting this bikeshed. Let's just control it off log_connections and have done. Yes, this is a waste of time. Leave it as is because it's there for a very good reason, as Robert already explained. The code was put in explicitly because debugging replication connections is quite important and prior to that addition, wasn't easy. It's a very separate thing from logging the hundreds/thousands of other connections on the system. I don't really care about neatness of code, or neatness of parameters. I want to be able to confirm the details of the connection. pg_stat_replication is dynamic, not a historical record of connections and reconnections. How else will you diagnose an erratic network, or an accidental change of authority? Replication is so important it isn't worth tidying away a couple of lines of log. My objective is to make replication work, not to reduce the size of the average log file by 1 or 2 lines. I'm particularly concerned that people make such changes too quickly. There are many things in this area of code that need changing, but also many more that do not. If we are to move forwards we need to avoid going one step forwards, one step back. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] ToDo: support for parameters in EXECUTE statement
Hello

The EXECUTE statement doesn't support parametrization via an SPI_execute_with_args call, nor via PQexecParams. This can be a security issue: if somebody uses a prepared statement as protection against SQL injection, then all that protection goes out, because he has to call EXECUTE without parametrization. Regards Pavel Stehule
Re: [HACKERS] pl/python refactoring
2011/1/19 Jan Urbański wulc...@wulczer.org: On 19/01/11 02:06, Hitoshi Harada wrote: - -1 is used as the initial value of PLyTypeInfo.is_rowtype. Why not 0? See the comments in struct PLyTypeInfo: is_rowtype can be: -1 = not known yet (initial state); 0 = scalar datatype; 1 = rowtype; 2 = rowtype, but I/O functions not set up yet Well, I mean that 0 = initial, 1 = scalar, 2 = rowtype, 3 = rowtype but not set up sounds more intuitive to me. Of course this is only what I feel. Regards, -- Hitoshi Harada
Re: [HACKERS] ToDo: support for parameters in EXECUTE statement
On 19.01.2011 12:53, Pavel Stehule wrote: The EXECUTE statement doesn't support a parametrization via SPI_execute_with_args call and PQexecParams too. It can be a security issue. If somebody use a prepared statement as protection to sql injection, then all security goes out, because he has to call EXECUTE without parametrization. Why don't you use SPI_prepare and SPI_open_query ? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] ToDo: support for parameters in EXECUTE statement
2011/1/19 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: On 19.01.2011 12:53, Pavel Stehule wrote: The EXECUTE statement doesn't support a parametrization via SPI_execute_with_args call and PQexecParams too. It can be a security issue. If somebody use a prepared statement as protection to sql injection, then all security goes out, because he has to call EXECUTE without parametrization. Why don't you use SPI_prepare and SPI_open_query ? I have to execute a session's prepared statement - created with PREPARE statement. Pavel -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Tue, 2011-01-18 at 19:04 +, Magnus Hagander wrote: Log replication connections only when log_connections is on Previously we'd always log replication connections, with no way to turn them off. You noted that the code was there intentionally, yet you also couldn't see the reason. That is not a great reason to change it. It's especially not a great reason to make the change quickly. The log entry served a very specific purpose, which was helping people to diagnose problems with replication connectivity. It isn't practical or sensible to force people to use log_connections to help with that; that is a sledgehammer to crack a nut. Few people have it turned on in production, so turning it on after a problem happened doesn't help diagnose things. The negative impact of this was a couple of log lines. No bug. Plus I don't see any reason to introduce an incompatibility with the log output from 9.0. So removing this has almost no positive benefit, yet a clear negative one. We need to look at the negative aspects of changes before we make them. Let's concentrate on adding the major features, not rush through loads of minor changes. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] pg_basebackup for streaming base backups
On Wed, Jan 19, 2011 at 06:14, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Jan 19, 2011 at 4:12 AM, Magnus Hagander mag...@hagander.net wrote: Ah, ok. I've added the errorcode now, PFA. I also fixed an error in the change for result codes I broke in the last patch. github branch updated as usual. Great. Thanks for the quick update! Here are another comments: + * IDENTIFICATION + * src/bin/pg_basebackup.c Typo: s/src/bin/pg_basebackup.c/src/bin/pg_basebackup/pg_basebackup.c Oops. + printf(_( -c, --checkpoint=fast|slow\n + set fast or slow checkpoinging\n)); Typo: s/checkpoinging/checkpointing The fast or slow seems to lead users to always choose fast. Instead, what about fast or smooth, fast or spread or immediate or delayed? Hmm. fast or spread seems reasonable to me. And I want to use fast for the fast version, because that's what we call it when you use pg_start_backup(). I'll go change it to spread for now - it's the one I can find used in the docs. You seem to have forgotten to support --checkpoint long option. The struct long_options needs to be updated. Wow, that clearly went too fast. Fixed as well. What if pg_basebackup receives a signal while doing a backup? For example, users might do Ctrl-C to cancel the long-running backup. We should define a signal handler and send a Terminate message to the server to cancel the backup? Nah, we'll just disconnect and it'll deal with things that way. Just like we do with e.g. pg_dump. I don't see the need to complicate it with that. (new patch on github in 5 minutes) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Wed, Jan 19, 2011 at 13:36, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2011-01-18 at 19:04 +, Magnus Hagander wrote: Log replication connections only when log_connections is on Previously we'd always log replication connections, with no way to turn them off. You noted that the code was there intentionally, yet you also couldn't see the reason. That is not a great reason to change it. It's especially not a great reason to make the change quickly. Yes. And I brought it up in discussion, and the consensus was to change it to be consistent. And per that thread, there was also no consensus on ever *adding* it that way - if I'd realized it was that way back then, I would've objected at the time. I didn't realize it until I started seeing the systems in production more. The log entry served a very specific purpose, which was helping people to diagnose problems with replication connectivity. It isn't practical or sensible to force people to use log_connections to help with that; that is a sledgehammer to crack a nut. Few people have it turned on in production, so turning it on after a problem happened doesn't help diagnose things. Again, if you read the thread, the consensus was to do it the simple way and make it configurable by that one. The other suggestion was to turn log_connections into an enum, which was considered unnecessarily complicated. The negative impact of this was a couple of log lines. No bug. Plus I don't see any reason to introduce an incompatibility with the log output from 9.0. The inability to remove it has certainly annoyed me, and I know a few others who have commented on it. And it's inconsistent behavior, something we in general try to avoid. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On 19.01.2011 07:45, Joachim Wieland wrote: On Mon, Jan 17, 2011 at 5:38 PM, Jaime Casanova ja...@2ndquadrant.com wrote: This one is the last version of this patch? if so, commitfest app should be updated to reflect that Here are the latest patches all of them also rebased to current HEAD. Will update the commitfest app as well. What's the idea of storing the file sizes in the toc file? It looks like it's not used for anything. It would be nice to have this format match the tar format. At the moment, there's a couple of cosmetic differences: * TOC file is called TOC, instead of toc.dat * blobs TOC file is called BLOBS.TOC instead of blobs.toc * each blob is stored as blobs/oid.dat, instead of blob_oid.dat The only significant difference is that in the directory archive format, each data file has a header in the beginning. What are the benefits of the data file header? Would it be better to leave it out, so that the format would be identical to the tar format? You could then just tar up the directory to get a tar archive, or vice versa. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Wed, 2011-01-19 at 13:44 +0100, Magnus Hagander wrote: On Wed, Jan 19, 2011 at 13:36, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2011-01-18 at 19:04 +, Magnus Hagander wrote: Log replication connections only when log_connections is on Previously we'd always log replication connections, with no way to turn them off. You noted that the code was there intentionally, yet you also couldn't see the reason. That is not a great reason to change it. It's especially not a great reason to make the change quickly. Yes. And I brought it up in discussion, and the consensus was to change it to be consistent. No it wasn't. Robert explained the reason it was there. Why make the change? Why make it quickly? Why avoid and ignore the CF? The negative impact of this was a couple of log lines. No bug. Plus I don't see any reason to introduce an incompatibility with the log output from 9.0. The inability to remove it has certainly annoyed me, and I know a few others who have commented on it. And it's inconsistent behavior, something we in general try to avoid. I care about diagnosing problems on production systems. There will be many more people annoyed by the inability to diagnose issues than there will be by people who care lots about a couple of log lines, or who care about a few people's views on inconsistency. Is your annoyance worth more than causing newbies problems and being unable to diagnose production systems? How will we diagnose erratic connection problems now? -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] limiting hint bit I/O
On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). I think it's not very responsible to allege that hint bits aren't providing a benefit without providing the patch that you used and the tests that you ran. This is a topic that needs careful analysis, and I think that saying hint bits don't provide a benefit... maybe... doesn't do anything but confuse the issue. How about doing some tests with the patch from my OP and posting the results? If removing hint bits entirely doesn't degrade performance, then surely the less-drastic approach I've taken here ought to be OK too. But in my testing, it didn't look too good. hm. well, I would have to agree on the performance hit -- I figure 5% scan penalty should be about the maximum you'd want to pay to get the i/o reduction. Odds are you're correct and I blew something...I'd be happy to test your patch. merlin
[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Wed, 2011-01-19 at 12:55 +, Simon Riggs wrote: How will we diagnose erratic connection problems now? The point here is that your effort to *remove* pointless log lines now means that we cannot diagnose production problems with replication unless we now *add* hundreds of pointless log lines. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] limiting hint bit I/O
On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). I think it's not very responsible to allege that hint bits aren't providing a benefit without providing the patch that you used and the tests that you ran. This is a topic that needs careful analysis, and I think that saying hint bits don't provide a benefit... maybe... doesn't do anything but confuse the issue. How about doing some tests with the patch from my OP and posting the results? If removing hint bits entirely doesn't degrade performance, then surely the less-drastic approach I've taken here ought to be OK too. But in my testing, it didn't look too good. hm. well, I would have to agree on the performance hit -- I figure 5% scan penalty should be about the maximum you'd want to pay to get the i/o reduction. Odds are you're correct and I blew something...I'd be happy to test your patch. Ah, I tested your patch vs stock postgres vs my patch, basically your results are unhappily correct (mine was just a hair faster than yours which you'd expect). The differential was even wider on my laptop class hardware, maybe 26%. 
I also agree that even if the penalty was reduced or determined to be worth it anyways, your approach to move the setting/i/o around to appropriate places is the way to go vs wholesale removal, unless some way is found to reduce clog lookup penalty to a fraction of what it is now (not likely, I didn't profile but I bet a lot of the problem is the lw lock). Interesting I didn't notice this on my original test :(. merlin
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On Wed, Jan 19, 2011 at 7:47 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here are the latest patches all of them also rebased to current HEAD. Will update the commitfest app as well. What's the idea of storing the file sizes in the toc file? It looks like it's not used for anything. It's part of the overall idea to make sure files are not inadvertently exchanged between different backups and that a file is not truncated. In the future I'd also like to add a checksum to the TOC so that a backup can be checked for integrity. This will cost performance but with the parallel backup it can be distributed to several processors. It would be nice to have this format match the tar format. At the moment, there's a couple of cosmetic differences: * TOC file is called TOC, instead of toc.dat * blobs TOC file is called BLOBS.TOC instead of blobs.toc * each blob is stored as blobs/oid.dat, instead of blob_oid.dat That can be done easily... The only significant difference is that in the directory archive format, each data file has a header in the beginning. What are the benefits of the data file header? Would it be better to leave it out, so that the format would be identical to the tar format? You could then just tar up the directory to get a tar archive, or vice versa. The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. If you want to drop the header altogether, fine with me but if it's just for the tar - directory conversion, then I am failing to see what the use case of that would be. 
A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... Joachim
Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote: How will we diagnose erratic connection problems now? As Heikki pointed out, the slave still logs this information, so we can look there. If that's not enough, yeah, you'll have to turn log_connections on on the master, but I don't think that will be very frequent, and it's not an unmanageable burden anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] limiting hint bit I/O
On 19.01.2011 15:56, Merlin Moncure wrote: On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncuremmonc...@gmail.com wrote: On Tue, Jan 18, 2011 at 12:44 PM, Robert Haasrobertmh...@gmail.com wrote: On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncuremmonc...@gmail.com wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). I think it's not very responsible to allege that hint bits aren't providing a benefit without providing the patch that you used and the tests that you ran. This is a topic that needs careful analysis, and I think that saying hint bits don't provide a benefit... maybe... doesn't do anything but confuse the issue. How about doing some tests with the patch from my OP and posting the results? If removing hint bits entirely doesn't degrade performance, then surely the less-drastic approach I've taken here ought to be OK too. But in my testing, it didn't look too good. hm. well, I would have to agree on the performance hit -- I figure 5% scan penalty should be about the maximum you'd want to pay to get the i/o reduction. Odds are you're correct and I blew something...I'd be happy to test your patch. Ah, I tested your patch vs stock postgres vs my patch, basically your results are unhappily correct (mine was just a hair faster than yours which you'd expect). The differential was even wider on my laptop class hardware, maybe 26%. 
I also agree that even if the penalty was reduced or determined to be worth it anyways, your approach to move the setting/i/o around to appropriate places is the way to go vs wholesale removal, unless some way is found to reduce clog lookup penalty to a fraction of what it is now (not likely, I didn't profile but I bet a lot of the problem is the lw lock). Interesting I didn't notice this on my original test :(. One thing to note is that the current visibility-checking code is optimized for the case that the hint bit is set, and the codepath where it's not is not particularly fast. HeapTupleSatisfiesMVCC does a lot of things besides checking the clog. For xmin: 1. Check HEAP_MOVED_OFF / HEAP_MOVED_IN 2. Check if xmin is the current transaction with TransactionIdIsCurrentTransactionId() 3. Check if xmin is still in progress with TransactionIdIsInProgress() 4. And finally, check the clog with TransactionIdDidCommit() It would be nice to profile the code to see where the time really is spent. Most of it is probably in the clog access, but the TransactionIdIsInProgress() call can be quite expensive too if there's a lot of concurrent backends. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Wed, 2011-01-19 at 09:08 -0500, Robert Haas wrote: On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote: How will we diagnose erratic connection problems now? As Heikki pointed out, the slave still logs this information, so we can look there. If that's enough, yeah, you'll have to turn log_connections on on the master, but I don't think that will be very frequent, and it's not an unmanageable burden anyway. So now we have to check *all* of the standby logs in order to check that replication on the master is working without problems. There will be no default capability to tie up events on the master with failures of replication. Events occurring on multiple standbys will not be picked up as a correlated set of events. It's possible, but it's not a realistic alternative. We've lost something and gained almost nothing. We need to avoid making this kind of improvement. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python refactoring
On Wed, Jan 19, 2011 at 11:09:56AM +0100, Jan Urbański wrote: On 19/01/11 02:06, Hitoshi Harada wrote: 2011/1/19 Peter Eisentraut pete...@gmx.net: On mån, 2011-01-17 at 21:49 +0200, Peter Eisentraut wrote: On sön, 2011-01-02 at 12:41 +0100, Jan Urbański wrote: Here they are. There are 16 patches in total. They amount to what's currently in my refactor branch (and almost to what I've sent as the complete refactor patch, while doing the splitting I discovered a few useless hunks that I've discarded). Committed 0001, rest later. Today committed: 3, 5, 10, 11, 12, 13 I have reviewed this original patches for myself as the fundamental of subsequent work, and have comments. - This is not in the patch, but around line 184 vis versa in comment seems like typo. Right, should certainly be vice versa. - -1 is used as the initial value of PLyTypeInfo.is_rowtype. Why not 0? See the comments in struct PLyTypeInfo: is_rowtype can be: -1 = not known yet (initial state); 0 = scalar datatype; 1 = rowtype; 2 = rowtype, but I/O functions not set up yet - PLy_(input|output)_tuple_funcs() in PLy_trigger_handler() is added. The comment says it should check for the possibility that the relation's tupdesc changed since last call. Is it true that tupdesc may change even in one statement? And it also says the two functions are responsible for not doing repetitive work, but ISTM they are not doing something to stop redundant work. The function doesn't seem like lightweight enough to be called on each row. Hm, you may be right. I haven't touched that part of the code, but now it seems to me that for triggers we do the I/O funcs lookup for every row. I could try to check that and fix it, but not sure if I'll have the time, and it's been that way before. Also, the CF is already closed in theory... If you're fixing things in PL/PythonU, such a change would certainly be in scope as an update to your patch, i.e. don't let the fact that the CF has started stop you from fixing it. Cheers, David. 
-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Jan19, 2011, at 16:16 , Simon Riggs wrote: On Wed, 2011-01-19 at 09:08 -0500, Robert Haas wrote: On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote: How will we diagnose erratic connection problems now? As Heikki pointed out, the slave still logs this information, so we can look there. If that's enough, yeah, you'll have to turn log_connections on on the master, but I don't think that will be very frequent, and it's not an unmanageable burden anyway. So now we have to check *all* of the standby logs in order to check that replication on the master is working without problems. There will be no default capability to tie up events on the master with failures of replication. Events occurring on multiple standbys will not be picked up as a correlated set of events. I don't see for what kind of problems logging replication connections is the right way to monitor replication. To check that all slaves are connected and are streaming WAL, one ought to monitor pg_stat_replication, no? Once a slave vanishes from there (or falls back too far) you'd inspect the *slave's* log to find the reason. Inspecting the master's log instead will always be a poor replacement, since some failure conditions won't leave a trace there (Connectivity problems, for example). Could you explain the failure condition you do have in mind where logging replication connections unconditionally is beneficial? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Wed, Jan 19, 2011 at 8:56 AM, Merlin Moncure mmonc...@gmail.com wrote: Ah, I tested your patch vs stock postgres vs my patch, basically your results are unhappily correct (mine was just a hair faster than yours which you'd expect). The differential was even wider on my laptop class hardware, maybe 26%. I also agree that even if the penalty was reduced or determined to be worth it anyways, your approach to move the setting/i/o around to appropriate places is the way to go vs wholesale removal, unless some way is found to reduce clog lookup penalty to a fraction of what it is now (not likely, I didn't profile but I bet a lot of the problem is the lw lock). Interesting I didn't notice this on my original test :(. OK. My apologies for the email yesterday in which I forgot that you actually HAD posted a patch, but thanks for testing mine and posting your results (and thanks also to Andrea for pointing out the oversight to me). Here's a new version of the patch based on some experimentation with ideas I posted yesterday. At least on my Mac laptop, this is pretty effective at blunting the response time spike for the first table scan, and it converges to steady-state after about 20 table scans. Rather than write every 20th page, what I've done here is make every 2000'th buffer allocation grant an allowance of 100 hint bit only writes. All dirty pages and the next 100 pages that are dirty-only-for-hint-bits get written out. Then we stop writing the dirty-only-for-hint-bits-pages until we get our next allowance of writes. The idea is to try to avoid creating a lot of random writes on each scan through the table. At least here, that seems to work pretty well - the initial scan is only about 25% slower than the steady state (rather than 6x or more slower). I am seeing occasional latency spikes that appear to be the result of the OS write cache filling up and deciding that it has to flush everything to disk before writing anything more. 
I'm not too concerned about that because this is a fairly artificial test case (one doesn't usually sit around doing consecutive SELECT sum(1) FROM s commands) but it seems like pretty odd behavior. The system sits there doing no writes at all as I'm sending more and more dirty pages into the system buffer cache and then, boom, write storm. I haven't yet tested to see if the same behavior occurs on Linux. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company bm-hint-bits-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
Simon Riggs si...@2ndquadrant.com writes: So now we have to check *all* of the standby logs in order to check that replication on the master is working without problems. There will be no default capability to tie up events on the master with failures of replication. Events occurring on multiple standbys will not be picked up as a correlated set of events. This is pure FUD. If you suspect such a problem, turn on log_connections. If you aren't suspecting such a problem, you likely aren't looking in the postmaster log anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Wed, 2011-01-19 at 16:42 +0100, Florian Pflug wrote: Could you explain the failure condition you do have in mind where logging replication connections unconditionally is beneficial? Sure. Replication drops and immediately reconnects during the night. When did that happen? How many times did it happen? For how long did the disconnection last? Why did it happen? How does that correlate with other situations? That info is not available easily. pg_stat_replication is necessary, and is separate from pg_stat_activity because replication connections are not the same as normal connections. We even have a separate permission for replication, to further demonstrate that. Replication connections being logged differently from normal connections was not an anomaly, nor was it wasteful. By treating them the same we are forced to log too much, instead of just enough. Replication connections are not even logged unconditionally. You need to enable replication before they appear in the log. And that is exactly the time the log entries are helpful. The question we should have asked is "Why is removing those log entries helpful?" I shouldn't have to justify putting something back, when the good reason for its existence was previously explained and there was no consensus to remove in the first place. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python refactoring
On 19/01/11 16:35, David Fetter wrote: On Wed, Jan 19, 2011 at 11:09:56AM +0100, Jan Urbański wrote: On 19/01/11 02:06, Hitoshi Harada wrote: - PLy_(input|output)_tuple_funcs() in PLy_trigger_handler() is added. The comment says it should check for the possibility that the relation's tupdesc changed since last call. Is it true that tupdesc may change even in one statement? And it also says the two functions are responsible for not doing repetitive work, but ISTM they are not doing something to stop redundant work. The function doesn't seem like lightweight enough to be called on each row. Hm, you may be right. I haven't touched that part of the code, but now it seems to me that for triggers we do the I/O funcs lookup for every row. I could try to check that and fix it, but not sure if I'll have the time, and it's been that way before. Also, the CF is already closed in theory... I looked again and it seems that PLy_(input|output)_tuple_funcs does indeed avoid repetitive work. Observe that in the loop going over the TupleDesc attributes there's if (arg->in.r.atts[i].typoid == desc->attrs[i]->atttypid) continue; /* already set up this entry */ which avoids the syscache lookup and calling PLy_input_datum_func2. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Wed, Jan 19, 2011 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote: Here's a new version of the patch based on some experimentation with ideas I posted yesterday. At least on my Mac laptop, this is pretty effective at blunting the response time spike for the first table scan, and it converges to steady-state after about 20 table scans. Rather than write every 20th page, what I've done here is make every 2000'th buffer allocation grant an allowance of 100 hint bit only writes. All dirty pages and the next 100 pages that are dirty-only-for-hint-bits get written out. Then we stop writing the dirty-only-for-hint-bits-pages until we get our next allowance of writes. The idea is to try to avoid creating a lot of random writes on each scan through the table. At least here, that seems to work pretty well - the initial scan is only about 25% slower than the steady state (rather than 6x or more slower). does this only impact the scan case? in oltp scenarios you want to write out the bits asap, i would imagine. what about time based flushing, so that only x dirty hint bit pages can be written out per time unit y? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
On Wed, Jan 19, 2011 at 11:05 AM, Simon Riggs si...@2ndquadrant.com wrote: The question we should have asked is Why is removing those log entries helpful?. I shouldn't have to justify putting something back, when the good reason for its existence was previously explained and there was no consensus to remove in the first place. This is simply revisionist history. It was discussed for several days, and consensus was eventually reached. The fact that you chose not to participate in that discussion until after it was over doesn't mean we didn't have it, and the fact that you don't agree with the outcome doesn't mean there wasn't consensus. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert I think the first thing to do would be to try to come up with a reproducible test case where clustering the tables improves performance. On that note, is there any standard way you guys do benchmarks? Bruce I think CLUSTER is a win when you are looking up multiple rows in the same table, either using a non-unique index or a range search. What places do such lookups? Having them all in adjacent pages would be a win --- single-row lookups are usually not. Mostly the tables that track column level data. Typically you will want to grab rows for multiple columns for a given table at once, so it would be helpful to have them be contiguous on disk. I could design a benchmark to display this by building a thousand tables one column at a time using 'alter add column' to scatter the catalog rows for the tables across many blocks. So there'll be a range with column 1 for each table, and column 2 for each table, and column 3 for each table. Then fill a couple data tables with a lot of data and set some noise makers to loop through them over and over with full table scans ... filling up cache with unrelated data and hopefully ageing out the cached pg_catalog tables. Then do some benchmark index lookup queries to see the retrieval time before and after clustering the pg_catalog tables to record a difference. If the criterion is doesn't hurt anything and helps a little, I think this passes. Especially since clusters aren't maintained automatically, so adding them has no negative impact on insert or update. It'd just be a nice thing to do if you know it can be done, and it doesn't harm anyone who doesn't know. -Simone Aiken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Wed, Jan 19, 2011 at 9:13 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 19.01.2011 15:56, Merlin Moncure wrote: On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncuremmonc...@gmail.com wrote: On Tue, Jan 18, 2011 at 12:44 PM, Robert Haasrobertmh...@gmail.com wrote: On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncuremmonc...@gmail.com wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). I think it's not very responsible to allege that hint bits aren't providing a benefit without providing the patch that you used and the tests that you ran. This is a topic that needs careful analysis, and I think that saying hint bits don't provide a benefit... maybe... doesn't do anything but confuse the issue. How about doing some tests with the patch from my OP and posting the results? If removing hint bits entirely doesn't degrade performance, then surely the less-drastic approach I've taken here ought to be OK too. But in my testing, it didn't look too good. hm. well, I would have to agree on the performance hit -- I figure 5% scan penalty should be about the maximum you'd want to pay to get the i/o reduction. Odds are you're correct and I blew something...I'd be happy to test your patch. Ah, I tested your patch vs stock postgres vs my patch, basically your results are unhappily correct (mine was just a hair faster than yours which you'd expect). The differential was even wider on my laptop class hardware, maybe 26%. 
I also agree that even if the penalty was reduced or determined to be worth it anyways, your approach to move the setting/i/o around to appropriate places is the way to go vs wholesale removal, unless some way is found to reduce clog lookup penalty to a fraction of what it is now (not likely, I didn't profile but I bet a lot of the problem is the lw lock). Interesting I didn't notice this on my original test :(. One thing to note is that the current visibility-checking code is optimized for the case that the hint bit is set, and the codepath where it's not is not particularly fast. HeapTupleSatisfiesMVCC does a lot of things besides checking the clog. For xmin: 1. Check HEAP_MOVED_OFF / HEAP_MOVED_IN 2. Check if xmin is the current transaction with TransactionIdIsCurrentTransactionId() 3. Check if xmin is still in progress with TransactionIdIsInProgress() 4. And finally, check the clog with TransactionIdDidCommit() It would be nice to profile the code to see where the time really is spent. Most of it is probably in the clog access, but the TransactionIdIsInProgress() call can be quite expensive too if there's a lot of concurrent backends. Nice thought -- it's worth checking out. I'll play around with it some more -- I think you're right and the first step is to profile. If the bottleneck is in fact the lock there's not much that can be done afaict. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Wed, Jan 19, 2011 at 11:18 AM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Jan 19, 2011 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote: Here's a new version of the patch based on some experimentation with ideas I posted yesterday. At least on my Mac laptop, this is pretty effective at blunting the response time spike for the first table scan, and it converges to steady-state after about 20 table scans. Rather than write every 20th page, what I've done here is make every 2000'th buffer allocation grant an allowance of 100 hint bit only writes. All dirty pages and the next 100 pages that are dirty-only-for-hint-bits get written out. Then we stop writing the dirty-only-for-hint-bits-pages until we get our next allowance of writes. The idea is to try to avoid creating a lot of random writes on each scan through the table. At least here, that seems to work pretty well - the initial scan is only about 25% slower than the steady state (rather than 6x or more slower). does this only impact the scan case? in oltp scenarios you want to write out the bits asap, i would imagine. what about time based flushing, so that only x dirty hint bit pages can be written out per time unit y? No, it doesn't only affect the scan case. But I don't think that's bad. The goal is for the background writer to provide enough clean pages that backends don't have to write anything at all. If that's not happening, the backends will be slowed by the need to write out pages themselves in order to create a sufficient supply of clean pages to satisfy their allocation needs. The easiest way for that situation to occur is if the backend is doing a large sequential scan of a table - in that case, it's by definition cycling through pages at top speed, and the fact that it's cycling through them in a ring buffer rather than using all of shared_buffers makes the loop even tighter. But if it's possible under some other set of circumstances, the behavior is still reasonable. 
This behavior kicks in if more than 100 out of some set of 2000 page allocations would require a write only for the purpose of flushing hint bits. Time-based flushing would be problematic in several respects. First, it would require a kernel call, which would be vastly more expensive than what I'm doing now, and might have undesirable performance implications for that reason. Second, I don't think it would be the right way to tune it even if that were not an issue. It doesn't really matter whether the system takes a millisecond or a microsecond or a nanosecond to write each buffer - what matters is that writing all the buffers is a lot slower than writing none of them. So what we want to do is write a percentage of them, in a way that guarantees that they'll all eventually get written if people continue to access the same data. This does that, and a time-based setting would not; it would also almost certainly require tuning based on the I/O capacities of the system it's running on, which isn't necessary with this approach. Before we get too deeply involved in theory, can you give this a test drive on your system and see how it looks? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com wrote: opinion isn't strong in this topic. One or twenty useless detoasting isn't really significant in almost use cases (problem is thousands detoasting). Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
Robert Haas robertmh...@gmail.com writes: ... So what we want to do is write a percentage of them, in a way that guarantees that they'll all eventually get written if people continue to access the same data. The word guarantee seems quite inappropriate here, since as far as I can see this approach provides no such guarantee --- even after many cycles you'd never be really certain all the bits were set. What I asked for upthread was that we continue to have some deterministic, practical way to force all hint bits in a table to be set. This is not *remotely* responding to that request. It's still not deterministic, and even if it were, vacuuming a large table 20 times isn't a very practical solution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication logging
Simon Riggs wrote: On Tue, 2011-01-18 at 10:57 -0500, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: Is there *any* usecase for setting them differently though? I can't believe we're still engaged in painting this bikeshed. Let's just control it off log_connections and have done. Yes, this is a waste of time. Leave it as is because its there for a very good reason, as Robert already explained. The code was put in explicitly because debugging replication connections is quite important and prior to that addition, wasn't easy. It's a very separate thing from logging the hundreds/thousands of other connections on the system. I don't really care about neatness of code, or neatness of parameters. I want to be able to confirm the details of the connection. pg_stat_replication is dynamic, not a historical record of connections and reconnections. How else will you diagnose an erratic network, or an accidental change of authority? Replication is so important it isn't worth tidying away a couple of lines of log. My objective is to make replication work, not to reduce the size of the average log file by 1 or 2 lines. I'm particularly concerned that people make such changes too quickly. There are many things in this area of code that need changing, but also many more that do not. If we are to move forwards we need to avoid going one step forwards, one step back. There were enough people who wanted a change that we went ahead and did it --- if there was lack of agreement, we would have delayed longer. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
2011/1/19 Robert Haas robertmh...@gmail.com: On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com wrote: opinion isn't strong in this topic. One or twenty useless detoasting isn't really significant in almost use cases (problem is thousands detoasting). Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( ??? A general solution could be pretty hardcore - something like handles instead of pointers to varlena :( I mainly know about plpgsql performance problems - but that is my interest. Plpgsql can be fixed simply - maybe 10 lines of code, maybe less. Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication logging
Bruce Momjian br...@momjian.us writes: Simon Riggs wrote: I'm particularly concerned that people make such changes too quickly. There are many things in this area of code that need changing, but also many more that do not. If we are to move forwards we need to avoid going one step forwards, one step back. There were enough people who wanted a change that we went ahead and did it --- if there was lack of agreement, we would have delayed longer. The real reason why we changed this is that nobody (except Simon) sees a situation where unconditional logging of successful replication connections is especially helpful. If you were trying to diagnose a problem you would more likely need to know about *failed* connections, but the code that was in there didn't provide that. At least not unless you turned on log_connections ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com wrote: opinion isn't strong in this topic. One or twenty useless detoasting isn't really significant in almost use cases (problem is thousands detoasting). Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( Yeah, there's been some discussion of a more general solution, and I think I even had a trial patch at one point (which turned out not to work terribly well, but maybe somebody will have a better idea someday). In the meantime, the proposal at hand seems like a bit of a stop-gap, which is why I'd prefer to see something with a very minimal code footprint. Detoast at assignment would likely need only a few lines of code added in a single place. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication logging
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Simon Riggs wrote: I'm particularly concerned that people make such changes too quickly. There are many things in this area of code that need changing, but also many more that do not. If we are to move forwards we need to avoid going one step forwards, one step back. There were enough people who wanted a change that we went ahead and did it --- if there was lack of agreement, we would have delayed longer. The real reason why we changed this is that nobody (except Simon) sees a situation where unconditional logging of successful replication connections is especially helpful. If you were trying to diagnose a problem you would more likely need to know about *failed* connections, but the code that was in there didn't provide that. At least not unless you turned on log_connections ... Yep. Simon, I understand you being disappointed you could not champion your cause during the discussion, but you can still try to sway people that the original code was appropriate. Based on the feedback from the group, it seemed unlikely you would be able to sway a significant number of people so we just went ahead and made the change. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python refactoring
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Peter Eisentraut's message of mar ene 18 19:22:50 -0300 2011: #16: This is probably pointless because pgindent will reformat this. pgindent used to remove useless braces around single-statement blocks, but this behavior was removed years ago because it'd break formatting around PG_TRY blocks. Yeah. FWIW, I concur with Jan that this is a readability improvement. A comment and a statement always look like two statements to me --- and the fact that pgindent insists on inserting a blank line in front of the comment in this scenario does even more to mangle the visual impression of what the if controls. +1 for the braces. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
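A minimal C illustration of the readability point being made above — a comment plus a single controlled statement, wrapped in braces so the extent of the if is unambiguous. This is a hypothetical example, not code from the patch under review:

```c
#include <assert.h>

/* Without braces, the comment visually detaches the controlled
 * statement from the if; with braces, the scope is obvious. */
int clamp_nonneg(int x)
{
    if (x < 0)
    {
        /* negative input is treated as empty */
        x = 0;
    }
    return x;
}
```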
Re: [HACKERS] pg_basebackup for streaming base backups
Fujii Masao masao.fu...@gmail.com writes: What I'm worried about is the case where a tablespace is created under the $PGDATA directory. What would be the sense of that? If you're concerned about whether the code handles it correctly, maybe the right solution is to add code to CREATE TABLESPACE to disallow it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
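The check Tom suggests — rejecting a tablespace location inside the data directory — could be sketched roughly as below. The helper name and its assumptions (both paths already absolute and canonicalized) are illustrative; the real CREATE TABLESPACE code is not shown here:

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `path` equals `dir` or lies underneath it.
 * Assumes both paths are absolute and canonical (no symlinks,
 * no trailing slashes); a real check would canonicalize first. */
int path_is_prefix(const char *dir, const char *path)
{
    size_t len = strlen(dir);

    return strncmp(dir, path, len) == 0 &&
           (path[len] == '/' || path[len] == '\0');
}
```

CREATE TABLESPACE could then refuse the location when `path_is_prefix(DataDir, location)` holds.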
Re: [HACKERS] limiting hint bit I/O
On Wed, Jan 19, 2011 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ... So what we want to do is write a percentage of them, in a way that guarantees that they'll all eventually get written if people continue to access the same data. The word guarantee seems quite inappropriate here, since as far as I can see this approach provides no such guarantee --- even after many cycles you'd never be really certain all the bits were set. What I asked for upthread was that we continue to have some deterministic, practical way to force all hint bits in a table to be set. This is not *remotely* responding to that request. It's still not deterministic, and even if it were, vacuuming a large table 20 times isn't a very practical solution. I get the impression you haven't spent as much time reading my email as I spent writing it. Perhaps I'm wrong, but in any case the code doesn't do what you're suggesting. In the most recently posted version of this patch, which is v2, if VACUUM hits a page that is hint-bit-dirty, it always writes it. Full stop. The 20 times bit applies to a SELECT * FROM table, which is a rather different case. As I write this, I realize that there is a small fly in the ointment here, which is that neither VACUUM nor SELECT force out all the pages they modify to disk. So there is some small amount of remaining nondeterminism, even if you VACUUM, because VACUUM will leave the last few pages it dirties in shared_buffers, and whether those hint bits hit the disk will depend on a decision made at the time they're evicted, not at the time they were dirtied. Possibly I could fix that by making SetBufferCommitInfoNeedsSave() set BM_DIRTY during vacuum and BM_HINT_BITS at other times. That would nail the lid shut pretty tight. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
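The fix Robert floats at the end — SetBufferCommitInfoNeedsSave() setting BM_DIRTY during vacuum and BM_HINT_BITS at other times — can be modeled with a toy sketch. The flag names follow the email, but the struct and functions here are simplified stand-ins, not the real bufmgr code:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the real buffer flags. */
#define BM_DIRTY     0x01
#define BM_HINT_BITS 0x02

typedef struct
{
    int flags;
} BufferModel;

/* Model of the proposed SetBufferCommitInfoNeedsSave(): during VACUUM a
 * hint-bit change marks the buffer fully dirty, so it is always written;
 * otherwise the buffer is only "hint dirty". */
void set_commit_info_needs_save(BufferModel *buf, bool in_vacuum)
{
    buf->flags |= in_vacuum ? BM_DIRTY : BM_HINT_BITS;
}

/* At eviction time, BM_DIRTY forces a write; a hint-only buffer is
 * written according to a policy decided at eviction — the source of the
 * remaining nondeterminism discussed in the email. */
bool must_write_at_eviction(const BufferModel *buf, bool policy_says_write)
{
    if (buf->flags & BM_DIRTY)
        return true;
    if (buf->flags & BM_HINT_BITS)
        return policy_says_write;
    return false;
}
```

Under this model, pages VACUUM touches are deterministically written, while ordinary SELECT-driven hint-bit setting remains subject to the write-some-fraction policy.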
Re: [HACKERS] Extending opfamilies for GIN indexes
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Oh, wait a minute: there's a bad restriction there, namely that a contrib module could only add loose operators that had different declared input types from the ones known to the core opclass. I would have thought that such contrib would then need to offer their own opfamily and opclasses, and users would have to use the specific opclass manually like they do e.g. for text_pattern_ops. Can't it work that way? I think you missed the point: right now, to use both the core and intarray operators on an integer[] column, you have to create *two* GIN indexes, which will have exactly identical contents. I'm looking for a way to let intarray extend the core opfamily definition so that one index can serve. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com wrote: opinion isn't strong on this topic. One or twenty useless detoastings aren't really significant in most use cases (the problem is thousands of detoastings). Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( Yeah, there's been some discussion of a more general solution, and I think I even had a trial patch at one point (which turned out not to work terribly well, but maybe somebody will have a better idea someday). I'm pretty doubtful that there's going to be a general solution to this problem - I think it's going to require gradual refactoring of problem spots. In the meantime, the proposal at hand seems like a bit of a stop-gap, which is why I'd prefer to see something with a very minimal code footprint. Detoast at assignment would likely need only a few lines of code added in a single place. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Oh, wait a minute: there's a bad restriction there, namely that a contrib module could only add loose operators that had different declared input types from the ones known to the core opclass. I would have thought that such contrib would then need to offer their own opfamily and opclasses, and users would have to use the specific opclass manually like they do e.g. for text_pattern_ops. Can't it work that way? I think you missed the point: right now, to use both the core and intarray operators on an integer[] column, you have to create *two* GIN indexes, which will have exactly identical contents. I'm looking for a way to let intarray extend the core opfamily definition so that one index can serve. Maybe this is a dumb question, but why not just put whatever stuff intarray[] adds directly into the core opfamily? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com wrote: opinion isn't strong on this topic. One or twenty useless detoastings aren't really significant in most use cases (the problem is thousands of detoastings). Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( Yeah, there's been some discussion of a more general solution, and I think I even had a trial patch at one point (which turned out not to work terribly well, but maybe somebody will have a better idea someday). In the meantime, the proposal at hand seems like a bit of a stop-gap, which is why I'd prefer to see something with a very minimal code footprint. Detoast at assignment would likely need only a few lines of code added in a single place. What qualifies a patch as stop-gap scale? Pavel's patch is ~60 lines. If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing VARATT_IS_EXTENDED there might be the least-harmful way to avoid it. Saving a few more lines by moving the work to exec_assign_value probably does not justify the associated performance regressions Pavel has cited. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Couple document fixes
Hi, I've attached a couple minor fixes to the docs. One relating to SECURITY LABEL and the other for pg_class.relpersistence -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 doc_fixes.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Couple document fixes
Thom Brown t...@linux.com wrote: I've attached a couple minor fixes to the docs. One relating to SECURITY LABEL and the other for pg_class.relpersistence relpersistence should be <type>"char"</type>, not <type>char</type>. Oddly enough, there is a difference. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken sai...@quietlycompetent.com wrote: Pages like this one have column comments for the system tables: http://www.psql.it/manuale/8.3/catalog-pg-attribute.html Oh, I see. I don't think we want to go there. We'd need some kind of system for keeping the two places in sync. And there'd be no easy way to upgrade the in-database descriptions when we upgraded to a newer minor release, supposing they'd changed in the meantime. And some of the descriptions are quite long, so they wouldn't fit nicely in the amount of space you typically have available when you run \d+. And it would enlarge the size of an empty database by however much was required to store all those comments, which could be an issue for PostgreSQL instances that have many small databases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Couple document fixes
On 19 January 2011 18:11, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Thom Brown t...@linux.com wrote: I've attached a couple minor fixes to the docs. One relating to SECURITY LABEL and the other for pg_class.relpersistence relpersistence should be <type>"char"</type>, not <type>char</type>. Oddly enough, there is a difference. -Kevin relkind in the same table is the same type, but isn't displayed as "char" in the docs, and the same applies to many other system tables. They would need changing too then. Examples are: pg_type.typtype pg_proc.provolatile pg_attribute.attstorage -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think you missed the point: right now, to use both the core and intarray operators on an integer[] column, you have to create *two* GIN indexes, which will have exactly identical contents. I'm looking for a way to let intarray extend the core opfamily definition so that one index can serve. Maybe this is a dumb question, but why not just put whatever stuff intarray[] adds directly into the core opfamily? AFAICS that means integrating contrib/intarray into core. Independently of whether that's a good idea or not, PG is supposed to be an extensible system, so it would be nice to have a solution that supported add-on extensions. The subtext here is that GIN, unlike the other index AMs, uses a representation that seems pretty amenable to supporting a wide variety of query types with a single index. contrib/intarray's query_int operators are not at all like the subset-inclusion-testing operators that the core opclass supports, and it's not very hard to think of additional cases that could be of interest to somebody (example: find all arrays that contain some/all entries within a given integer range). I think we're going to come up against similar situations over and over until we find a solution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think you missed the point: right now, to use both the core and intarray operators on an integer[] column, you have to create *two* GIN indexes, which will have exactly identical contents. I'm looking for a way to let intarray extend the core opfamily definition so that one index can serve. Maybe this is a dumb question, but why not just put whatever stuff intarray[] adds directly into the core opfamily? AFAICS that means integrating contrib/intarray into core. Independently of whether that's a good idea or not, PG is supposed to be an extensible system, so it would be nice to have a solution that supported add-on extensions. Yeah, I'm just wondering if it's worth the effort, especially in view of a rather large patch queue we seem to have outstanding at the moment. The subtext here is that GIN, unlike the other index AMs, uses a representation that seems pretty amenable to supporting a wide variety of query types with a single index. contrib/intarray's query_int operators are not at all like the subset-inclusion-testing operators that the core opclass supports, and it's not very hard to think of additional cases that could be of interest to somebody (example: find all arrays that contain some/all entries within a given integer range). I think we're going to come up against similar situations over and over until we find a solution. Interesting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Noah Misch n...@leadboat.com writes: On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote: In the meantime, the proposal at hand seems like a bit of a stop-gap, which is why I'd prefer to see something with a very minimal code footprint. Detoast at assignment would likely need only a few lines of code added in a single place. What qualifies a patch as stop-gap scale? Pavel's patch is ~60 lines. Yeah, but they're spread out all over plpgsql, and seem likely to metastasize to other places --- the additional field that needs to be initialized is the main culprit. I'd like a one-spot patch that will be easy to remove when/if it's no longer needed. If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing VARATT_IS_EXTENDED there might be the least-harmful way to avoid it. I thought about that too, but adding an additional set of tests into exec_eval_datum isn't free --- that's a hot spot for plpgsql execution. Doing it in exec_assign_value would be significantly cheaper, first because it's reasonable to assume that assignments are less frequent than reads, and second because there's already a test there to detect pass-by-ref datatypes, as well as a datumCopy() step that could be skipped altogether when we detoast. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
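The detoast-at-assignment idea being weighed above can be modeled with a toy sketch: pay the detoast cost once when the value is stored in the variable, so any number of subsequent reads never re-detoast. The struct and function names here are illustrative stand-ins, not the real plpgsql datum machinery:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a plpgsql variable under eager detoasting. */
typedef struct
{
    bool toasted;        /* is the stored value still compressed/external? */
    int  detoast_calls;  /* how many times the detoast cost was paid */
} PlpgsqlVarModel;

/* Assignment eagerly detoasts, so the cost is paid exactly once. */
void assign_value(PlpgsqlVarModel *var, bool value_is_toasted)
{
    var->toasted = value_is_toasted;
    if (var->toasted)
    {
        var->detoast_calls++;   /* detoast here, at assignment time */
        var->toasted = false;
    }
}

/* Reads would only detoast lazily; under eager assignment this
 * branch is never taken, which is the whole point. */
void read_value(PlpgsqlVarModel *var)
{
    if (var->toasted)
        var->detoast_calls++;
}
```

This also shows why the change can live in one spot (the assignment path) rather than adding a per-read test to a hot path like exec_eval_datum.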
[HACKERS] Use of O_DIRECT only for open_* sync options
Is there a reason we only use O_DIRECT with open_* sync options? xlogdefs.h says:

/*
 * Because O_DIRECT bypasses the kernel buffers, and because we never
 * read those buffers except during crash recovery, it is a win to use
 * it in all cases where we sync on each write().  We could allow O_DIRECT
 * with fsync(), but because skipping the kernel buffer forces writes out
 * quickly, it seems best just to use it for O_SYNC.  It is hard to imagine
 * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
 * Also, O_DIRECT is never enough to force data to the drives, it merely
 * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
 */

This seems wrong because fsync() can win if there are two writes before the sync call. Can kernels not issue fsync() if the write was O_DIRECT? If that is the cause, we should document it.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
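Bruce's point about fsync() winning when several writes precede the sync can be shown with a toy flush-counting model. No real I/O here, and these are hypothetical helper names; with O_SYNC semantics every write is also a device flush, while buffered writes can share a single fsync:

```c
#include <assert.h>

/* Toy counters standing in for syscalls; purely illustrative. */
typedef struct
{
    int writes;
    int flushes;
} IoModel;

/* O_SYNC-style write: each write() also forces a flush. */
void write_osync(IoModel *io)
{
    io->writes++;
    io->flushes++;
}

/* Buffered write: no flush until fsync() is called. */
void write_buffered(IoModel *io)
{
    io->writes++;
}

/* fsync(): one flush covering all writes since the last one. */
void do_fsync(IoModel *io)
{
    io->flushes++;
}
```

Two writes followed by one fsync cost one flush; the same two writes under O_SYNC cost two, which is exactly the case where fsync() can win.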
Re: [HACKERS] Couple document fixes
Thom Brown t...@linux.com wrote: relkind in the same table is the same type, but isn't displayed as "char" in the docs, and the same applies to many other system tables. They would need changing too then. Examples are: pg_type.typtype pg_proc.provolatile pg_attribute.attstorage

That's a good point. Consistency would trump getting a single entry right, for sure. I wonder, though, whether we shouldn't consistently distinguish them. For one thing, I've seen multiple posts where people were reporting bugs because of having confused char with "char". FWIW, \d shows:

          Table "pg_catalog.pg_class"
     Column      |   Type    | Modifiers
-----------------+-----------+-----------
 relname         | name      | not null
 relnamespace    | oid       | not null
 reltype         | oid       | not null
 reloftype       | oid       | not null
 relowner        | oid       | not null
 relam           | oid       | not null
 relfilenode     | oid       | not null
 reltablespace   | oid       | not null
 relpages        | integer   | not null
 reltuples       | real      | not null
 reltoastrelid   | oid       | not null
 reltoastidxid   | oid       | not null
 relhasindex     | boolean   | not null
 relisshared     | boolean   | not null
 relpersistence  | "char"    | not null
 relkind         | "char"    | not null
 relnatts        | smallint  | not null
 relchecks       | smallint  | not null
 relhasoids      | boolean   | not null
 relhaspkey      | boolean   | not null
 relhasexclusion | boolean   | not null
 relhasrules     | boolean   | not null
 relhastriggers  | boolean   | not null
 relhassubclass  | boolean   | not null
 relfrozenxid    | xid       | not null
 relacl          | aclitem[] |
 reloptions      | text[]    |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)

Currently we don't seem to distinguish them in very many places in the docs:

$ find -name '*.sgml' | xargs egrep -n '\"char\"'
./doc/src/sgml/textsearch.sgml:1271:setweight(<replaceable class="PARAMETER">vector</replaceable> <type>tsvector</>, <replaceable class="PARAMETER">weight</replaceable> <type>"char"</>) returns <type>tsvector</>
./doc/src/sgml/datatype.sgml:1116:length might change in a future release. The type <type>"char"</type>
./doc/src/sgml/datatype.sgml:1134: <entry><type>"char"</type></entry>
./doc/src/sgml/release-old.sgml:4406:Add routines for single-byte "char" type(Thomas)
./doc/src/sgml/release-old.sgml:4747:Make "char" type a synonym for char(1) (actually implemented as bpchar)(Thomas)
./doc/src/sgml/xfunc.sgml:1794: <entry><type>"char"</type></entry>
./doc/src/sgml/release-8.0.sgml:3389: <type>"char"</> data type have been removed.
./doc/src/sgml/release-8.0.sgml:4460:<type>"char"</> data type have been removed.
./doc/src/sgml/release-8.0.sgml:4466:to do arithmetic on a <type>"char"</> column, you can cast it to
./doc/src/sgml/func.sgml:8462: <literal><function>setweight(<type>tsvector</>, <type>"char"</>)</function></literal>
./doc/src/sgml/btree-gin.sgml:17: <type>oid</>, <type>money</>, <type>"char"</>,

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: AFAICS that means integrating contrib/intarray into core. Independently of whether that's a good idea or not, PG is supposed to be an extensible system, so it would be nice to have a solution that supported add-on extensions. Yeah, I'm just wondering if it's worth the effort, especially in view of a rather large patch queue we seem to have outstanding at the moment. Oh, maybe we're not on the same page here: I wasn't really proposing to do this right now, it's more of a TODO item. Offhand the only reason to do it now would be if we settled on something that required a layout change in pg_amop/pg_amproc. Since we already have one such change in 9.1, getting the additional change done in the same release would be valuable to reduce the number of distinct cases for pg_dump and other clients to support. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Yes, I was aware of this problem (the amount of memory consumed with lots of updates), and I kind of hoped someone would come up with a reasonable solution.

As far as I can see, periodically sampling some or all of the table is really the only thing that has a chance of working OK. You could try to track changes only when they're not too big, but I think that's still going to have nasty performance consequences.

I was thinking about it and I believe at least one of the algorithms (the adaptive sampling) might handle merging, i.e. the backends would not need to store the list of values, just a private copy of the estimator and update it. And at the end (after commit), the estimator would be merged back into the actual one. So the algorithm would be something like this:

1. create a copy of all distinct estimators influenced by the INSERT
2. update the local copy
3. commit and merge the local copies back into the original estimator

Well, maybe. But DELETEs seem like a problem. And it's still adding foreground work to every query, which I just have a hard time believing is ever going to work out.

Regarding the crash scenario - if the commit fails, just throw away the local estimator copy, it's not needed. I'm not sure how to take care of the case when commit succeeds and the write of the merged estimator fails, but I think it might be possible to write the estimator to xlog or something like that. So it would be replayed during recovery etc. Or is it a stupid idea?

It's not stupid, in the sense that that is what you'd need to do if you want to avoid ever having to rescan the table. But it is another thing that I think is going to be way too expensive.

Yes, as I've mentioned above, the multi-column stats are the reason why I started working on this. And yes, it really should work like this:

1. user realizes there's something wrong as the plans are bad
2.
after analysis, the user realizes the reason is an imprecise estimate due to dependence between columns
3. the user enables cross-column stats on the columns and checks if it actually solved the issue
4. if the cost outweighs the gains, the user drops the stats

Does that sound reasonable? Yes. The only caveat I'm trying to insert is that it's hard for me to see how the proposed approach could ever be cheap enough to be a win. If we need some kind of more expensive kind of ANALYZE that scans the whole table, and it's optional, sure, why not? But that's a one-time hit. You run it, it sucks, it's over, and now your queries work. Odds are good you may never need to touch it again.

Now, if we can convince ourselves that multi-column stats are likely to require constant updating, then maybe there would be more to recommend the stream-based approach, although even then it seems dreadfully complex and expensive to me. But I bet these things are pretty static. If the city and state tend to imply the zip code when there are 10K rows in the table, they probably still will when there are 100K rows in the table. If users with org_id = 1 have a higher karma score on average than users with org_id != 1, that'll probably still be true when there are more users in both classes. Whether the database can understand that without rescanning the table depends on the data representation, of course, but I guess I'm just saying I'd think really, really hard before abandoning the idea of periodic sampling. You have to get an awful lot of benefit out of those cross-column stats to make it worth paying a run-time cost for them.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
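The "private copy, merge at commit" scheme from the thread can be sketched with a toy distinct-count estimator. A 64-bit presence bitmap stands in for the real structure (a genuine implementation would use something like adaptive sampling); the point is only that each backend updates a local copy and the commit-time merge is a cheap bitwise union:

```c
#include <assert.h>
#include <stdint.h>

/* Toy distinct estimator: one presence bit per hash bucket. */
typedef struct
{
    uint64_t bits;
} DistinctEstimator;

/* Record a value (already hashed) in a backend-local copy. */
void estimator_add(DistinctEstimator *e, uint64_t value_hash)
{
    e->bits |= UINT64_C(1) << (value_hash % 64);
}

/* At commit, fold the backend-local copy back into the shared one. */
void estimator_merge(DistinctEstimator *shared, const DistinctEstimator *local)
{
    shared->bits |= local->bits;
}

/* Crude distinct-count estimate: number of occupied buckets. */
int estimator_count(const DistinctEstimator *e)
{
    int n = 0;

    for (int i = 0; i < 64; i++)
        if (e->bits & (UINT64_C(1) << i))
            n++;
    return n;
}
```

On abort, the local copy is simply discarded, matching the crash scenario above; merging is also idempotent, so replaying a merge during recovery would be harmless in this model.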
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken sai...@quietlycompetent.com wrote: Pages like this one have column comments for the system tables: http://www.psql.it/manuale/8.3/catalog-pg-attribute.html Oh, I see. I don't think we want to go there. We'd need some kind of system for keeping the two places in sync. I seem to recall some muttering about teaching genbki to extract such comments from the SGML sources or perhaps the C header files. I tend to agree though that it would be a lot more work than it's worth. And as you say, pg_description entries aren't free. Which brings up another point though. I have a personal TODO item to make the comments for operator support functions more consistent: http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us Should we consider removing those comments altogether, instead? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Excerpts from Robert Haas's message of mié ene 19 15:25:00 -0300 2011: Oh, I see. I don't think we want to go there. We'd need some kind of system for keeping the two places in sync. Maybe autogenerate both the .sgml and the postgres.description files from a single source. And there'd be no easy way to upgrade the in-database descriptions when we upgraded to a newer minor release, supposing they'd changed in the meantime. I wouldn't worry about this issue. We don't do many catalog changes in minor releases anyway. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] REVIEW: patch: remove redundant code from pl_exec.c
Greetings,

* Pavel Stehule (pavel.steh...@gmail.com) wrote: This patch removes redundant lines from the PL/pgSQL executor (-89 lines).

While I can certainly appreciate wanting to remove redundant code, I don't think this change actually improves the situation. The problem is more than just that we might want to make a change to 'while' but not 'for', it's also that it makes it very difficult to follow the code flow. If you could have found a way to make it work for all calls to exec_stmts(), it might be a bit better, but you can't without kludging it up further. If you could, then it might be possible to move some of this logic *into* exec_stmts(), but I don't see that being reasonable either.

In the end, I'd recommend cleaning up the handling of the exec_stmts() return code so that all of these pieces follow the same style and look similar (I'd go with the switch-based approach and remove the if/else branches). That'll make it easier for anyone coming along later who does end up needing to change all three.

Doesn't change functionality.

I'm not convinced of this either, to be honest. In exec_stmt_while(), there is:

for (;;)
{
    [...]
    if (isnull || !value)
        break;
    rc = exec_stmts(estate, stmt->body);
    [...]
}
return PLPGSQL_RC_OK;

You replaced the last return with:

return rc;

Which could mean returning an uninitialized rc if the above break; happens.

In the end, I guess it's up to the committers on how they feel about this, so here's an updated patch which fixes the above issue and improves the comments/grammar. Passes all regressions and works in my limited testing.

commit e6639d83db5b301e184bf158b1591007a3f79526
Author: Stephen Frost sfr...@snowman.net
Date: Wed Jan 19 14:28:36 2011 -0500

    Improve comments in pl_exec.c wrt can_leave_loop()

    This patch improves some of the comments around can_leave_loop(), and
    fixes a couple of fall-through cases to make sure they behave the same
    as before the code de-duplication patch that introduced can_leave_loop().
commit f8262adcba662164dbc24d289cb036b3f0aee582 Author: Stephen Frost sfr...@snowman.net Date: Wed Jan 19 14:26:27 2011 -0500 remove redundant code from pl_exec.c This patch removes redundant handling of exec_stmts()'s return code by creating a new function to be called after exec_stmts() is run. This new function will then handle the return code, update it if necessary, and return if the loop should continue or not. Patch by Pavel Stehule. Thanks, Stephen *** a/src/pl/plpgsql/src/pl_exec.c --- b/src/pl/plpgsql/src/pl_exec.c *** *** 204,209 static Portal exec_dynquery_with_params(PLpgSQL_execstate *estate, --- 204,211 PLpgSQL_expr *dynquery, List *params, const char *portalname, int cursorOptions); + static bool can_leave_loop(PLpgSQL_execstate *estate, + PLpgSQL_any_loop *stmt, int *rc); /* -- * plpgsql_exec_function Called by the call handler for *** *** 1566,1571 exec_stmt_case(PLpgSQL_execstate *estate, PLpgSQL_stmt_case *stmt) --- 1568,1650 return exec_stmts(estate, stmt-else_stmts); } + /* + * can_leave_loop + * + * Check the result of exec_stmts for the various exec_stmt_loop + * functions (unconditional loop, while loop, for-over-integer loop, + * for-over-portal loop). + * + * Returns true when the outer cycle should be left, otherwise false. + * Will also update the return code (rc) as necessary. + */ + static bool + can_leave_loop(PLpgSQL_execstate *estate, PLpgSQL_any_loop *stmt, int *rc) + { + /* + * Check which return code exec_stmts() returned and handle it + * accordingly. 
+ */ + switch (*rc) + { + case PLPGSQL_RC_OK: + /* Just continue the outer loop on PLPGSQL_RC_OK */ + return false; + + case PLPGSQL_RC_RETURN: + /* Time to break out of the outer loop */ + return true; + + case PLPGSQL_RC_EXIT: + if (estate->exitlabel == NULL) + { + /* unlabelled exit, finish the current loop */ + *rc = PLPGSQL_RC_OK; + } + else if (stmt->label != NULL && strcmp(stmt->label, estate->exitlabel) == 0) + { + /* labelled exit, matches the current stmt's label */ + estate->exitlabel = NULL; + *rc = PLPGSQL_RC_OK; + } + + /* + * otherwise, this is a labelled exit that does not match the + * current statement's label, if any: return RC_EXIT so that the + * EXIT continues to propagate up the stack. + */ + return true; + + case PLPGSQL_RC_CONTINUE: + if (estate->exitlabel == NULL) + { + /* anonymous continue, so re-run the loop */ + *rc = PLPGSQL_RC_OK; + } + else if (stmt->label != NULL + && strcmp(stmt->label, estate->exitlabel) == 0) + { + /* label matches named continue, so re-run loop */ + estate->exitlabel = NULL; + *rc =
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken sai...@quietlycompetent.com wrote: Pages like this one have column comments for the system tables: http://www.psql.it/manuale/8.3/catalog-pg-attribute.html Oh, I see. I don't think we want to go there. We'd need some kind of system for keeping the two places in sync. I seem to recall some muttering about teaching genbki to extract such comments from the SGML sources or perhaps the C header files. I tend to agree though that it would be a lot more work than it's worth. And as you say, pg_description entries aren't free. Which brings up another point though. I have a personal TODO item to make the comments for operator support functions more consistent: http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us Should we consider removing those comments altogether, instead? I could go either way on that. Most of those comments are pretty short, aren't they? How much storage are they really costing us? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Robert Haas robertmh...@gmail.com writes: ... I guess I'm just saying I'd think really, really hard before abandoning the idea of periodic sampling. You have to get an awful lot of benefit out of those cross-column stats to make it worth paying a run-time cost for them. I've been trying to not discourage Tomas from blue-sky speculation, but I have to say I agree with Robert that the chance of any usable result from this approach is very very small. Feel free to do some experimentation to prove us wrong --- but don't put a lot of effort into it before that. regards, tom lane
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( Yeah, there's been some discussion of a more general solution, and I think I even had a trial patch at one point (which turned out not to work terribly well, but maybe somebody will have a better idea someday). I'm pretty doubtful that there's going to be a general solution to this problem - I think it's going to require gradual refactoring of problem spots. Do you remember the previous discussion? One idea that was on the table was to make the TOAST code maintain a cache of detoasted values, which could be indexed by the toast pointer OIDs (toast rel OID + value OID), and then PG_DETOAST_DATUM might give back a pointer into the cache instead of a fresh value. In principle that could be done in a fairly centralized way. The hard part is to know when a cache entry is not actively referenced anymore ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python refactoring
On Wed, 2011-01-19 at 00:52 -0300, Alvaro Herrera wrote: Excerpts from Peter Eisentraut's message of Tue Jan 18 19:22:50 -0300 2011: #16: This is probably pointless because pgindent will reformat this. pgindent used to remove useless braces around single-statement blocks, but this behavior was removed years ago because it'd break formatting around PG_TRY blocks. Good to know. Committed then.
Re: [HACKERS] Extending opfamilies for GIN indexes
Tom Lane t...@sss.pgh.pa.us writes: I think you missed the point: right now, to use both the core and intarray operators on an integer[] column, you have to create *two* GIN indexes, which will have exactly identical contents. I'm looking for a way to let intarray extend the core opfamily definition so that one index can serve. That I think I understood, but then I mixed opfamily and opclasses badly. Let's try again. For the GIN indexes, we have 2 methods for building the index and 3 others to search it to solve the query. You're proposing that the 2 former methods would be in the opfamily and the 3 latter in the opclass. We'd like to be able to use the same index (whose build depends on the opfamily) for solving different kinds of queries, for which we can use different traversal and search algorithms; that's the opclass. So we would want the planner to know that in the GIN case an index built with any opclass of a given opfamily can help answer a query that would need any opclass of the opfamily. Right? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Which brings up another point though. I have a personal TODO item to make the comments for operator support functions more consistent: http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us Should we consider removing those comments altogether, instead? I could go either way on that. Most of those comments are pretty short, aren't they? How much storage are they really costing us? Well, on my machine pg_description is about 210K (per database) as of HEAD. 90% of its contents are pg_proc entries, though I have no good fix on how much of that is for internal-use-only functions. A very rough estimate from counting pg_proc and pg_operator entries suggests that the answer might be about a third. So if we do what was said in the above-cited thread, ie move existing comments to pg_operator and add boilerplate ones to pg_proc, we probably would pay 100K for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( Yeah, there's been some discussion of a more general solution, and I think I even had a trial patch at one point (which turned out not to work terribly well, but maybe somebody will have a better idea someday). I'm pretty doubtful that there's going to be a general solution to this problem - I think it's going to require gradual refactoring of problem spots. Do you remember the previous discussion? One idea that was on the table was to make the TOAST code maintain a cache of detoasted values, which could be indexed by the toast pointer OIDs (toast rel OID + value OID), and then PG_DETOAST_DATUM might give back a pointer into the cache instead of a fresh value. In principle that could be done in a fairly centralized way. The hard part is to know when a cache entry is not actively referenced anymore ... I do remember that discussion. Aside from the problem you mention, it also seems that maintaining the hash table and doing lookups into it would have some intrinsic cost. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Which brings up another point though. I have a personal TODO item to make the comments for operator support functions more consistent: http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us Should we consider removing those comments altogether, instead? I could go either way on that. Most of those comments are pretty short, aren't they? How much storage are they really costing us? Well, on my machine pg_description is about 210K (per database) as of HEAD. 90% of its contents are pg_proc entries, though I have no good fix on how much of that is for internal-use-only functions. A very rough estimate from counting pg_proc and pg_operator entries suggests that the answer might be about a third. So if we do what was said in the above-cited thread, ie move existing comments to pg_operator and add boilerplate ones to pg_proc, we probably would pay 100K for it. I guess that's not enormously expensive, but it's not insignificant either. On my machine, a template database is 5.5MB. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: patch: remove redundant code from pl_exec.c
Stephen Frost sfr...@snowman.net writes: While I can certainly appreciate wanting to remove redundant code, I don't think this change actually improves the situation. The problem is more than just that we might want to make a change to 'while' but not 'for', it's also that it makes it very difficult to follow the code flow. That was my reaction as well; and I was also concerned that this could have a non-negligible performance price. (At the very least it's adding an additional function call per loop execution, and there could also be a penalty from forcing rc to be in memory rather than a register.) I think we should reject this one. In the end, I'd recommend cleaning up the handling of the exec_stmts() return code so that all of these pieces follow the same style and look similar (I'd go with the switch-based approach and remove the if/else branches). That'll make it easier for anyone coming along later who does end up needing to change all three. Using a switch there is a bit problematic since in some cases you want to use break to exit the loop. We could replace such breaks by gotos, but that would be another strike against the argument that you're making things more readable. I think the switch in exec_stmt_loop is only workable because it has no cleanup to do, so it can just return in places where a loop break would otherwise be needed. In short: if you want to make these all look alike, better to go the other way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: patch: remove redundant code from pl_exec.c
* Tom Lane (t...@sss.pgh.pa.us) wrote: I think we should reject this one. Works for me. Using a switch there is a bit problematic since in some cases you want to use break to exit the loop. We could replace such breaks by gotos, but that would be another strike against the argument that you're making things more readable. I think the switch in exec_stmt_loop is only workable because it has no cleanup to do, so it can just return in places where a loop break would otherwise be needed. In short: if you want to make these all look alike, better to go the other way. Ah, yeah, good point. We do use gotos elsewhere for this reason; might consider revisiting those also, if we're trying to do a 'clean-up' patch. In any case, I'll mark this as rejected. Thanks! Stephen
Re: [HACKERS] Extending opfamilies for GIN indexes
Dimitri Fontaine dimi...@2ndquadrant.fr writes: For the GIN indexes, we have 2 methods for building the index and 3 others to search it to solve the query. You're proposing that the 2 former methods would be in the opfamily and the 3 later in the opclass. Actually the other way around. An opclass is the subset of an opfamily that is tightly bound to an index. The build methods have to be associatable with an index, so they're part of the index's opclass. The query methods could be loose in the opfamily. So we would want the planner to know that in the GIN case an index built with any opclass of a given opfamily can help answer a query that would need any opclass of the opfamily. Right? The planner's not the problem here --- what's missing is the rule for the index AM to look up the right support functions to call at runtime. The trick is to associate the proper query support methods with any given query operator (which'd also be loose in the family, probably). The existing schema for pg_amop and pg_amproc is built on the assumption that the amoplefttype/amoprighttype are sufficient for making this association; but that seems to fall down if we would like to allow contrib modules to add new query operators that coincidentally take the same input types as an existing opfamily member. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Robert Haas robertmh...@gmail.com writes: I do remember that discussion. Aside from the problem you mention, it also seems that maintaining the hash table and doing lookups into it would have some intrinsic cost. Well, sure, but it's still far cheaper than going out to the toast table (which will require multiple hashtable lookups, not to mention I/O and possible lock blocking). If we could solve the refcounting problem I think this would be a very significant win. regards, tom lane
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Tom Lane t...@sss.pgh.pa.us wrote: If we could solve the refcounting problem I think this would be a very significant win. Instead of trying to keep a refcount, how about just evicting from the buffer as needed based on LRU? -Kevin
Re: [HACKERS] Couple document fixes
Thom Brown t...@linux.com writes: I've attached a couple minor fixes to the docs. One relating to SECURITY LABEL and the other for pg_class.relpersistence Applied, thanks. regards, tom lane
Re: [HACKERS] Couple document fixes
On 19 January 2011 21:10, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: I've attached a couple minor fixes to the docs. One relating to SECURITY LABEL and the other for pg_class.relpersistence Applied, thanks. Cheers Mr Lane. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: If we could solve the refcounting problem I think this would be a very significant win. Instead of trying to keep a refcount, how about just evicting from the buffer as needed based on LRU? Well, unless you know for certain that an item is no longer used, you can't evict it. There are ways you could finesse that --- for instance, you might try to only flush between statements, when certainly no user functions are running --- but the problem is that the cache is likely to contain some large values that you can't adopt such a laissez faire approach with, or you'll run out of memory. One idea that I think we discussed was to tie cache entries to the memory context they were demanded in, and mark them unused at the next context reset/delete. That way they'd be considered unused at the same points where the current implementation would certainly have discarded the value. This isn't perfect (because of pfree) but might be good enough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: One idea that I think we discussed was to tie cache entries to the memory context they were demanded in, and mark them unused at the next context reset/delete. That way they'd be considered unused at the same points where the current implementation would certainly have discarded the value. This isn't perfect (because of pfree) but might be good enough. Yeah, I was thinking that's probably what would have to be done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pl/python refactoring
On Wed, Jan 19, 2011 at 2:59 PM, Peter Eisentraut pete...@gmx.net wrote: On Wed, 2011-01-19 at 00:52 -0300, Alvaro Herrera wrote: Excerpts from Peter Eisentraut's message of Tue Jan 18 19:22:50 -0300 2011: #16: This is probably pointless because pgindent will reformat this. pgindent used to remove useless braces around single-statement blocks, but this behavior was removed years ago because it'd break formatting around PG_TRY blocks. Good to know. Committed then. I cracked up upon reading your commit message. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] estimating # of distinct values
On 19.1.2011 20:25, Robert Haas wrote: On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Yes, I was aware of this problem (amount of memory consumed with lots of updates), and I kind of hoped someone will come up with a reasonable solution. As far as I can see, periodically sampling some or all of the table is really the only thing that has a chance of working OK. You could try to track changes only when they're not too big, but I think that's still going to have nasty performance consequences. Yes, sampling the whole table is the only way to get reliable ndistinct estimates. I'm not sure what you mean by 'tracking changes' - as I've mentioned in the previous post, this might be solved by updating a local copy, which requires constant space (a few kB, see the previous post). Is that acceptable? I don't know; that's up to the user if he wants to pay this price. So the algorithm would be something like this: 1. create copy for all distinct estimators influenced by the INSERT 2. update the local copy 3. commit and merge the local copies back into the original estimator Well, maybe. But DELETEs seem like a problem. And it's still adding foreground work to every query, which I just have a hard time believing is ever going to work out Yes, deletes are difficult to handle. My idea was to compute something like ((tuples changed + tuples deleted) / tuples total), and indicate that a rebuild of the estimator is needed if this coefficient changes too much - e.g. log a message or something like that. Regarding the crash scenario - if the commit fails, just throw away the local estimator copy, it's not needed. I'm not sure how to take care of the case when commit succeeds and the write of the merged estimator fails, but I think it might be possible to write the estimator to xlog or something like that. So it would be replayed during recovery etc. Or is it a stupid idea? 
It's not stupid, in the sense that that is what you'd need to do if you want to avoid ever having to rescan the table. But it is another thing that I think is going to be way too expensive. Way too expensive? All you need to put into the logfile is a copy of the estimator, which is a few kBs. How is that 'way too expensive'? Sure, it might be expensive when the user does a lot of small changes in separate transactions, that's true, but I guess we could minimize the amount of data written to the xlog by doing a diff of the estimators or something like that. Yes, as I've mentioned above, the multi-column stats are the reason why I started working on this. And yes, it really should work like this: 1. user realizes there's something wrong as the plans are bad 2. after analysis, the user realizes the reason is an imprecise estimate due to dependence between columns 3. the user enables cross-column stats on the columns and checks if it actually solved the issue 4. if the cost outweighs the gains, the user drops the stats Does that sound reasonable? Yes. The only caveat I'm trying to insert is that it's hard for me to see how the proposed approach could ever be cheap enough to be a win. IMHO the question is not 'How expensive is that?' but rather 'How expensive is it compared to the gains?' If the user gets much better estimates and then a much better plan, then this may be a completely acceptable price. If we need some more expensive kind of ANALYZE that scans the whole table, and it's optional, sure, why not? But that's a one-time hit. You run it, it sucks, it's over, and now your queries work. Odds are good you may never need to touch it again. Now, if we can convince ourselves that multi-column stats are likely to require constant updating, then maybe there would be more to recommend the stream-based approach, although even then it seems dreadfully complex and expensive to me. But I bet these things are pretty static. 
No, the multi-column statistics do not require constant updating. There are cases where sampling is perfectly fine, although you may need a bit larger sample. Generally if you can use a multi-dimensional histogram, you don't need to scan the whole table. But the multi-dimensional histograms are not applicable to some cases, especially the ZIP code fail case that was repeatedly discussed. So in the case of discrete data, we need a different approach - and the solution I've proposed is based on using ndistinct estimates to get the estimates (actually it's based on one of the papers listed on the wiki). If the city and state tend to imply the zip code when there are 10K rows in the table, they probably still will when there are 100K rows in the table. If users with org_id = 1 have a higher karma score on average OK, how will you measure the implicativeness? We have discussed this in another thread and there are a lot of gotchas although it seems like a really
Re: [HACKERS] estimating # of distinct values
On 19.1.2011 20:46, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: ... I guess I'm just saying I'd think really, really hard before abandoning the idea of periodic sampling. You have to get an awful lot of benefit out of those cross-column stats to make it worth paying a run-time cost for them. I've been trying to not discourage Tomas from blue-sky speculation, but I have to say I agree with Robert that the chance of any usable result from this approach is very very small. Feel free to do some experimentation to prove us wrong --- but don't put a lot of effort into it before that. regards, tom lane Discourage? Not really - I did not expect this to be a simple thing to implement. And yes, it might turn out to be a dead end eventually. OTOH the "it seems really expensive, so let's not try to build it" approach is a certain dead end, so I'm not going to surrender due to such speculations (although those are valid concerns and I'll have to address them in the future). regards Tomas
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, on my machine pg_description is about 210K (per database) as of HEAD. 90% of its contents are pg_proc entries, though I have no good fix on how much of that is for internal-use-only functions. A very rough estimate from counting pg_proc and pg_operator entries suggests that the answer might be about a third. So if we do what was said in the above-cited thread, ie move existing comments to pg_operator and add boilerplate ones to pg_proc, we probably would pay 100K for it. I guess that's not enormously expensive, but it's not insignificant either. On my machine, a template database is 5.5MB. The implementation I was thinking about was to have initdb run a SQL command that would do something like INSERT INTO pg_description SELECT oprcode, 'pg_proc'::regclass, 0, 'implementation of ' || oprname FROM pg_operator WHERE theres-not-already-a-description-of-the-oprcode-function So it would be minimal work to either provide or omit the boilerplate descriptions. I think we can postpone the decision till we have a closer fix on the number of entries we're talking about. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Wed, Jan 19, 2011 at 2:13 PM, Tomas Vondra t...@fuzzy.cz wrote: On 19.1.2011 20:25, Robert Haas wrote: On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Yes, I was aware of this problem (amount of memory consumed with lots of updates), and I kind of hoped someone will come up with a reasonable solution. As far as I can see, periodically sampling some or all of the table is really the only thing that has a chance of working OK. You could try to track changes only when they're not too big, but I think that's still going to have nasty performance consequences. Yes, sampling all the table is the only way to get reliable ndistinct estimates. IMHO continuing to focus on worst case results is a dead end. Yes, for some distributions, ndistinct is very hard to estimate well. However, let us not forget that the current ndistinct estimator is very bad but the postgres planner is, on the whole, very good. As far as I can see this is due to two facts: 1) The distribution of values in a table is rarely pathological, and usually follows one of several common patterns. ( IOW, we have good heuristics ) 2) The choice of plan is fairly robust to mis-estimation of ndistinct, because there are only a few things the executor can choose. It doesn't usually matter if a value composes 5% or 20% ( or 99% ) of the table, we still usually need to scan the entire table. Again, in my *very* humble opinion, if the ultimate goal is to improve the planner, we should try to cut down on the number of cases in which a poor estimate of ndistinct gives a really bad plan instead of chasing after marginal gains from a better ndistinct estimator. Maybe ( as I've advocated for before ) this means parameterizing the distribution of values ( ie, find that the values are roughly uniform ), maybe this means estimating the error of our statistics and using the most robust rather than the best plan, or maybe it means something totally different. 
But: ( and the literature seems to support me ) pounding away at the ndistinct estimator seems like a dead end. If you think about it, it's a bit ridiculous to look at the whole table *just* to estimate ndistinct - if we go that far why don't we just store the full distribution and be done with it? Best, Nathan
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
Hi Simon, I'm reviewing this patch for CommitFest 2011-01. On Sat, Jan 15, 2011 at 10:02:03PM +, Simon Riggs wrote: On Tue, 2010-12-14 at 19:48 +, Simon Riggs wrote: REPLACE TABLE ying WITH yang Patch. Needs work. First, I'd like to note that the thread for this patch had *four* me-too responses to the use case. That's extremely unusual; the subject is definitely compelling to people. It addresses the bad behavior of natural attempts to atomically swap two tables in the namespace: psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')" psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock # Do it this way, and the next SELECT gets data from the old table. #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' # Do it this way, and get: ERROR: could not open relation with OID 41380 psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'. psql -c 'DROP TABLE IF EXISTS t, old_t, new_t' by letting you do this instead: psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')" psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock psql -c 'EXCHANGE TABLE new_t TO t' psql -c 'SELECT * FROM t' # I get 'new', finally! psql -c 'DROP TABLE IF EXISTS t, new_t' I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread interesting: can we just make the first example work? Even granting that the second syntax may be a useful addition, the existing behavior of the first example is surely worthless, even actively harmful. I tossed together a proof-of-concept patch, attached, that makes the first example DTRT. Do you see any value in going down that road? 
As for your patch itself: + /* + * Exchange table contents + * + * Swap heaps, toast tables, toast indexes + * all forks + * all indexes For indexes, would you basically swap yin-yang in pg_index.indrelid, update pg_class.relnamespace as needed, and check for naming conflicts (when yin and yang differ in schema)? Is there more? + * + * Checks: + * * table definitions must match Is there a particular reason to require this, or is it just a simplification to avoid updating things to match? + * * constraints must match Wouldn't CHECK constraints have no need to match? + * * indexes need not match + * * outbound FKs don't need to match + * * inbound FKs will be set to not validated We would need to ensure that inbound FOREIGN KEY constraints still have indexes suitable to implement them. The easiest thing there might be to internally drop and recreate the constraint, so we get all that verification. + * + * No Trigger behaviour + * + * How is it WAL logged? By locks and underlying catalog updates + */ I see that the meat of the patch is yet to be written, so this ended up as more of a design review based on your in-patch comments. Hopefully it's of some value. I'll go ahead and mark the patch Returned with Feedback. Thanks, nm *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** *** 970,995 relation_openrv(const RangeVar *relation, LOCKMODE lockmode) { Oid relOid; ! /* !* Check for shared-cache-inval messages before trying to open the !* relation. This is needed to cover the case where the name identifies a !* rel that has been dropped and recreated since the start of our !* transaction: if we don't flush the old syscache entry then we'll latch !* onto that entry and suffer an error when we do RelationIdGetRelation. !* Note that relation_open does not need to do this, since a relation's !* OID never changes. !* !* We skip this if asked for NoLock, on the assumption that the caller has !* already ensured some appropriate lock is held. !*/ ! 
if (lockmode != NoLock) ! AcceptInvalidationMessages(); ! ! /* Look up the appropriate relation using namespace search */ ! relOid = RangeVarGetRelid(relation, false); /* Let relation_open do the rest */ ! return relation_open(relOid, lockmode); } /* --- 970,980 { Oid relOid; ! /* Look up and lock the appropriate relation using namespace search */ ! relOid = RangeVarLockRelid(relation, lockmode, false); /* Let relation_open do the rest */ !
Re: [HACKERS] psql: Add \dL to show languages
On Tue, 2011-01-18 at 19:34 -0500, Josh Kupershmidt wrote: Got that now too. I lost my ~/.emacs file recently, which is mostly why I'm making whitespace mistakes. Rebuilding slowly though; (setq-default show-trailing-whitespace t) is what I needed. Aha, I see. I left the Call Handler and Validator columns in the verbose output since I haven't heard otherwise. Josh I do not really care either way. The columns are not terribly useful but not pointless either. The patch looks alright now so I will mark it as ready for committer now. Andreas
[HACKERS] REVIEW: WIP: plpgsql - foreach in
Greetings, * Pavel Stehule (pavel.steh...@gmail.com) wrote: attached patch contains an implementation of iteration over an array: I've gone through this patch and, in general, it looks pretty reasonable to me. There's a number of places where I think additional comments would be good and maybe some variable name improvements. Also, my changes should be reviewed to make sure they make sense. Attached is a patch against master which includes my changes, and a patch against Pavel's patch, so he can more easily see my changes and include them if he'd like. I'm going to mark this returned to author with feedback. commit 30295015739930e68c33b29da4f7ef535bc293ea Author: Stephen Frost sfr...@snowman.net Date: Wed Jan 19 17:58:24 2011 -0500 Clean up foreach-in-array PL/PgSQL code/comments Minor clean-up of the PL/PgSQL foreach-in-array patch, includes some white-space cleanup, grammar fixes, additional errhint where it makes sense, etc. Also added a number of 'XXX' comments asking for clarification and additional comments on what's happening in the code. commit f1a02fe3a8fa84217dae32d5ba74e9764c77431c Author: Stephen Frost sfr...@snowman.net Date: Wed Jan 19 15:11:53 2011 -0500 PL/PgSQL - Add iterate-over-array support This patch adds support for iterating over an array in PL/PgSQL.
Patch Author: Pavel Stehule Thanks, Stephen *** a/doc/src/sgml/plpgsql.sgml --- b/doc/src/sgml/plpgsql.sgml *** *** 2238,2243 END LOOP optional replaceablelabel/replaceable /optional; --- 2238,2268 /para /sect2 +sect2 id=plpgsql-array-iterating + titleLooping Through Array/title + synopsis + optional lt;lt;replaceablelabel/replaceablegt;gt; /optional + FOREACH replaceabletarget/replaceable optional SCALE replaceablenumber/replaceable /optional IN replaceableexpression/replaceable LOOP + replaceablestatements/replaceable + END LOOP optional replaceablelabel/replaceable /optional; + /synopsis + + programlisting + CREATE FUNCTION sum(VARIADIC int[]) RETURNS int8 AS $$ + DECLARE + s int8; x int; + BEGIN + FOREACH x IN $1 + LOOP + s := s + x; + END LOOP; + RETURN s; + END; + $$ LANGUAGE plpgsql; + /programlisting + +/sect2 + sect2 id=plpgsql-error-trapping titleTrapping Errors/title *** a/src/pl/plpgsql/src/gram.y --- b/src/pl/plpgsql/src/gram.y *** *** 21,26 --- 21,27 #include parser/parse_type.h #include parser/scanner.h #include parser/scansup.h + #include utils/array.h /* Location tracking support --- simpler than bison's default */ *** *** 134,139 static List *read_raise_options(void); --- 135,141 PLpgSQL_datum *scalar; PLpgSQL_rec *rec; PLpgSQL_row *row; + int slice; } forvariable; struct { *** *** 178,184 static List *read_raise_options(void); %type ival assign_var %type var cursor_variable %type datum decl_cursor_arg ! %type forvariable for_variable %type stmt for_control %type str any_identifier opt_block_label opt_label --- 180,186 %type ival assign_var %type var cursor_variable %type datum decl_cursor_arg ! %type forvariable for_variable foreach_control %type stmt for_control %type str any_identifier opt_block_label opt_label *** *** 190,196 static List *read_raise_options(void); %type stmt stmt_return stmt_raise stmt_execsql %type stmt stmt_dynexecute stmt_for stmt_perform stmt_getdiag %type stmt stmt_open stmt_fetch stmt_move stmt_close stmt_null ! 
%type stmt stmt_case %type list proc_exceptions %type exception_block exception_sect --- 192,198 %type stmt stmt_return stmt_raise stmt_execsql %type stmt stmt_dynexecute stmt_for stmt_perform stmt_getdiag %type stmt stmt_open stmt_fetch stmt_move stmt_close stmt_null ! %type stmt stmt_case stmt_foreach_a %type list proc_exceptions %type exception_block exception_sect *** *** 264,269 static List *read_raise_options(void); --- 266,272 %token keyword K_FETCH %token keyword K_FIRST %token keyword K_FOR + %token keyword K_FOREACH %token keyword K_FORWARD %token keyword K_FROM %token keyword K_GET *** *** 298,303 static List *read_raise_options(void); --- 301,307 %token keyword K_ROWTYPE %token keyword K_ROW_COUNT %token keyword K_SCROLL + %token keyword K_SLICE %token keyword K_SQLSTATE %token keyword K_STRICT %token keyword K_THEN *** *** 739,744 proc_stmt : pl_block ';' --- 743,750 { $$ = $1; } | stmt_for { $$ = $1; } + | stmt_foreach_a + { $$ = $1; } | stmt_exit { $$ = $1; } | stmt_return *** *** 1386,1391 for_variable : T_DATUM --- 1392,1455 } ; + stmt_foreach_a : opt_block_label K_FOREACH foreach_control K_IN
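As an aside, the sum() example in the patch's documentation declares s without an initializer, so s := s + x would presumably stay NULL throughout; s int8 := 0 seems intended. The loop semantics the syntax suggests, element at a time without SLICE and sub-array at a time with SLICE n, can be sketched in Python (this is my reading of the proposed syntax, not the patch's actual C implementation):

```python
# Rough Python analogue of the proposed FOREACH semantics (an
# interpretation of the patch's syntax, not its implementation):
# without SLICE, yield scalar elements in order; with SLICE 1 over a
# 2-D array, yield one sub-array (row) at a time.

def foreach(array, slice_=0):
    def depth(a):
        # number of nesting levels, e.g. 2 for a 2-D array
        return 1 + depth(a[0]) if isinstance(a, list) else 0

    def walk(a, d):
        if d == slice_:
            yield a            # reached the requested slice rank
        else:
            for item in a:
                yield from walk(item, d - 1)

    yield from walk(array, depth(array))

arr = [[1, 2], [3, 4], [5, 6]]
elements = list(foreach(arr))        # scalars: 1, 2, 3, 4, 5, 6
rows = list(foreach(arr, slice_=1))  # rows: [1, 2], [3, 4], [5, 6]
```

With slice_=0 the walk bottoms out at scalars, mirroring plain FOREACH; with slice_=1 it stops one level up, mirroring FOREACH ... SLICE 1.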
Re: [HACKERS] estimating # of distinct values
On 19.1.2011 23:44, Nathan Boley wrote: 1) The distribution of values in a table is rarely pathological, and usually follows one of several common patterns. ( IOW, we have good heuristics ) Not true. You're missing the goal of this effort - to get ndistinct estimate for combination of multiple columns. Which is usually pathological in case of dependent columns. Which is exactly the case when the user will explicitly enable this feature to get better estimates (and thus better plans). 2) The choice of plan is fairly robust to mis-estimation of ndistinct, because there are only a few things the executor can choose. It doesn't usually matter if a value composes 5% or 20% ( or 99% ) of the table, we still usually need to scan the entire table. Again, don't think about a single column (although even in that case there are known fail cases). Think about multiple columns and the number of distinct combinations. With attribute value independence assumption, this is usually significantly underestimated. And that's a problem as it often leads to an index scan instead of sequential scan (or something like that). Again, in my *very* humble opinion, If the ultimate goal is to improve the planner, we should try to cut down on the number of cases in which a poor estimate of ndistinct gives a really bad plan instead of chasing after marginal gains from a better ndistinct estimator. Maybe What is a marginal gain? The ultimate goal is to build cross-column stats, which in case of dependent columns usually results in poor plans. How is fixing that a marginal gain? Yes, it may turn out it's not worth it, but it's a bit early to say that without an implementation and some hard data. ( as I've advocated for before ) this means parameterizing the distribution of values ( ie, find that the values are roughly uniform ), maybe this means estimating the error of our statistics and using the most robust rather than the best plan, or maybe it means something totally different.
But: ( and the literature seems to support me ) Which literature? I've read an awful lot of papers on this topic lately, and I don't remember anything like that. If there's an interesting paper, I'd like to read it. Yes, all the papers state that estimating ndistinct is a particularly tricky task, but that's somehow expected? I've even checked how other databases do this - e.g. Oracle does it very similarly to what I proposed (the user has to enable the feature, it has to be rebuilt from time to time, etc.). I'm not saying we should do everything like Oracle, but they are not clueless idiots. pounding away at the ndistinct estimator seems like a dead end. If you think about it, it's a bit ridiculous to look at the whole table *just* to estimate ndistinct - if we go that far why don't we just store the full distribution and be done with it? No, I'm not trying to do this just to improve the ndistinct estimate. The goal is to improve the cardinality estimate in case of dependent columns, and one of the papers depends on good ndistinct estimates. And actually it does not depend on ndistinct for the columns only, it depends on ndistinct estimates for the combination of columns. So improving the ndistinct estimates for columns is just a necessary first step (and only if it works reasonably well, we can do the next step). regards Tomas
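The dependent-columns point can be made concrete with a toy illustration (invented data, not from the thread): when one column functionally determines another, the independence assumption sizes the combination space far too large, so per-combination cardinality estimates come out far too small.

```python
# Toy data (invented for illustration): each city has exactly two zip
# codes, and a zip code determines its city, so the columns are
# strongly dependent.
rows = [(i % 100, (i % 100) * 10 + (i // 100) % 2) for i in range(10000)]

d_city = len({c for c, z in rows})  # 100 distinct cities
d_zip = len({z for c, z in rows})   # 200 distinct zips
d_pair = len(set(rows))             # 200 distinct (city, zip) pairs

# Independence sizes the combination space as d_city * d_zip = 20000,
# estimating n / 20000 matching rows for an equality predicate on
# (city, zip) instead of the true n / 200: a 100x cardinality
# underestimate, enough to flip a seqscan/indexscan decision.
independence_combos = d_city * d_zip
```

The gap (20000 assumed combinations versus 200 real ones) grows with the strength of the dependence, which is exactly the case Tomas describes the user enabling cross-column statistics for.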
Re: [HACKERS] estimating # of distinct values
On Jan 19, 2011, at 23:44 , Nathan Boley wrote: If you think about it, it's a bit ridiculous to look at the whole table *just* to estimate ndistinct - if we go that far why don't we just store the full distribution and be done with it? The crucial point that you're missing here is that ndistinct provides an estimate even if you *don't* have a specific value to search for at hand. This is way more common than you may think, it e.g. happens every time you PREPARE a statement with parameters. Even knowing the full distribution has no advantage in this case - the best you could do is to average the individual probabilities which gives ... well, 1/ndistinct. best regards, Florian Pflug
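Florian's closing claim is easy to verify: the per-value frequencies of a column sum to 1, so their unweighted mean is exactly 1/ndistinct no matter how skewed the distribution is. A quick check with toy data:

```python
from collections import Counter
from fractions import Fraction

# Heavily skewed toy column: 9990 copies of 1 plus ten other values.
column = [1] * 9990 + list(range(2, 12))
n = len(column)

freqs = [Fraction(c, n) for c in Counter(column).values()]
ndistinct = len(freqs)

# The frequencies sum to exactly 1, so their mean is 1/ndistinct.
mean_freq = sum(freqs) / ndistinct
```

This is why, with no parameter value at hand, the full distribution collapses to the same number ndistinct already gives you.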
Re: [HACKERS] pl/python refactoring
On 19/01/11 10:57, Jan Urbański wrote: On 18/01/11 23:22, Peter Eisentraut wrote: #2: It looks like this loses some information/formatting in the error message. Should we keep the pointing arrow there? CONTEXT: PL/Python function sql_syntax_error -ERROR: syntax error at or near "syntax" -LINE 1: syntax error -^ -QUERY: syntax error +ERROR: PL/Python: plpy.SPIError: syntax error at or near "syntax" CONTEXT: PL/Python function sql_syntax_error Yes, the message is less informative, because the error is reported by PL/Python, by wrapping the SPI message. I guess I could try to extract more info from the caught ErrorData and put it in the new error that PL/Python throws. All right, I found a way to shoehorn the extra information into SPIException, I'll post a new patch series with what's left of the general refactoring patch soon. Shortly after, I'll post updated patches for doing SPI in subxacts, explicit subxact starting and generated SPI exceptions, as they will surely be broken by these changes. Jan
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: I'll go ahead and mark the patch Returned with Feedback. My understanding of the meaning of that is polite rejection. If you do that there is no further author comment and we move to July 2011. That then also rejects your own patch with what you say is an alternative implementation... Is that what you wish? That isn't what I wish, either way. I suggest you mark it Waiting on Author, so we can discuss it further. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] SSI and Hot Standby
Here's an issue for feedback from the community -- do we want to support truly serializable transactions on hot standby machines? The best way Dan and I have been able to think to do this is to build on the SERIALIZABLE READ ONLY DEFERRABLE behavior. We are able to obtain a snapshot and then check to see if it is at a place in the transaction processing where it would be guaranteed to be serializable without participating in predicate locking, rw-conflict detection, etc. If it's not, we block until a READ WRITE transaction completes, and then check again. Repeat. We may reach a point where we determine that the snapshot can't work, and we get a new one and start over. Due to the somewhat complex rules for this, you are likely to see a safe snapshot fairly quickly even in a mix which always has short-lived READ WRITE transactions running, although a single long-running READ WRITE transaction can block things until it completes. The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. Of course, we might want some limit on how often they are sent, to avoid WAL bloat. A hot standby could just keep the most recently received of these and use it when a SERIALIZABLE transaction is requested. Perhaps DEFERRABLE in this context could mean that it waits for the *next* one and uses it, to assure freshness. Actually, we could try to get tricky to avoid sending a complete snapshot by having two WAL messages with no payload -- one would mean the snapshot you would get now is being tested for serializability. If it failed to reach that state we would send another when we started working on a new snapshot. The other type of message would mean the snapshot you built when we last told you we were starting to test one is good. I *think* that can work, and it may require less WAL space.
If we don't do something like this, do we just provide REPEATABLE READ on the standby as the strictest level of transaction isolation? If so, do we generate an error on a request for SERIALIZABLE, warn and provide degraded behavior, or just quietly give them REPEATABLE READ behavior? Thoughts? -Kevin
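The two payload-free WAL messages could drive a small state machine on the standby; a rough Python sketch of that idea (message and attribute names are invented here, this is a thought experiment, not an actual PostgreSQL protocol):

```python
# Sketch of the two payload-free WAL messages: on "testing" the
# standby rebuilds a candidate snapshot from its own recovery state at
# that WAL position; on "safe" it promotes the candidate to the
# snapshot handed to incoming SERIALIZABLE transactions.

class StandbySafeSnapshots:
    def __init__(self):
        self.candidate = None  # snapshot being tested on the master
        self.safe = None       # latest known serializable-safe snapshot

    def on_snapshot_testing(self, recovery_snapshot):
        # a later "testing" message implicitly abandons the previous
        # candidate: the master started working on a new snapshot
        self.candidate = recovery_snapshot

    def on_snapshot_safe(self):
        # "the snapshot you built when we last told you we were
        # starting to test one is good"
        if self.candidate is not None:
            self.safe = self.candidate

standby = StandbySafeSnapshots()
standby.on_snapshot_testing({"xmin": 100, "xmax": 110})
standby.on_snapshot_testing({"xmin": 100, "xmax": 115})  # first one failed
standby.on_snapshot_safe()
```

The key invariant is that "safe" always refers to the most recent "testing" message, which matches the retry behavior described above; a DEFERRABLE request on the standby would simply wait until safe is refreshed.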
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: Here's an issue for feedback from the community -- do we want to support truly serializable transactions on hot standby machines? In this release? Maybe? In later releases? Yes. If it blocks your excellent contribution in this release, then from me, no. If you can achieve this in this release, yes. However, if this is difficult or complex, then I would rather say "not yet" quickly now than spend months working out the weirdnesses and possibly still get them wrong. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] pg_basebackup for streaming base backups
On Wed, Jan 19, 2011 at 9:37 PM, Magnus Hagander mag...@hagander.net wrote: The fast or slow seems to lead users to always choose fast. Instead, what about fast or smooth, fast or spread or immediate or delayed? Hmm. fast or spread seems reasonable to me. And I want to use fast for the fast version, because that's what we call it when you use pg_start_backup(). I'll go change it to spread for now - it's the one I can find used in the docs. Fair enough. What if pg_basebackup receives a signal while doing a backup? For example, users might do Ctrl-C to cancel the long-running backup. We should define a signal handler and send a Terminate message to the server to cancel the backup? Nah, we'll just disconnect and it'll deal with things that way. Just like we do with e.g. pg_dump. I don't see the need to complicate it with that. Okay. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com wrote: In this release? Maybe? In later releases? Yes. If it blocks your excellent contribution in this release, then from me, no. If you can achieve this in this release, yes. However, if this is difficult or complex, then I would rather say "not yet" quickly now than spend months working out the weirdnesses and possibly still get them wrong. We already have a mechanism for generating a good snapshot, the hard part (for me at least) would be to get that snapshot over to the hot standby and have it use the latest one on a request for a serializable transaction. I have no experience with WAL file output, and don't know what it would take for hot standby to use it as I describe. I agree it's pretty late in the cycle, but I'm going through all the loose ends and found this one -- which has been hanging out on the Wiki page as an R&D item for over a full year without discussion. :-( If we provide the snapshots (which we can safely and easily do), can someone else who knows what they're doing with WAL and HS get the rest of it safely into the release? That seems to me to be the only way it can still happen for 9.1. If not, I agree this can be 9.2 material. We just have to decide how to document it and answer the questions near the bottom of my initial post of the thread. -Kevin
Re: [HACKERS] estimating # of distinct values
If you think about it, it's a bit ridiculous to look at the whole table *just* to estimate ndistinct - if we go that far why don't we just store the full distribution and be done with it? - the best you could do is to average the individual probabilities which gives ... well, 1/ndistinct. That is certainly *not* the best you could do in every case. The mean is only the best estimate in L2, which is definitely not the case here. Consider a table with 10K values, 9,990 of which are 1 and the rest of which are 2, 3, ..., 10, versus a table that has the same 10 distinct values evenly distributed. For a simple equality query, in the first case, a bitmap scan might be best. In the second case, a sequential scan would always be best. This is precisely the point I was trying to make in my email: the loss function is very complicated. Improving the ndistinct estimator could significantly improve the estimates of ndistinct ( in the quadratic loss sense ) while only marginally improving the plans. -Nathan
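Nathan's two tables can be made concrete with toy data: both have ndistinct = 10, yet the selectivity of an equality predicate ranges over four orders of magnitude in the skewed table while it is a flat 10% in the uniform one, and that selectivity, not ndistinct itself, is what drives the plan choice.

```python
from collections import Counter

# Two toy tables with identical ndistinct but very different shapes:
# one dominated by the value 1, one perfectly uniform.
skewed = [1] * 9990 + list(range(2, 11))                  # 9,999 rows
uniform = [v for v in range(1, 11) for _ in range(1000)]  # 10,000 rows

nd_skewed = len(set(skewed))    # 10
nd_uniform = len(set(uniform))  # 10

# Selectivity of "col = <value>":
sel_common = Counter(skewed)[1] / len(skewed)     # ~0.999: scan everything
sel_rare = Counter(skewed)[2] / len(skewed)       # ~0.0001: an index wins
sel_uniform = Counter(uniform)[1] / len(uniform)  # 0.1 for every value
```

A perfect ndistinct estimate cannot distinguish these two tables; only the frequency distribution can, which is the point about the loss function being complicated.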
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 2:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: What I'm worried about is the case where a tablespace is created under the $PGDATA directory. What would be the sense of that? If you're concerned about whether the code handles it correctly, maybe the right solution is to add code to CREATE TABLESPACE to disallow it. I'm not sure why that's the right solution. Why do you think that we should not create the tablespace under the $PGDATA directory? I'm not surprised that people mount the filesystem on $PGDATA/mnt and create the tablespace on it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 12:57:23AM +, Simon Riggs wrote: On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: I'll go ahead and mark the patch Returned with Feedback. My understanding of the meaning of that is polite rejection. If you do that there is no further author comment and we move to July 2011. That then also rejects your own patch with what you say is an alternative implementation... Is that what you wish? That isn't what I wish, either way. I suggest you mark it Waiting on Author, so we can discuss it further. Before I consider my wishes too carefully, I've moved the patch back to Waiting on Author, for the reason that it seems wrong to force it elsewhere today as long as the author (you) would like it there. Not that I have some kind of authority over patch disposition in any case. I had put it straight to RWF because it seemed clearly not intended to be applied. No political statement or harm intended, and maybe that determination was not even correct. nm
Re: [HACKERS] pg_basebackup for streaming base backups
Fujii Masao masao.fu...@gmail.com writes: On Thu, Jan 20, 2011 at 2:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: What I'm worried about is the case where a tablespace is created under the $PGDATA directory. What would be the sense of that? If you're concerned about whether the code handles it correctly, maybe the right solution is to add code to CREATE TABLESPACE to disallow it. I'm not sure why that's the right solution. Why do you think that we should not create the tablespace under the $PGDATA directory? I'm not surprised that people mounts the filesystem on $PGDATA/mnt and creates the tablespace on it. No? Usually, having a mount point in a non-root-owned directory is considered a Bad Thing. regards, tom lane