Re: [HACKERS] REVIEW: patch: remove redundant code from pl_exec.c
2011/1/19 Stephen Frost : > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> I think we should reject this one. > > Works for me. > >> Using a switch there is a bit problematic since in some cases you want >> to use "break" to exit the loop. We could replace such breaks by gotos, >> but that would be another strike against the argument that you're making >> things more readable. I think the switch in exec_stmt_loop is only >> workable because it has no cleanup to do, so it can just "return" in >> places where a loop break would otherwise be needed. In short: if you >> want to make these all look alike, better to go the other way. > > Ah, yeah, good point. We do use gotos elsewhere for this reason, might > consider revisiting those also, if we're trying to do a 'clean-up' patch. > In any case, I'll mark this as rejected. OK. I don't think the current code is readable, but I can't do better now. Thank you for the review. Regards Pavel Stehule > > Thanks! > > Stephen > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On 20.01.2011 03:05, Kevin Grittner wrote: If we don't do something like this, do we just provide REPEATABLE READ on the standby as the strictest level of transaction isolation? If so, do we generate an error on a request for SERIALIZABLE, warn and provide degraded behavior, or just quietly give them REPEATABLE READ behavior? +1 for generating an error. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 12:16, Stephen Frost wrote: > This patch looked good, in general, to me. I added a few documentation > updates and a comment, but it's a very straight-forward patch as far as > I can tell. Passes all regressions and my additional testing. Looks good and useful to me, too. We need to adjust a bit more documentation. The patch changes all EXPLAIN ANALYZE output. When I grepped the docs for "loops=", EXPLAIN ANALYZE is also used in perform.sgml and auto-explain.sgml in addition to explain.sgml. It would be good to have documentation for the "nfiltered" parameter. The best place would be around the descriptions of "loops" in the "Using EXPLAIN" page: http://developer.postgresql.org/pgdocs/postgres/using-explain.html -- Itagaki Takahiro
Re: [HACKERS] Snapshot synchronization, again...
On Wed, Jan 19, 2011 at 12:12:39AM -0500, Joachim Wieland wrote: > Noah, thank you for this excellent review. I have taken over most > (almost all) of your suggestions (except for the documentation which > is still missing). Please check the updated & attached patch for > details. Great. I do get an assertion failure with this sequence: BEGIN; SELECT pg_export_snapshot(); ROLLBACK; SELECT 1; TRAP: FailedAssertion("!(RegisteredSnapshots > 0)", File: "snapmgr.c", Line: 430) Perhaps some cleanup is missing from a ROLLBACK path? > > On Fri, Jan 14, 2011 at 10:13 PM, Noah Misch wrote: > > "" is a valid md5 message digest. To avoid always > > accepting > > a snapshot with that digest, we would need to separately store a flag > > indicating > > whether a given syncSnapshotChksums member is currently valid. Maybe that > > hole > > is acceptable, though. > > In the end I decided to store md5 checksums as printable characters in > shmem. We now need a few more bytes for each checksum but as soon as a > byte is NUL we know that it is not a valid checksum. Just to clarify, I was envisioning something like: typedef struct { bool valid; char value[16]; } snapshotChksum; I don't object to the way you've done it, but someone else might not like the extra marshalling between text and binary. Your call. > >> + * Instead we must check to not go forward (we might have opened a > >> cursor > >> + * in this transaction and still have its snapshot registered) > >> + */ > > > > I'm thinking this case should yield an ERROR. Otherwise, our net result > > would > > be to silently adopt a snapshot that is neither our old self nor the target. > > Maybe there's a use case for that, but none comes to my mind. > > This can happen when you do: > > DECLARE foo CURSOR FOR SELECT * FROM bar; > import snapshot... > FETCH 1 FROM foo; I see now. You set snapshot->xmin unconditionally, but never move MyProc->xmin forward, only backward. Yes; that seems just right. 
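The two struct layouts discussed above can be fleshed out into a minimal self-contained sketch (the struct names come from the message; the helper function, field sizes, and comments are assumptions for illustration, not the patch's actual code):

```c
#include <stdbool.h>
#include <string.h>

/* Variant A (Noah's sketch): raw binary digest plus an explicit
 * validity flag, so even an all-zero digest cannot match by accident. */
typedef struct
{
    bool    valid;
    char    value[16];          /* raw 16-byte md5 digest */
} snapshotChksum;

/* Variant B (the patch's approach): digest stored as printable hex;
 * a leading NUL byte marks the slot as unused, so no separate flag is
 * needed, at the cost of roughly twice the bytes per checksum. */
typedef struct
{
    char    value[33];          /* 32 hex characters + NUL terminator */
} snapshotChksumText;

/* Hypothetical helper: does this slot hold a valid, matching checksum? */
static bool
chksum_text_matches(const snapshotChksumText *slot, const char *candidate)
{
    return slot->value[0] != '\0' && strcmp(slot->value, candidate) == 0;
}
```

With variant B, invalidating a slot is just `slot->value[0] = '\0'`, which is what makes the NUL-sentinel scheme attractive despite the text/binary marshalling mentioned above.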
> > I guess the use case for pg_import_snapshot from READ COMMITTED would be > > something like "DO $$BEGIN PERFORM pg_import_snapshot('...'); stuff; > > END$$;". > > First off, would that work ("stuff" use the imported snapshot)? If it does > > work, we should take the precedent of SET LOCAL and issue no warning. If it > > doesn't work, then we should document that the snapshot does take effect > > until > > the next command and probably keep this warning or something similar. > > No, this will also give you a new snapshot for every query, no matter > if it is executed within or outside of a DO-Block. You're right. Then consider "VALUES (pg_import_snapshot('...'), (SELECT count(*) from t))" at READ COMMITTED. It works roughly as I'd guess; the subquery uses the imported snapshot. If I flip the two expressions and do "VALUES ((SELECT count(*) from t), pg_import_snapshot('...'))", the subquery uses the normal snapshot. That makes sense, but I can't really see a use case for it. A warning, like your code has today, seems appropriate. > > Is it valid to scribble directly on snapshots like this? > > I figured that previously executed code still has references pointing > to the snapshots and so we cannot just push a new snapshot on top but > really need to change the memory where they are pointing to. Okay. Had no special reason to believe otherwise, just didn't know. > I am also adding a test script that shows the difference of READ > COMMITTED and SERIALIZABLE in an importing transaction in a DO-Block. > It's based on the script you sent. Thanks; that was handy. One thing I noticed is that the second "SELECT * FROM kidseen" yields zero rows instead of yielding "ERROR: relation "kidseen" does not exist". I changed things around per the attached "test-drop.pl", such that the table is already gone in the ordinary snapshot, but still visible to the imported snapshot. Note how the pg_class row is visible, but an actual attempt to query the table fails. 
Must some kind of syscache invalidation follow the snapshot alteration to make this behave normally? General tests involving only DML seem to do the right thing. Subtransaction handling looks sound. Patch runs cleanly according to Valgrind. > So thanks again and I'm looking forward to your next review... :-) Hope this one helps, too. > diff --git a/src/backend/storage/ipc/procarray.c > b/src/backend/storage/ipc/procarray.c > index 8b36df4..29c9426 100644 > + bytea * > + ExportSnapshot(Snapshot snapshot) > + { > + int bufsize = 1024; > + int bufsize_filled = 0; /* doesn't include > NUL byte */ > + int bufsize_left = bufsize - bufsize_filled; > + char *buf = (char *) palloc(bufsize); > + int i; > + TransactionId *children; > + int nchildren; > + > + /* In a subtransact
Re: [HACKERS] ToDo List Item - System Table Index Clustering
> >I seem to recall some muttering about teaching genbki to extract such comments from the SGML sources or perhaps the C header files. I tend to agree though that it would be a lot >more work than it's worth. And as you say, pg_description entries aren't free. > I know I can't do all of the work, any submission requires review etc, but it is worth it to me provided it does no harm to the codebase. So the only outstanding question is the impact of increased size. In my experience size increases related to documentation are almost always worth it. So I'm prejudiced right out of the gate. I was wondering if every pg_ table gets copied out to every database. If there is already a mechanism for not replicating all of them, we could utilize views or rewrite rules to merge a single copy of catalog comments in a separate table with each deployed database's pg_descriptions. If all catalog descriptions were handled this way it would actually decrease the size of a deployed database (by 210K?) by absorbing the pg_descriptions that are currently being duplicated. Since users shouldn't be messing with them anyway and they are purely for humans to refer to - not computers to calculate explain plans with - there shouldn't be anything inherently wrong with moving static descriptions out of user space. In theory at least. -Simone Aiken
Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints
On Wed, Jan 19, 2011 at 11:50:12PM -0500, Robert Haas wrote: > On Wed, Jan 12, 2011 at 8:56 AM, Robert Haas wrote: > > On Wed, Jan 12, 2011 at 8:14 AM, Noah Misch wrote: > >> Something like the attached? > > > > I haven't really analyzed in this detail, but yes, approximately > > something sorta like that. > > [assessment of current uses] So I think the logic is correct and not overly > complex. Sounds correct to me. > I think what might make sense is - instead of adding another Boolean > argument, change the heap_rebuilt argument to int flags, and define > REINDEX_CHECK_CONSTRAINTS and REINDEX_SUPPRESS_INDEX_USE as OR-able > flags. I think that's more clear about what's actually going on than > heap_rebuilt/tuples_changed. There are two distinct questions here: (1) Should reindex_relation receive boolean facts from its callers by way of two boolean arguments, or by way of one flags vector? The former seems best when you want every caller to definitely think about which answer is right, and the latter when that's not so necessary. (For a very long list of options, the flags might be chosen on other grounds.) As framed, I'd lean toward keeping distinct arguments, as these are important questions. However, suppose we inverted both flags, say REINDEX_SKIP_CONSTRAINT_CHECKS and REINDEX_ALLOW_OLD_INDEX_USE. Then, flags = 0 can hurt performance but not correctness. That's looking like a win. (2) Should reindex_relation frame its boolean arguments in terms of what the caller did (heap_rebuilt, tuples_changed), or in terms of what reindex_relation will be doing (check_constraints, suppress_index_use)? The former should be the default approach, but it requires that we be able to frame good names that effectively convey an abstraction. Prospective callers must know what to send just by looking at the name and reading the header comment. 
When no prospective name is that expressive and you'll end up reading the reindex_relation code to see how they play out, then it's better to go with the latter instead. A bad abstraction is worse than none at all. I agree that both heap_rebuilt and tuples_changed are bad abstractions. TRUNCATE is lying about the former, and the latter, as you say, is never really correct. column_values_changed, perhaps. tuples_could_now_violate_constraints would be correct, but that's just a bad spelling for REINDEX_CHECK_CONSTRAINTS. I guess the equivalent long-winded improvement for heap_rebuilt would be indexes_still_valid_for_snapshotnow. Eh. So yes, let's adopt callee-action-focused names like you propose. Overall, I'd vote for a flags parameter with negative senses like I noted above. My second preference would be for two boolean parameters, check_constraints and suppress_index_use. Not really a big deal to me, though. (I feel a bit silly writing this email.) What do you think? Thanks, nm
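The negative-sense flags proposed above can be sketched in a few lines (a sketch only, using the names from this thread; the decode helpers are hypothetical stand-ins for what reindex_relation would do internally, not actual PostgreSQL code):

```c
#include <stdbool.h>

/* Hypothetical OR-able flags with the "negative" senses proposed above,
 * so that passing flags = 0 can cost performance but never correctness. */
#define REINDEX_SKIP_CONSTRAINT_CHECKS  0x01
#define REINDEX_ALLOW_OLD_INDEX_USE     0x02

/* Stand-ins for the decisions reindex_relation() would make from the
 * flags: the default (no flags set) is the safest behavior. */
static bool
would_check_constraints(int flags)
{
    return (flags & REINDEX_SKIP_CONSTRAINT_CHECKS) == 0;
}

static bool
would_suppress_index_use(int flags)
{
    return (flags & REINDEX_ALLOW_OLD_INDEX_USE) == 0;
}
```

Under this scheme the three caller cases from earlier in the thread would become: an ALTER TABLE rewrite passes 0 (recheck constraints, suppress index use), VACUUM FULL/CLUSTER passes REINDEX_SKIP_CONSTRAINT_CHECKS, and TRUNCATE/REINDEX passes both flags; a forgotten flag then errs toward extra work rather than wrong answers.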
Re: [HACKERS] sepgsql contrib module
(2011/01/20 13:01), Robert Haas wrote: > 2011/1/19 KaiGai Kohei: >>> And how about adding a >>> ProcessUtility_hook to trap evil non-DML statements that some >>> nefarious user might issue? >>> >> It seems to me reasonable as long as the number of controlled commands >> is limited. For example, LOAD command may be a candidate being >> controlled without exceptions. >> However, it will be tough work if the plug-in tries to parse and >> analyze supplied utility commands by itself. > > I think the key is to either accept or reject the command based on > very simple criteria - decide based only on the command type, and > ignore its parameters. > I can understand this idea; however, it is hard to implement such criteria, because SELinux describes all rules as a relationship between a client and an object using their labels, so we cannot know which actions (typically represented by the command tag) are allowed or denied without resolving the object names. >> I uploaded my draft here. >> http://wiki.postgresql.org/wiki/SEPostgreSQL_Documentation >> >> If reasonable, I'll move them into *.sgml style. > > I have yet to review that, but will try to get to it before too much > more time goes by. > OK, I'll try to translate it into *.sgml format. >> I may want to simplify the step to installation using an installer >> script. > > OK, but let's get this nailed down as soon as possible. Tempus fugit. > I'd like to give higher priority to the ProcessUtility_hook rather than the installation script. Thanks, -- KaiGai Kohei
Re: [HACKERS] psql: Add \dL to show languages
On Wed, Jan 19, 2011 at 11:19 PM, Josh Kupershmidt wrote: > On Wed, Jan 19, 2011 at 9:09 PM, Robert Haas wrote: >> This patch doesn't seem terribly consistent to me - we show the name >> of the call handler and the name of the validator, but for the inline >> handler we just indicate whether there is one or not. That seems like >> something that we should make consistent, and my vote is to show the >> name in all cases. > > OK, changed. I've shuffled the columns slightly so that handlers and > Validator columns are next to each other. > >> Also, I'm wondering whether the System Language column be renamed to >> Internal Language, for consistency with the terminology used here: >> >> http://www.postgresql.org/docs/current/static/catalog-pg-language.html > > Fixed, updated patch attached. Though, reading the description of > lanispl on that page, I wonder if "user-defined language" correctly > describes plpgsql these days, which comes installed by default. OK, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints
On Wed, Jan 12, 2011 at 8:56 AM, Robert Haas wrote: > On Wed, Jan 12, 2011 at 8:14 AM, Noah Misch wrote: >> Something like the attached? > > I haven't really analyzed in this detail, but yes, approximately > something sorta like that. I looked this over some more tonight. The name "tuples_changed" is giving me some angst, because if we rewrote the heap... the tuples changed. I think what you intend this to indicate is whether the tuples have changed in a semantic sense, ignoring CTIDs and so on. But it's not even quite that, either, because ExecuteTruncate() is passing false, and the set of tuples has probably changed in that case. It seems that the cases here are: 1. When ALTER TABLE causes a rewrite, we set heap_rebuilt = true and tuples_changed = true. This causes constraints to be revalidated and suppresses use of indexes while the rebuild is in progress. 2. When VACUUM FULL or CLUSTER causes a rewrite, we set heap_rebuilt = true and tuples_changed = false. This avoids constraint revalidation but still suppresses use of indexes while the rebuild is in progress. 3. When TRUNCATE or REINDEX is invoked, we set heap_rebuilt = false and tuples_changed = false. Here we neither revalidate constraints nor suppress use of indexes while the rebuild is in progress. The first question I asked myself is whether the above is actually correct, and whether it's possible to simplify back to two cases. So: The REINDEX case should definitely avoid suppressing the use of the old index while the new one is rebuilt; I'm not really sure it matters what TRUNCATE does, since we're going to be operating on a non-system catalog on which we hold AccessExclusiveLock; the VACUUM FULL/CLUSTER case certainly needs to suppress uses of indexes, because it can be used on system catalogs, which may need to be used during the rebuild itself. Thus #2 and #3 must be distinct. 
#1 must be distinct from #2 both for performance reasons and to prevent deadlocks when using VACUUM FULL or CLUSTER on a system catalog (ALTER TABLE can't be so used, so it's safe for it to not worry about this problem). So I think the logic is correct and not overly complex. I think what might make sense is - instead of adding another Boolean argument, change the heap_rebuilt argument to int flags, and define REINDEX_CHECK_CONSTRAINTS and REINDEX_SUPPRESS_INDEX_USE as OR-able flags. I think that's more clear about what's actually going on than heap_rebuilt/tuples_changed. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: > If we don't do something like this, do we just provide REPEATABLE > READ on the standby as the strictest level of transaction isolation? > If so, do we generate an error on a request for SERIALIZABLE, warn > and provide degraded behavior, or just quietly give them REPEATABLE > READ behavior? > > Thoughts? Hopefully there is a better option available. We don't want to silently give wrong results. Maybe we should bring back the compatibility GUC? It could throw an error unless the user sets the compatibility GUC to turn "serializable" into "repeatable read". Regards, Jeff Davis
Re: [HACKERS] pg_basebackup for streaming base backups
On Wed, Jan 19, 2011 at 9:37 PM, Magnus Hagander wrote: >> Great. Thanks for the quick update! >> >> Here are more comments: Here are comments on the documentation. The other code looks good. It's helpful to document what to set to allow a pg_basebackup connection. That is not only the REPLICATION privilege but also max_wal_senders and pg_hba.conf. + + Options Can we list the descriptions of the options in the same order as "pg_basebackup --help" does? It's helpful to document that the target directory must be specified and it must be empty. + + The backup will include all files in the data directory and tablespaces, + including the configuration files and any additional files placed in the + directory by third parties. Only regular files and directories are allowed + in the data directory, no symbolic links or special device files. Does the latter sentence mean that a backup of a database cluster created by initdb -X is not supported? Because the symlink to the actual WAL directory is included in it. OTOH, I found the following source code comments: + * Receive a tar format stream from the connection to the server, and unpack + * the contents of it into a directory. Only files, directories and + * symlinks are supported, no other kinds of special files. This says that symlinks are supported. Which is true? Are symlinks supported only in tar format? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] psql: Add \dL to show languages
On Wed, Jan 19, 2011 at 9:09 PM, Robert Haas wrote: > This patch doesn't seem terribly consistent to me - we show the name > of the call handler and the name of the validator, but for the inline > handler we just indicate whether there is one or not. That seems like > something that we should make consistent, and my vote is to show the > name in all cases. OK, changed. I've shuffled the columns slightly so that handlers and Validator columns are next to each other. > Also, I'm wondering whether the System Language column be renamed to > Internal Language, for consistency with the terminology used here: > > http://www.postgresql.org/docs/current/static/catalog-pg-language.html Fixed, updated patch attached. Though, reading the description of lanispl on that page, I wonder if "user-defined language" correctly describes plpgsql these days, which comes installed by default. Josh diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 5f61561..30d4507 100644 *** a/doc/src/sgml/ref/psql-ref.sgml --- b/doc/src/sgml/ref/psql-ref.sgml *** testdb=> *** 1249,1254 --- 1249,1269 + + \dL[S+] [ pattern ] + + + Lists all procedural languages. If pattern + is specified, only languages whose names match the pattern are listed. + By default, only user-created languages + are shown; supply the S modifier to include system + objects. If + is appended to the command name, each + language is listed with its call handler, validator, access privileges, + and whether it is a system object. 
+ + + \dn[S+] [ pattern ] diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 962c13c..301dc11 100644 *** a/src/bin/psql/command.c --- b/src/bin/psql/command.c *** exec_command(const char *cmd, *** 416,421 --- 416,424 case 'l': success = do_lo_list(); break; + case 'L': + success = listLanguages(pattern, show_verbose, show_system); + break; case 'n': success = listSchemas(pattern, show_verbose, show_system); break; diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 205190f..e1db4c0 100644 *** a/src/bin/psql/describe.c --- b/src/bin/psql/describe.c *** listTables(const char *tabtypes, const c *** 2566,2571 --- 2566,2638 } + /* \dL + * + * Describes Languages. + */ + bool + listLanguages(const char *pattern, bool verbose, bool showSystem) + { + PQExpBufferData buf; + PGresult *res; + printQueryOpt myopt = pset.popt; + + initPQExpBuffer(&buf); + + printfPQExpBuffer(&buf, + "SELECT l.lanname AS \"%s\",\n", + gettext_noop("Name")); + if (pset.sversion >= 80300) + appendPQExpBuffer(&buf, + " pg_catalog.pg_get_userbyid(l.lanowner) as \"%s\",\n", + gettext_noop("Owner")); + + appendPQExpBuffer(&buf, + " l.lanpltrusted AS \"%s\"", + gettext_noop("Trusted")); + + if (verbose) + { + appendPQExpBuffer(&buf, + ",\n NOT l.lanispl AS \"%s\",\n" + " l.lanplcallfoid::regprocedure AS \"%s\",\n" + " l.lanvalidator::regprocedure AS \"%s\",\n ", + gettext_noop("Internal Language"), + gettext_noop("Call Handler"), + gettext_noop("Validator")); + if (pset.sversion >= 90000) + appendPQExpBuffer(&buf, "l.laninline::regprocedure AS \"%s\",\n ", + gettext_noop("Inline Handler")); + printACLColumn(&buf, "l.lanacl"); + } + + appendPQExpBuffer(&buf, + "\nFROM pg_catalog.pg_language l\n"); + + processSQLNamePattern(pset.db, &buf, pattern, false, false, + NULL, "l.lanname", NULL, NULL); + + if (!showSystem && !pattern) + appendPQExpBuffer(&buf, "WHERE lanplcallfoid != 0\n"); + + appendPQExpBuffer(&buf, "ORDER BY 1;"); + + res = PSQLexec(buf.data, false); + 
termPQExpBuffer(&buf); + if (!res) + return false; + + myopt.nullPrint = NULL; + myopt.title = _("List of languages"); + myopt.translate_header = true; + + printQuery(res, &myopt, pset.queryFout, pset.logfile); + + PQclear(res); + return true; + } + + /* * \dD * diff --git a/src/bin/psql/describe.h b/src/bin/psql/describe.h index 2029ef8..4e80bcf 100644 *** a/src/bin/psql/describe.h --- b/src/bin/psql/describe.h *** extern bool listUserMappings(const char *** 84,88 --- 84,90 /* \det */ extern bool listForeignTables(const char *pattern, bool verbose); + /* \dL */ + extern bool listLanguages(const char *pattern, bool verbose, bool showSystem); #endif /* DESCRIBE_H */ diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index 96c85a2..bd5c4b7 100644 *** a/src/bin/psql/help.c --- b/src/bin/psql/help.c *** slashUsage(unsigned short int pager) *** 211,216 --- 211,217 fprintf(output, _(" \\dg[+] [PATTERN
Re: [HACKERS] estimating # of distinct values
On Wed, Jan 19, 2011 at 6:35 PM, Florian Pflug wrote: > On Jan20, 2011, at 02:41 , Nathan Boley wrote: If you think about it, it's a bit ridiculous to look at the whole table *just* to "estimate" ndistinct - if we go that far why don't we just store the full distribution and be done with it? >>> >>> - the best you could do is to average the >>> individual probabilities which gives ... well, 1/ndistinct. >> >> That is certainly *not* the best you could do in every case. The mean >> is only the best estimate in L2, which is definitely not the case >> here. > > No, not in every case. But on average it comes out best, no? In the sense of minimizing the average plan cost over all values? Definitely not. Consider the trivial case where a bitmap scan is the cost of a sequential scan + epsilon. > >> Consider a table with 10K values, 9,990 of which are 1 and the rest of >> which are 2, 3, ..., 10, versus a table that has the same 10 distinct >> values evenly distributed. For a simple equality query, in the first >> case, a bitmap scan might be best. In the second case, a sequential >> scan would always be best. > > True. But selectivity estimates alone don't show the difference there. Yet the full distribution would - supporting my argument that even in cases where we don't know a specific value, the full distribution is more informative. > > Also, in my personal experience this isn't really the area we've got > problems now. The problem cases for me always were queries with a rather > large number of joins (7 to 10 or so) plus rather complex search > conditions. The join order, not the access strategy, then becomes the > deciding factor in plan performance. And the join order *is* driven purely > off the selectivity estimates, unlike the access strategy which even today > takes other factors into account (like clusteredness, I believe) > I think I'm losing you. Why would ndistinct be completely sufficient for estimating the optimal join order? 
>> This is precisely the point I was trying to make in my email: the loss >> function is very complicated. Improving the ndistinct estimator could >> significantly improve the estimates of ndistinct ( in the quadratic >> loss sense ) while only marginally improving the plans. > > > The point of this exercise isn't really to improve the ndistinct estimates > for single columns. Rather, we'd like to get ndistinct estimates for > groups of columns because that allows us to use the uniform bayesian > approach to multi-column selectivity estimation that Tomas found literature > on. Which on the whole seems like it *does* have a chance of greatly > improving the plans in some cases. We could, of course, estimate multi-column > ndistinct the same way we estimate single-column ndistincts, but quite a few > people raised concerns that this wouldn't work due to the large error our > current ndistinct estimates have. Sure. But estimating multi-column ndistinct is surely not the only way to approach this problem. The comment I made, which you objected to, was that it's silly to look at the whole table and then throw away all information *except* ndistinct. If we really think that looking at the whole table is the best approach, then shouldn't we be able to get better statistics? Is this really an open question? -Nathan
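The skewed-table example from this thread can be made concrete with a few lines of arithmetic (a sketch only; the helper names are hypothetical, and the 1/ndistinct rule stands in for the planner's equality-selectivity estimate when no specific value is known):

```c
/* Rows the planner would estimate for "col = const" knowing only
 * ndistinct: nrows * (1 / ndistinct). */
static double
estimated_rows(double nrows, double ndistinct)
{
    return nrows / ndistinct;
}

/* How far off the estimate is for a value with a given true row count,
 * expressed as a factor >= 1 regardless of direction. */
static double
error_factor(double estimate, double actual)
{
    return estimate > actual ? estimate / actual : actual / estimate;
}
```

For the skewed table above (10,000 rows, 10 distinct values, value 1 occupying nearly all rows), the estimate is 1,000 rows for every constant: roughly a 10x underestimate for value 1 (where a sequential scan wins) and a 1,000x overestimate for the rare values (where an index or bitmap scan wins). That gap is exactly the access-strategy difference the full distribution would expose but ndistinct alone cannot.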
Re: [HACKERS] Include WAL in base backup
Magnus, * Stephen Frost (sfr...@snowman.net) wrote: > mkay, I'm not going to try to make this ready for committer, but will > provide my comments on it overall. Bit difficult to review when someone > else is reviewing the base patch too. :/ I went ahead and marked it as 'waiting on author', since I'd like feedback on my comments, but I'll try to make time in the next few days to apply the two patches and test it out, unless I hear otherwise. Thanks, Stephen
Re: [HACKERS] Include WAL in base backup
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > For now, you need to set wal_keep_segments to make it work properly, > but if you do the idea is that the tar file/stream generated in the > base backup will include all the required WAL files. Is there some reason to not ERROR outright if we're asked to provide WAL and wal_keep_segments isn't set..? I'd rather do that than only ERROR when a particular WAL is missing.. That could lead to transient backup errors that an inexperienced sysadmin or DBA might miss or ignore. They'll notice if it doesn't work the first time they try it and it spits out a hint about wal_keep_segments. > That means that > you can start a postmaster directly against the directory extracted > from the tarball, and you no longer need to set up archive logging to > get a backup. Like that part. > I've got some refactoring I want to do around the > SendBackupDirectory() function after this, but a review of the > functionality first would be good. And obviously, documentation is > still necessary. mkay, I'm not going to try to make this ready for committer, but will provide my comments on it overall. Bit difficult to review when someone else is reviewing the base patch too. :/ Here goes: - I'm not a huge fan of the whole 'closetar' option, that feels really rather wrong to me. Why not just open it and close it in perform_base_backup(), unconditionally? - I wonder if you're not getting to a level where you should be using a struct to pass the relevant information to perform_base_backup() instead of adding more arguments on.. That's going to get unwieldy at some point. - Why not initialize logid and logseg like so?: int logid = startptr.xlogid; int logseg = startptr.xrecoff / XLogSegSize; Then use those in your elog? Seems cleaner to me. - A #define right in the middle of a while loop...? Really? - The grammar changes strike me as.. odd. 
Typically, you would have an 'option' production that you can then have a list of and then let each option be whatever the OR'd set of options is. Wouldn't the current grammar require that you put the options in a specific order? That'd blow. @@ -687,7 +690,7 @@ BaseBackup() * once since it can be relocated, and it will be checked before we do * anything anyway. */ - if (basedir != NULL && i > 0) + if (basedir != NULL && !PQgetisnull(res, i, 1)) verify_dir_is_empty_or_create(PQgetvalue(res, i, 1)); } - Should the 'i > 0' conditional above still be there..? So, that's my review from just reading the source code and the thread.. Hope it's useful, sorry it's not more. :/ Thanks, Stephen
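The logid/logseg initialization suggested in this review can be checked in isolation (a sketch with simplified stand-ins for the 9.0-era XLogRecPtr layout and the default 16MB segment size; the helper names are assumptions, not code from the patch):

```c
#include <stdint.h>

/* Simplified stand-in for the 9.0-era WAL location. */
typedef struct
{
    uint32_t xlogid;            /* log file number */
    uint32_t xrecoff;           /* byte offset within that log file */
} XLogRecPtr;

#define XLOG_SEG_SIZE (16 * 1024 * 1024)    /* default 16MB WAL segments */

/* The two initializations the review proposes, wrapped as helpers. */
static uint32_t
ptr_logid(XLogRecPtr ptr)
{
    return ptr.xlogid;
}

static uint32_t
ptr_logseg(XLogRecPtr ptr)
{
    return ptr.xrecoff / XLOG_SEG_SIZE;
}
```

With these, the segment identification falls directly out of the record pointer, so the values can be computed once up front and reused in the elog as the reviewer suggests.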
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost wrote: > This patch looked good, in general, to me. I added a few documentation > updates and a comment, but it's a very straight-forward patch as far as > I can tell. Passes all regressions and my additional testing. I have not looked at the code for this patch at all as yet, but just as a general user comment - I really, really want this. It's one of about, uh, two pieces of information that the EXPLAIN output doesn't give you that is incredibly important for troubleshooting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sepgsql contrib module
2011/1/19 KaiGai Kohei : >> And how about adding a >> ProcessUtility_hook to trap evil non-DML statements that some >> nefarious user might issue? >> > It seems to me reasonable as long as the number of controlled > commands is limited. For example, the LOAD command may be a candidate > to be controlled unconditionally. > However, it would be tough work if the plug-in tried to parse and > analyze the supplied utility commands by itself. I think the key is to either accept or reject the command based on very simple criteria - decide based only on the command type, and ignore its parameters. > I uploaded my draft here. > http://wiki.postgresql.org/wiki/SEPostgreSQL_Documentation > > If reasonable, I'll move them into *.sgml style. I have yet to review that, but will try to get to it before too much more time goes by. > I may want to simplify the step to installation using an installer > script. OK, but let's get this nailed down as soon as possible. Tempus fugit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sepgsql contrib module
(2011/01/20 12:10), Robert Haas wrote: > 2011/1/5 KaiGai Kohei: >> How about feasibility to merge this 4KL chunks during the rest of >> 45 days? I think we should decide this general direction at first. > > I read through this code tonight and it looks pretty straightforward. > I don't see much reason not to accept this more or less as-is. I'm a > bit suspicious of this line: > > { "translation",SEPG_PROCESS__TRANSITION }, > > I can't help wondering based on the rest of the table whether you > intend to have the same word on that line twice, but you don't. On a > related note, would it make sense to pare down this table to the > entries that are actually used at the moment? Sorry for the confusion. It was my typo; it should be fixed as: { "transition",SEPG_PROCESS__TRANSITION }, This permission shall be checked when the security label of a client is switched from X to Y, typically on execution of a trusted procedure. Also, I forgot to add a check in sepgsql_needs_fmgr_hook(). We don't want to allow inlining of a function when the permission is lacking. I'll fix them soon. > And how about adding a > ProcessUtility_hook to trap evil non-DML statements that some > nefarious user might issues? > It seems to me reasonable as long as the number of controlled commands is limited. For example, the LOAD command may be a candidate to be controlled unconditionally. However, it would be tough work if the plug-in tried to parse and analyze the supplied utility commands by itself. > One other problem is that you need to work on your whitespace a bit. > I believe in a few places you have a mixture of tabs and spaces. 
More > seriously, pgindent is going to completely mangle things like this: > > /* > * sepgsql_mode > * > * SEPGSQL_MODE_DISABLED: Disabled on runtime > * SEPGSQL_MODE_DEFAULT : Same as system settings > * SEPGSQL_MODE_PERMISSIVE : Always permissive mode > * SEPGSQL_MODE_INTERNAL: Same as SEPGSQL_MODE_PERMISSIVE, > *except for > no audit prints > */ > > You have to write it with a line of dashes on the first and last > lines, if you don't want it reformatted as a paragraph. It might be > worth actually running pgindent over contrib/selinux and then check > over the results. > OK, I'll try to run pgindent to confirm how it reformats things. > Finally, we need to work on the documentation. > I uploaded my draft here. http://wiki.postgresql.org/wiki/SEPostgreSQL_Documentation If reasonable, I'll move them into *.sgml style. I may want to simplify the installation steps using an installer script. > But overall, it looks pretty good, IMHO. > Thank you for reviewing such a large patch. -- KaiGai Kohei
[HACKERS] REVIEW: EXPLAIN and nfiltered
Greetings, On 2010-01-15 11:37 PM +200, Marko Tiikkaja wrote: > On 2010-11-18 5:45 PM +0200, Marko Tiikkaja wrote: > > Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan > > qual filtered from a node's input. > > Rebased against master. This patch looked good, in general, to me. I added a few documentation updates and a comment, but it's a very straight-forward patch as far as I can tell. Passes all regressions and my additional testing. commit fac899f7967ce74e14a90af9ca24e1a1f5a580e7 Author: Stephen Frost Date: Wed Jan 19 22:14:54 2011 -0500 Fix < & > in docs to be < >, as required. commit 5fcdb75a646912b8b273703caf33dadb80122e1c Author: Stephen Frost Date: Wed Jan 19 22:05:05 2011 -0500 Update documentation for EXPLAIN ANALYZE/nfiltered This patch updates some documentation around EXPLAIN ANALYZE, whose output has been changed by the patch which added nfiltered to it. Also added a comment in the only place that seemed to need one. commit 9ebb0108a217c2d3b7f815d1d902d6bdcc276104 Author: Stephen Frost Date: Wed Jan 19 21:33:28 2011 -0500 Add nfiltered in EXPLAIN ANALYZE This patch add the number of rows a plan qual filtered from a node's input to the EXPLAIN ANALYZE output. Patch by: Marko Tiikkaja Thanks, Stephen *** a/doc/src/sgml/ref/explain.sgml --- b/doc/src/sgml/ref/explain.sgml *** *** 335,348 EXPLAIN (COSTS FALSE) SELECT * FROM foo WHERE i = 4; function: ! EXPLAIN SELECT sum(i) FROM foo WHERE i < 10; ! ! QUERY PLAN ! - ! Aggregate (cost=23.93..23.93 rows=1 width=4) !-> Index Scan using fi on foo (cost=0.00..23.92 rows=6 width=4) ! Index Cond: (i < 10) ! (3 rows) --- 335,352 function: ! CREATE TABLE test (id integer primary key, bar integer, foo integer); ! ! EXPLAIN SELECT sum(foo) FROM test WHERE id < 10; ! !QUERY PLAN ! ! Aggregate (cost=28.97..28.98 rows=1 width=4) !-> Bitmap Heap Scan on test (cost=9.26..27.35 rows=647 width=4) ! Recheck Cond: (id < 10) ! 
-> Bitmap Index Scan on test_pkey (cost=0.00..9.10 rows=647 width=0) !Index Cond: (id < 10) ! (5 rows) *** *** 351,369 EXPLAIN SELECT sum(i) FROM foo WHERE i < 10; display the execution plan for a prepared query: PREPARE query(int, int) AS SELECT sum(bar) FROM test WHERE id > $1 AND id < $2 GROUP BY foo; EXPLAIN ANALYZE EXECUTE query(100, 200); !QUERY PLAN ! - ! HashAggregate (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 loops=1) !-> Index Scan using test_pkey on test (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99 loops=1) ! Index Cond: ((id > $1) AND (id < $2)) ! Total runtime: 0.851 ms ! (4 rows) --- 355,377 display the execution plan for a prepared query: + CREATE TABLE test (id integer primary key, bar integer, foo integer); + PREPARE query(int, int) AS SELECT sum(bar) FROM test WHERE id > $1 AND id < $2 GROUP BY foo; EXPLAIN ANALYZE EXECUTE query(100, 200); ! QUERY PLAN ! ! HashAggregate (cost=14.98..15.01 rows=2 width=8) (actual time=0.045..0.045 rows=0 filtered=0 loops=1) !-> Bitmap Heap Scan on test (cost=4.35..14.93 rows=10 width=8) (actual time=0.041..0.041 rows=0 filtered=0 loops=1) ! Recheck Cond: ((id > $1) AND (id < $2)) ! -> Bitmap Index Scan on test_pkey (cost=0.00..4.35 rows=10 width=0) (actual time=0.035..0.035 rows=0 filtered=0 loops=1) !Index Cond: ((id > $1) AND (id < $2)) ! Total runtime: 0.118 ms ! (6 rows) *** a/src/backend/commands/explain.c --- b/src/backend/commands/explain.c *** *** 975,992 ExplainNode(PlanState *planstate, List *ancestors, double startup_sec = 1000.0 * planstate->instrument->startup / nloops; double total_sec = 1000.0 * planstate->instrument->total / nloops; double rows = planstate->instrument->ntuples / nloops; if (es->format == EXPLAIN_FORMAT_TEXT) { appendStringInfo(es->str, ! " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", ! startup_sec, total_sec, rows, nloops); } else { ExplainPropertyFloat("Actual Startup Time", startu
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, Jan 19, 2011 at 9:44 PM, Simon Riggs wrote: > Noah's patch is trivial, as are the changes to make mine work fully. I dispute that. In particular: + /* +* Exchange table contents +* +* Swap heaps, toast tables, toast indexes +* all forks +* all indexes +* +* Checks: +* * table definitions must match +* * constraints must match +* * indexes need not match +* * outbound FKs don't need to match +* * inbound FKs will be set to not validated +* +* No Trigger behaviour +* +* How is it WAL logged? By locks and underlying catalog updates +*/ That's another way of saying "the patch is not anywhere close to being done". > Neither can be achieved barring sensible review. I think Noah posted a very nice review. > This topic delivers important functionality. I think it's more important > than simply who gets the credit. This is not about credit. I like credit as much as the next guy, but this is about the fact that there was a deadline for this CommitFest, and that deadline is now in the past, and this patch is not in a state to be reviewed. The CommitFest deadline is not a deadline by which you must post something; it's a deadline by which you must post something that is reasonably close to being committable, or at least reviewable. That's obviously not the case here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sepgsql contrib module
2011/1/5 KaiGai Kohei : > How about feasibility to merge this 4KL chunks during the rest of > 45 days? I think we should decide this general direction at first. I read through this code tonight and it looks pretty straightforward. I don't see much reason not to accept this more or less as-is. I'm a bit suspicious of this line: { "translation",SEPG_PROCESS__TRANSITION }, I can't help wondering based on the rest of the table whether you intend to have the same word on that line twice, but you don't. On a related note, would it make sense to pare down this table to the entries that are actually used at the moment? And how about adding a ProcessUtility_hook to trap evil non-DML statements that some nefarious user might issue? One other problem is that you need to work on your whitespace a bit. I believe in a few places you have a mixture of tabs and spaces. More seriously, pgindent is going to completely mangle things like this: /* * sepgsql_mode * * SEPGSQL_MODE_DISABLED: Disabled on runtime * SEPGSQL_MODE_DEFAULT : Same as system settings * SEPGSQL_MODE_PERMISSIVE : Always permissive mode * SEPGSQL_MODE_INTERNAL: Same as SEPGSQL_MODE_PERMISSIVE, *except for no audit prints */ You have to write it with a line of dashes on the first and last lines, if you don't want it reformatted as a paragraph. It might be worth actually running pgindent over contrib/selinux and then check over the results. Finally, we need to work on the documentation. But overall, it looks pretty good, IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
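For reference, the dash-protected style Robert describes looks like the following (content taken from the comment quoted above; the lines of dashes are what keep pgindent from reflowing the block as a paragraph):

```c
/*----------
 * sepgsql_mode
 *
 * SEPGSQL_MODE_DISABLED   : Disabled on runtime
 * SEPGSQL_MODE_DEFAULT    : Same as system settings
 * SEPGSQL_MODE_PERMISSIVE : Always permissive mode
 * SEPGSQL_MODE_INTERNAL   : Same as SEPGSQL_MODE_PERMISSIVE,
 *                           except for no audit prints
 *----------
 */
```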
Re: [HACKERS] SSI and Hot Standby
Simon Riggs wrote: > I gave you a quick response to let you know that HS need not be a > blocker, for this release. If you are saying you have knowingly > ignored a requirement for a whole year, then I am shocked. How > exactly did you think this would ever be committed? I was asked not to discuss this effort on list for most of that time, and while it was on the Wiki page, I just lost track of it -- not maliciously or intentionally. I really apologize. By the time the 9.0 release was out and it was deemed OK for me to discuss things, I started getting feedback on problems which needed response, and I got into the mode of reacting to that rather than ticking through my issues list. -Kevin
Re: [HACKERS] SSI and Hot Standby
* Simon Riggs (si...@2ndquadrant.com) wrote: > I gave you a quick response to let you know that HS need not be a > blocker, for this release. If you are saying you have knowingly ignored > a requirement for a whole year, then I am shocked. How exactly did you > think this would ever be committed? Erm, to be perfectly honest, I think the answer is probably "I was busy.", and "no one provided any feedback on *how* to deal with it." Given the amount of work that Kevin's put into this patch (which has been beyond impressive, imv), I have a hard time finding fault with him not getting time to implement a solution for Hot Standby for this. As you say, it's not a blocker, I agree completely with that, regardless of when it was identified as an issue. What we're talking about is right now, and right now is too late to fix it for HS, and to be perfectly frank, fixing it for HS isn't required or even a terribly important factor in if it should be committed. I'll refrain from casting stones about issues brought up nearly a year ago on certain other patches which are apparently not going to include what I, at least, consider extremely important to PG acceptance by others. Thanks, Stephen
Re: [HACKERS] SSI and Hot Standby
Robert Haas wrote: > Kevin Grittner wrote: >> I agree it's pretty late in the cycle, but I'm going through all >> the loose ends and found this one -- which has been hanging out on >> the Wiki page as an R&D item for over a full year without >> discussion. :-( If we provide the snapshots (which we can safely >> and easily do), can someone else who knows what they're doing with >> WAL and HS get the rest of it safely into the release? That seems >> to me to be the only way it can still happen for 9.1. > > I think it's way too late to be embarking on what will probably > turn out to be a reasonably complex and possibly controversial new > development arc. I don't have a strong position on what we should > do instead, but let's NOT do that. If that can't reasonably be done for 9.1, well, my next sentence was: >> If not, I agree this can be 9.2 material. It'd be sweet if it could still happen in 9.1, but hardly a shock if it can't. I didn't want to presume to make the call. Like I said at the start, the alternative is to decide how noisy we want to be about providing snapshot isolation on hot standbys when SERIALIZABLE is requested, and figuring out where to document it. -Kevin
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:34 -0600, Kevin Grittner wrote: > I agree it's pretty late in the cycle, but I'm going through all the > loose ends and found this one -- which has been hanging out on the > Wiki page as an R&D item for over a full year without discussion. > :-( If we provide the snapshots (which we can safely and easily > do), can someone else who knows what they're doing with WAL and HS > get the rest of it safely into the release? That seems to me to be > the only way it can still happen for 9.1. I gave you a quick response to let you know that HS need not be a blocker, for this release. If you are saying you have knowingly ignored a requirement for a whole year, then I am shocked. How exactly did you think this would ever be committed? -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] REVIEW: "writable CTEs" - doc patch
I think that a major goal of the DocBook format is that it separates content from presentation, so whatever tool is used to render that content as HTML for .org isn't necessarily publicly available. -- Regards, Peter Geoghegan
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2011-01-19 at 21:01 -0500, Robert Haas wrote: > On Wed, Jan 19, 2011 at 8:57 PM, Noah Misch wrote: > > I think Simon was referring to the proof-of-concept sketch I had included > > with > > my review. > > I think it's a bit late to be turning proofs-of-concept into code at > this point, no matter who came up with them. Noah's patch is trivial, as are the changes to make mine work fully. Neither can be achieved barring sensible review. This topic delivers important functionality. I think it's more important than simply who gets the credit. I'm not sure yet where to go, but we still have viable options for this release. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] estimating # of distinct values
On Wed, Jan 19, 2011 at 9:35 PM, Florian Pflug wrote: > Also, in my personal experience this isn't really the area we've got > problems now. The problem cases for me always were queries with a rather > large number of joins (7 to 10 or so) plus rather complex search > conditions. The join order, not the access strategy, then becomes the > deciding factor in plan performance. This certainly matches my experience (except sometimes with more joins). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] estimating # of distinct values
On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra wrote: >>> Regarding the crash scenario - if the commit fails, just throw away the >>> local estimator copy, it's not needed. I'm not sure how to take care of >>> the case when commit succeeds and the write of the merged estimator >>> fails, but I think it might be possible to write the estimator to xlog >>> or something like that. So it would be replayed during recovery etc. Or >>> is it a stupid idea? >> >> It's not stupid, in the sense that that is what you'd need to do if >> you want to avoid ever having to rescan the table. But it is another >> thing that I think is going to be way too expensive. > > Way too expensive? All you need to put into the logfile is a copy of the > estimator, which is a few kBs. How is that 'way too expensive'? At this point, this is all a matter of religion, right? Neither of us has a working implementation we've benchmarked. But yes, I believe you're going to find that implementing some kind of streaming estimator is going to impose a... 6% performance penalty, even after you've optimized the living daylights out of it. Now you might say... big deal, it improves my problem queries by 100x. OK, but if you could get the same benefit by doing an occasional full table scan during off hours, you could have the same performance with a *0%* performance penalty. Even better, the code changes would be confined to ANALYZE rather than spread out all over the system, which has positive implications for robustness and likelihood of commit. I'm not trying to argue you out of working on this. It's obviously your time to spend, and if works better than I think it will, great! I'm merely offering you an opinion on what will probably happen if you go this route - namely, it'll carry an unpalatable run-time penalty. That opinion may be worth no more than what you paid for it, but there you have it. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] estimating # of distinct values
On Jan20, 2011, at 02:41 , Nathan Boley wrote: >>> If you think about it, it's a bit ridiculous to look at the whole table >>> *just* to "estimate" ndistinct - if we go that far why dont we just >>> store the full distribution and be done with it? >> >> - the best you could do is to average the >> individual probabilities which gives ... well, 1/ndistinct. > > That is certainly *not* the best you could do in every case. The mean > is only the best estimate in L2, which is definitely not the case > here. No, not in every case. But on average it comes out best, no? > Consider a table with 10K values, 9,990 of which are 1 and the rest of > which are 2, 3, ..., 10, versus a table that has the same 10 distinct > values evenly distributed. For a simple equality query, in the first > case, a bitmap scan might be best. In the second case, a sequential > scan would always be best. True. But selectivity estimates alone don't show the difference there. Also, in my personal experience this isn't really the area we've got problems now. The problem cases for me always were queries with a rather large number of joins (7 to 10 or so) plus rather complex search conditions. The join order, not the access strategy, then becomes the deciding factor in plan performance. And the join order *is* driven purely off the selectivity estimates, unlike the access strategy which even today takes other factors into account (like clusteredness, I believe) > This is precisely the point I was trying to make in my email: the loss > function is very complicated. Improving the ndistinct estimator could > significantly improve the estimates of ndistinct ( in the quadratic > loss sense ) while only marginally improving the plans. The point of this exercise isn't really to improve the ndistinct estimates for single columns. 
Rather, we'd like to get ndistinct estimates for groups of columns because that allows us to use the uniform bayesian approach to multi-column selectivity estimation that Tomas found literature on. Which on the whole seems like it *does* have a chance of greatly improving the plans in some cases. We could, of course, estimate multi-column ndistinct the same way we estimate single-column ndistincts, but quite a few people raised concerns that this wouldn't work due to the large error our current ndistinct estimates have. best regards, Florian Pflug
Re: [HACKERS] REVIEW: "writable CTEs" - doc patch
On Wed, Jan 19, 2011 at 9:13 PM, Stephen Frost wrote: > Greetings, > > * Peter Geoghegan (peter.geoghega...@gmail.com) wrote: >> I do this all the time. Anyway, I intend for this doc patch to be >> backported to 8.4 as a bugfix, which is part of the reason why it >> isn't invasive - it's just a clarification. Clearly if it makes sense >> for 9.1, it makes just as much sense for 9.0 and 8.4. > > I agree with the patch, in general, as well as the recommendation to > back-port it. I reviewed it and didn't find any issues (though I > couldn't figure out the right magic things to install to actually build > the docs.. :( ). The only minor change I made was to capitalize Common > Table Expressions (having it as an acronym w/o capitalizing the full > name seemed odd to me..). > > Updated patch attached. Marking as ready for committer. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pl/python refactoring
On 20/01/11 01:26, Jan Urbański wrote: > On 19/01/11 10:57, Jan Urbański wrote: >> On 18/01/11 23:22, Peter Eisentraut wrote: >>> #2: It looks like this loses some information/formatting in the error >>> message. Should we keep the pointing arrow there? > >>> CONTEXT: PL/Python function "sql_syntax_error" >>> -ERROR: syntax error at or near "syntax" >>> -LINE 1: syntax error >>> -^ >>> -QUERY: syntax error >>> +ERROR: PL/Python: plpy.SPIError: syntax error at or near "syntax" >>> CONTEXT: PL/Python function "sql_syntax_error" > >> Yes, the message is less informative, because the error is reported by >> PL/Python, by wrapping the SPI message. I guess I could try to extract >> more info from the caught ErrorData and put it in the new error that >> PL/Python throws. > > All right, I found a way to shoehorn the extra information into > SPIException, I'll post a new patch series with what's left of the > general refactoring patch soon. Here's an updated patch series for PL/Python refactoring. It was 16 patches at first, 8 are committed, 1 got dropped, so we're down to 7. Cheers, Jan >From af0a83a4ff0356fe5b172b6c60692953d7e01bf0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jan=20Urba=C5=84ski?= Date: Sun, 2 Jan 2011 11:42:54 +0100 Subject: [PATCH 7/7] Do not prefix error messages with the string "PL/Python: ". It is redundant, given the error context. Also, call pg_verifymbstr outside of the try/catch block, because it can lead to errorneously reporting a Python error when it is in fact a Postgres error. 
--- src/pl/plpython/expected/plpython_do.out |2 +- src/pl/plpython/expected/plpython_error.out | 20 ++-- src/pl/plpython/expected/plpython_test.out|2 +- src/pl/plpython/expected/plpython_types.out |2 +- src/pl/plpython/expected/plpython_types_3.out |2 +- src/pl/plpython/plpython.c| 13 ++--- 6 files changed, 20 insertions(+), 21 deletions(-) diff --git a/src/pl/plpython/expected/plpython_do.out b/src/pl/plpython/expected/plpython_do.out index 9de261a..a21b088 100644 --- a/src/pl/plpython/expected/plpython_do.out +++ b/src/pl/plpython/expected/plpython_do.out @@ -2,5 +2,5 @@ DO $$ plpy.notice("This is plpythonu.") $$ LANGUAGE plpythonu; NOTICE: This is plpythonu. CONTEXT: PL/Python anonymous code block DO $$ nonsense $$ LANGUAGE plpythonu; -ERROR: PL/Python: NameError: global name 'nonsense' is not defined +ERROR: NameError: global name 'nonsense' is not defined CONTEXT: PL/Python anonymous code block diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out index 87225f2..ea4a54c 100644 --- a/src/pl/plpython/expected/plpython_error.out +++ b/src/pl/plpython/expected/plpython_error.out @@ -8,9 +8,9 @@ CREATE FUNCTION sql_syntax_error() RETURNS text 'plpy.execute("syntax error")' LANGUAGE plpythonu; SELECT sql_syntax_error(); -WARNING: PL/Python: plpy.SPIError: unrecognized error in PLy_spi_execute_query +WARNING: plpy.SPIError: unrecognized error in PLy_spi_execute_query CONTEXT: PL/Python function "sql_syntax_error" -ERROR: PL/Python: plpy.SPIError: syntax error at or near "syntax" +ERROR: plpy.SPIError: syntax error at or near "syntax" LINE 1: syntax error ^ QUERY: syntax error @@ -22,7 +22,7 @@ CREATE FUNCTION exception_index_invalid(text) RETURNS text 'return args[1]' LANGUAGE plpythonu; SELECT exception_index_invalid('test'); -ERROR: PL/Python: IndexError: list index out of range +ERROR: IndexError: list index out of range CONTEXT: PL/Python function "exception_index_invalid" /* check handling of nested exceptions */ 
@@ -32,9 +32,9 @@ CREATE FUNCTION exception_index_invalid_nested() RETURNS text return rv[0]' LANGUAGE plpythonu; SELECT exception_index_invalid_nested(); -WARNING: PL/Python: plpy.SPIError: unrecognized error in PLy_spi_execute_query +WARNING: plpy.SPIError: unrecognized error in PLy_spi_execute_query CONTEXT: PL/Python function "exception_index_invalid_nested" -ERROR: PL/Python: plpy.SPIError: function test5(unknown) does not exist +ERROR: plpy.SPIError: function test5(unknown) does not exist LINE 1: SELECT test5('foo') ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. @@ -54,9 +54,9 @@ return None ' LANGUAGE plpythonu; SELECT invalid_type_uncaught('rick'); -WARNING: PL/Python: plpy.SPIError: unrecognized error in PLy_spi_prepare +WARNING: plpy.SPIError: unrecognized error in PLy_spi_prepare CONTEXT: PL/Python function "invalid_type_uncaught" -ERROR: PL/Python: plpy.SPIError: type "test" does not exist +ERROR: plpy.SPIError: type "test" does not exist CONTEXT: PL/Python function "invalid_type_uncaught" /* for what it's worth catch the exception generated by * the typo, and return None @@ -77,7 +77,7 @@ return None ' LANGUAGE plpythonu; SELECT invalid_type_caught('rick'); -WARNING: PL/Python: plp
[HACKERS] REVIEW: "writable CTEs" - doc patch
Greetings, * Peter Geoghegan (peter.geoghega...@gmail.com) wrote: > I do this all the time. Anyway, I intend for this doc patch to be > backported to 8.4 as a bugfix, which is part of the reason why it > isn't invasive - it's just a clarification. Clearly if it makes sense > for 9.1, it makes just as much sense for 9.0 and 8.4. I agree with the patch, in general, as well as the recommendation to back-port it. I reviewed it and didn't find any issues (though I couldn't figure out the right magic things to install to actually build the docs.. :( ). The only minor change I made was to capitalize Common Table Expressions (having it as an acronym w/o capitalizing the full name seemed odd to me..). Updated patch attached. Marking as ready for committer. commit 91e9e9285752c9fbe0c222708a10a301731594c8 Author: Stephen Frost Date: Wed Jan 19 20:56:44 2011 -0500 Update WITH documentation to capitalize acronym Common Table Expressions, being a 'proper' name and having an acronym associated with them, really should be capitalized. This patch makes that change in the WITH documentation. commit 9e4565cc97b81fd6b3f96d8e346bcb7ee6e8878e Author: Stephen Frost Date: Wed Jan 19 20:54:47 2011 -0500 Add CTE as an acryonym and clarify WITH docs This patch adds CTE to the list of acronyms and then updates the WITH documentation to note that WITH queries are also known as CTEs. Patch by Peter Geoghegan Thanks, Stephen *** a/doc/src/sgml/acronyms.sgml --- b/doc/src/sgml/acronyms.sgml *** *** 99,104 --- 99,113 + CTE + + + Common Table Expression + + + + + CVE *** a/doc/src/sgml/queries.sgml --- b/doc/src/sgml/queries.sgml *** *** 1525,1531 SELECT select_list FROM table_expression ! WITH Queries WITH --- 1525,1531 ! WITH Queries (Common Table Expressions) WITH *** *** 1539,1545 SELECT select_list FROM table_expression WITH provides a way to write subqueries for use in a larger !query. The subqueries can be thought of as defining temporary tables that exist just for this query. 
One use of this feature is to break down complicated queries into simpler parts. An example is: --- 1539,1546 WITH provides a way to write subqueries for use in a larger !query. The subqueries, which are often referred to as Common Table !Expressions or CTEs, can be thought of as defining temporary tables that exist just for this query. One use of this feature is to break down complicated queries into simpler parts. An example is: 
Re: [HACKERS] Use of O_DIRECT only for open_* sync options
On Wed, Jan 19, 2011 at 1:53 PM, Bruce Momjian wrote: > Is there a reason we only use O_DIRECT with open_* sync options? > xlogdefs.h says: > > /* > * Because O_DIRECT bypasses the kernel buffers, and because we never > * read those buffers except during crash recovery, it is a win to use > * it in all cases where we sync on each write(). We could allow O_DIRECT > * with fsync(), but because skipping the kernel buffer forces writes out > * quickly, it seems best just to use it for O_SYNC. It is hard to imagine > * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT. > * Also, O_DIRECT is never enough to force data to the drives, it merely > * tries to bypass the kernel cache, so we still need O_SYNC or fsync(). > */ > > This seems wrong because fsync() can win if there are two writes before > the sync call. Well, the comment does say "...in all cases where we sync on each write()". But that's certainly not true of WAL, so I dunno. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] psql: Add \dL to show languages
On Wed, Jan 19, 2011 at 5:47 PM, Andreas Karlsson wrote: > The patch looks alright now so I will mark it as ready for committer > now. This patch doesn't seem terribly consistent to me - we show the name of the call handler and the name of the validator, but for the inline handler we just indicate whether there is one or not. That seems like something that we should make consistent, and my vote is to show the name in all cases. Also, I'm wondering whether the System Language column be renamed to Internal Language, for consistency with the terminology used here: http://www.postgresql.org/docs/current/static/catalog-pg-language.html -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 10:53 AM, Tom Lane wrote: >> I'm not sure why that's the right solution. Why do you think that we should >> not create the tablespace under the $PGDATA directory? I'm not surprised >> that people mounts the filesystem on $PGDATA/mnt and creates the >> tablespace on it. > > No? Usually, having a mount point in a non-root-owned directory is > considered a Bad Thing. Hmm.. but ISTM we can have a root-owned mount point in $PGDATA and create a tablespace there. $ su - # mkdir $PGDATA/mnt # mount -t tmpfs tmpfs $PGDATA/mnt # exit $ mkdir $PGDATA/mnt/tblspcdir $ psql =# CREATE TABLESPACE tblspc LOCATION '$PGDATA/mnt/tblspcdir'; CREATE TABLESPACE Am I missing something? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
> > Not true. You're missing the goal of this effort - to get ndistinct > estimate for combination of multiple columns. Which is usually > pathological in case of dependent columns. > Again, don't think about a single column (although even in that case > there are known fail cases). Think about multiple columns and the number > of distinct combinations. With attribute value independence assumption, > this is usually significantly underestimated. And that's a problem as it > often leads to an index scan instead of sequential scan (or something > like that). My point was that, in the case of single columns, we've done pretty well despite the poor ndistinct estimates. I suspect the same will be true in the case of multiple columns; good heuristics will trump minimax estimators. >> ( as I've advocated for before ) this means parameterizing the >> distribution of values ( ie, find that the values are roughly uniform >> ), maybe this means estimating the error of our statistics and using >> the most robust rather than the best plan, or maybe it means something >> totally different. But: ( and the literature seems to support me ) > > Which literature? I've read an awful lot of papers on this topic lately, > and I don't remember anything like that. If there's an interesting > paper, I'd like to read it. Yes, all the papers state estimating a > ndistinct is a particularly tricky task, but that's somehow expected? It is definitely expected - non-parametric minimax results are always *very* weak. The papers that you've cited seem to confirm this; bounding ndistinct estimation error is tricky in the general case ( even with very simple loss functions, which we do not have ). The same is true for other non-parametric methods. Think about kernel density estimation: how many data points do you need to estimate the density of a normal distribution well? What about if you *know* that the data is normal ( or even that it comes from a big family? ). 
> No, I'm not trying to do this just to improve the ndistinct estimate. > The goal is to improve the cardinality estimate in case of dependent > columns, and one of the papers depends on good ndistinct estimates. > > And actually it does not depend on ndistinct for the columns only, it > depends on ndistinct estimates for the combination of columns. So > improving the ndistinct estimates for columns is just a necessary first > step (and only if it works reasonably well, we can do the next step). I think that any approach which depends on precise estimates of ndistinct is not practical. I am very happy that you've spent so much time on this, and I'm sorry if my previous email came off as combative. My point was only that simple heuristics have served us well in the past and, before we go to the effort of new, complicated schemes, we should see how well similar heuristics work in the multiple column case. I am worried that if the initial plan is too complicated then nothing will happen and, even if something does happen, it will be tough to get it committed ( check the archives for cross column stat threads - there are a lot ). Best, Nathan Boley -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Kevin's suggestion seems eminently reasonable to me and probably the best approach one can do for SSI and hot standby. Pulling it off in time for 9.1 would be a stretch; 9.2 seems quite doable. It's worth noting that one way or another, the semantics of SERIALIZABLE transactions on hot standby replicas could be surprising to some. There's no getting around this; serializability in distributed systems is just a hard problem in general. Either we go with Kevin's suggestion of treating SERIALIZABLE transactions as DEFERRABLE (whether now or for 9.2), causing them to have to use an older snapshot or block until an acceptable snapshot becomes available; or we require them to be downgraded to REPEATABLE READ either implicitly or explicitly. Now, neither of these is as alarming as they might sound, given that replication lag is a fact of life for hot standby systems and REPEATABLE READ is exactly the same as the current (9.0) SERIALIZABLE behavior. But it's definitely something that should be addressed in documentation. Dan -- Dan R. K. Ports MIT CSAILhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, Jan 19, 2011 at 8:57 PM, Noah Misch wrote: > I think Simon was referring to the proof-of-concept sketch I had included with > my review. I think it's a bit late to be turning proofs-of-concept into code at this point, no matter who came up with them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] SSI and Hot Standby
On Wed, Jan 19, 2011 at 8:34 PM, Kevin Grittner wrote: > I agree it's pretty late in the cycle, but I'm going through all the > loose ends and found this one -- which has been hanging out on the > Wiki page as an R&D item for over a full year without discussion. > :-( If we provide the snapshots (which we can safely and easily > do), can someone else who knows what they're doing with WAL and HS > get the rest of it safely into the release? That seems to me to be > the only way it can still happen for 9.1. I think it's way too late to be embarking on what will probably turn out to be a reasonably complex and possibly controversial new development arc. I don't have a strong position on what we should do instead, but let's NOT do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: WIP: plpgsql - foreach in
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 19, 2011 at 6:04 PM, Stephen Frost wrote: > > I'm going to mark this returned to author with feedback. > > That implies you don't think it should be considered further for this > CommitFest. Perhaps you mean Waiting on Author? I did, actually, and that's what I marked it as in the CF. Sorry for any confusion. When I went to mark it in the CF, I realized my mistake. Thanks, Stephen
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, Jan 19, 2011 at 08:55:22PM -0500, Robert Haas wrote: > On Wed, Jan 19, 2011 at 7:57 PM, Simon Riggs wrote: > > On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: > > > >> I'll go ahead and mark the patch Returned with Feedback. > > > > My understanding of the meaning of that is polite rejection. If you do > > that there is no further author comment and we move to July 2011. That > > then also rejects your own patch with what you say is an alternative > > implementation... > > > > Is that what you wish? That isn't what I wish, either way. I suggest you > > mark it Waiting on Author, so we can discuss it further. > > Simon, > > I have no idea what you're talking about here. It is entirely fitting > and appropriate to reject a patch the guts of which have not been > written four days into the final CommitFest. Doing so does not > somehow reject Noah's patches, which stand or fall on their own > merits. I think Simon was referring to the proof-of-concept sketch I had included with my review. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: WIP: plpgsql - foreach in
On Wed, Jan 19, 2011 at 6:04 PM, Stephen Frost wrote: > I'm going to mark this returned to author with feedback. That implies you don't think it should be considered further for this CommitFest. Perhaps you mean Waiting on Author? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, Jan 19, 2011 at 7:57 PM, Simon Riggs wrote: > On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: > >> I'll go ahead and mark the patch Returned with Feedback. > > My understanding of the meaning of that is polite rejection. If you do > that there is no further author comment and we move to July 2011. That > then also rejects your own patch with what you say is an alternative > implementation... > > Is that what you wish? That isn't what I wish, either way. I suggest you > mark it Waiting on Author, so we can discuss it further. Simon, I have no idea what you're talking about here. It is entirely fitting and appropriate to reject a patch the guts of which have not been written four days into the final CommitFest. Doing so does not somehow reject Noah's patches, which stand or fall on their own merits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Fujii Masao writes: > On Thu, Jan 20, 2011 at 2:21 AM, Tom Lane wrote: >> Fujii Masao writes: >>> What I'm worried about is the case where a tablespace is created >>> under the $PGDATA directory. >> What would be the sense of that? If you're concerned about whether the >> code handles it correctly, maybe the right solution is to add code to >> CREATE TABLESPACE to disallow it. > I'm not sure why that's the right solution. Why do you think that we should > not create the tablespace under the $PGDATA directory? I'm not surprised > that people mounts the filesystem on $PGDATA/mnt and creates the > tablespace on it. No? Usually, having a mount point in a non-root-owned directory is considered a Bad Thing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 12:57:23AM +, Simon Riggs wrote: > On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: > > > I'll go ahead and mark the patch Returned with Feedback. > > My understanding of the meaning of that is polite rejection. If you do > that there is no further author comment and we move to July 2011. That > then also rejects your own patch with what you say is an alternative > implementation... > > Is that what you wish? That isn't what I wish, either way. I suggest you > mark it Waiting on Author, so we can discuss it further. Before I consider my wishes too carefully, I've moved the patch back to Waiting on Author, for the reason that it seems wrong to force it elsewhere today as long as the author (you) would like it there. Not that I have some kind of authority over patch disposition in any case. I had put it straight to RWF because it seemed clearly not intended to be applied. No political statement or harm intended, and maybe that determination was not even correct. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 2:21 AM, Tom Lane wrote: > Fujii Masao writes: >> What I'm worried about is the case where a tablespace is created >> under the $PGDATA directory. > > What would be the sense of that? If you're concerned about whether the > code handles it correctly, maybe the right solution is to add code to > CREATE TABLESPACE to disallow it. I'm not sure why that's the right solution. Why do you think that we should not create the tablespace under the $PGDATA directory? I'm not surprised that people mount the filesystem on $PGDATA/mnt and create the tablespace on it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] estimating # of distinct values
>> If you think about it, it's a bit ridiculous to look at the whole table >> *just* to "estimate" ndistinct - if we go that far why dont we just >> store the full distribution and be done with it? > > - the best you could do is to average the > individual probabilities which gives ... well, 1/ndistinct. > That is certainly *not* the best you could do in every case. The mean is only the best estimate in L2, which is definitely not the case here. Consider a table with 10K values, 9,990 of which are 1 and the rest of which are 2, 3, ..., 10, versus a table that has the same 10 distinct values evenly distributed. For a simple equality query, in the first case, a bitmap scan might be best. In the second case, a sequential scan would always be best. This is precisely the point I was trying to make in my email: the loss function is very complicated. Improving the ndistinct estimator could significantly improve the estimates of ndistinct ( in the quadratic loss sense ) while only marginally improving the plans. -Nathan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Simon Riggs wrote: > In this release? Maybe? In later releases? Yes. > > If it blocks your excellent contribution in this release, then > from me, "no". If you can achieve this in this release, yes. > However, if this is difficult or complex, then I would rather say > "not yet" quickly now, than spend months working out the > weirdnesses and possibly still get them wrong. We already have a mechanism for generating a good snapshot, the hard part (for me at least) would be to get that snapshot over to the hot standby and have it use the latest one on a request for a serializable transaction. I have no experience with WAL file output, and don't know what it would take for hot standby to use it as I describe. I agree it's pretty late in the cycle, but I'm going through all the loose ends and found this one -- which has been hanging out on the Wiki page as an R&D item for over a full year without discussion. :-( If we provide the snapshots (which we can safely and easily do), can someone else who knows what they're doing with WAL and HS get the rest of it safely into the release? That seems to me to be the only way it can still happen for 9.1. If not, I agree this can be 9.2 material. We just have to decide how to document it and answer the questions near the bottom of my initial post of the thread. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Wed, Jan 19, 2011 at 9:37 PM, Magnus Hagander wrote: >> The "fast or slow" seems to lead users to always choose "fast". Instead, >> what about "fast or smooth", "fast or spread" or "immediate or delayed"? > > Hmm. "fast or spread" seems reasonable to me. And I want to use "fast" > for the fast version, because that's what we call it when you use > pg_start_backup(). I'll go change it to spread for now - it's the one > I can find used in the docs. Fair enough. >> What if pg_basebackup receives a signal while doing a backup? >> For example, users might do Ctrl-C to cancel the long-running backup. >> We should define a signal handler and send a Terminate message >> to the server to cancel the backup? > > Nah, we'll just disconnect and it'll deal with things that way. Just > like we do with e.g. pg_dump. I don't see the need to complicate it > with that. Okay. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: > Here's an issue for feedback from the community -- do we want to > support truly serializable transactions on hot standby machines? In this release? Maybe? In later releases? Yes. If it blocks your excellent contribution in this release, then from me, "no". If you can achieve this in this release, yes. However, if this is difficult or complex, then I would rather say "not yet" quickly now, than spend months working out the weirdnesses and possibly still get them wrong. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SSI and Hot Standby
Here's an issue for feedback from the community -- do we want to support truly serializable transactions on hot standby machines? The best way Dan and I have been able to think to do this is to build on the SERIALIZABLE READ ONLY DEFERRABLE behavior. We are able to obtain a snapshot and then check to see if it is at a place in the transaction processing where it would be guaranteed to be serializable without participating in predicate locking, rw-conflict detection, etc. If it's not, we block until a READ WRITE transaction completes, and then check again. Repeat. We may reach a point where we determine that the snapshot can't work, and we get a new one and start over. Due to the somewhat complex rules for this, you are likely to see a safe snapshot fairly quickly even in a mix which always has short-lived READ WRITE transactions running, although a single long-running READ WRITE transaction can block things until it completes. The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. Of course, we might want some limit on how often they are sent, to avoid WAL bloat. A hot standby could just keep the most recently received of these and use it when a SERIALIZABLE transaction is requested. Perhaps DEFERRABLE in this context could mean that it waits for the *next* one and uses it, to assure "freshness". Actually, we could try to get tricky to avoid sending a complete snapshot by having two WAL messages with no payload -- one would mean "the snapshot you would get now is being tested for serializability". If it failed to reach that state we would send another when we started working on a new snapshot. The other type of message would mean "the snapshot you built when we last told you we were starting to test one is good." I *think* that can work, and it may require less WAL space. 
If we don't do something like this, do we just provide REPEATABLE READ on the standby as the strictest level of transaction isolation? If so, do we generate an error on a request for SERIALIZABLE, warn and provide degraded behavior, or just quietly give them REPEATABLE READ behavior? Thoughts? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: > I'll go ahead and mark the patch Returned with Feedback. My understanding of the meaning of that is polite rejection. If you do that there is no further author comment and we move to July 2011. That then also rejects your own patch with what you say is an alternative implementation... Is that what you wish? That isn't what I wish, either way. I suggest you mark it Waiting on Author, so we can discuss it further. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python refactoring
On 19/01/11 10:57, Jan Urbański wrote: > On 18/01/11 23:22, Peter Eisentraut wrote: >> #2: It looks like this loses some information/formatting in the error >> message. Should we keep the pointing arrow there? >> CONTEXT: PL/Python function "sql_syntax_error" >> -ERROR: syntax error at or near "syntax" >> -LINE 1: syntax error >> -^ >> -QUERY: syntax error >> +ERROR: PL/Python: plpy.SPIError: syntax error at or near "syntax" >> CONTEXT: PL/Python function "sql_syntax_error" > Yes, the message is less informative, because the error is reported by > PL/Python, by wrapping the SPI message. I guess I could try to extract > more info from the caught ErrorData and put it in the new error that > PL/Python throws. All right, I found a way to shoehorn the extra information into SPIException, I'll post a new patch series with what's left of the general refactoring patch soon. Shortly after, I'll post updated patches for doing SPI in subxacts, explicit subxact starting and generated SPI exceptions, as they will surely be broken by these changes. Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Jan19, 2011, at 23:44 , Nathan Boley wrote: > If you think about it, it's a bit ridiculous to look at the whole table > *just* to "estimate" ndistinct - if we go that far why dont we just > store the full distribution and be done with it? The crucial point that you're missing here is that ndistinct provides an estimate even if you *don't* have a specific value to search for at hand. This is way more common than you may think, it e.g. happens every you time PREPARE are statement with parameters. Even knowing the full distribution has no advantage in this case - the best you could do is to average the individual probabilities which gives ... well, 1/ndistinct. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Dne 19.1.2011 23:44, Nathan Boley napsal(a): > 1) The distribution of values in a table is rarely pathological, and > usually follows one of several common patterns. ( IOW, we have good > heuristics ) Not true. You're missing the goal of this effort - to get ndistinct estimate for combination of multiple columns. Which is usually pathological in case of dependent columns. Which is exactly the case when the user will explicitly enable this feature to get better estimates (and thus better plans). > 2) The choice of plan is fairly robust to mis-estimation of ndistinct, > because there are only a few things the executer can choose. It > doesn't usually matter if a value composes 5% or 20% ( or 99% ) of > the table, we still usually need to scan the entire table. Again, don't think about a single column (although even in that case there are known fail cases). Think about multiple columns and the number of distinct combinations. With attribute value independence assumption, this is usually significantly underestimated. And that's a problem as it often leads to an index scan instead of sequential scan (or something like that). > Again, in my *very* humble opinion, If the ultimate goal is to improve > the planner, we should try to cut down on the number of cases in which > a poor estimate of ndistinct gives a really bad plan instead of > chasing after marginal gains from a better ndistinct estimator. Maybe What is a marginal gain? The ultimate goal is to build cross-column stats, which in case of dependent columns usually results is poor plans. How is fixing that a marginal gain? Yes, it may turn out it's not worth it, but it's a bit early to say that without an implementation and some hard data. 
> ( as I've advocated for before ) this means parameterizing the > distribution of values ( ie, find that the values are roughly uniform > ), maybe this means estimating the error of our statistics and using > the most robust rather than the best plan, or maybe it means something > totally different. But: ( and the literature seems to support me ) Which literature? I've read an awful lot of papers on this topic lately, and I don't remember anything like that. If there's an interesting paper, I'd like to read it. Yes, all the papers state estimating a ndistinct is a particularly tricky task, but that's somehow expected? I've even checked how other databases do this - e.g. Oracle does it very similarly to what I proposed (the user has to enable the feature, it has to be rebuilt from time to time, etc.). I'm not saying we should do everything like Oracle, but they are not clueless idiots. > pounding away at the ndistinct estimator seems like a dead end. If you > think about it, it's a bit ridiculous to look at the whole table > *just* to "estimate" ndistinct - if we go that far why dont we just > store the full distribution and be done with it? No, I'm not trying to do this just to improve the ndistinct estimate. The goal is to improve the cardinality estimate in case of dependent columns, and one of the papers depends on good ndistinct estimates. And actually it does not depend on ndistinct for the columns only, it depends on ndistinct estimates for the combination of columns. So improving the ndistinct estimates for columns is just a necessary first step (and only if it works reasonably well, we can do the next step). regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] REVIEW: WIP: plpgsql - foreach in
Greetings, * Pavel Stehule (pavel.steh...@gmail.com) wrote: > attached patch contains an implementation of iteration over an array: I've gone through this patch and, in general, it looks pretty reasonable to me. There are a number of places where I think additional comments would be good and maybe some variable name improvements. Also, my changes should be reviewed to make sure they make sense. Attached is a patch against master which includes my changes, and a patch against Pavel's patch, so he can more easily see my changes and include them if he'd like. I'm going to mark this returned to author with feedback. commit 30295015739930e68c33b29da4f7ef535bc293ea Author: Stephen Frost Date: Wed Jan 19 17:58:24 2011 -0500 Clean up foreach-in-array PL/PgSQL code/comments Minor clean-up of the PL/PgSQL foreach-in-array patch, includes some white-space cleanup, grammar fixes, additional errhint where it makes sense, etc. Also added a number of 'XXX' comments asking for clarification and additional comments on what's happening in the code. commit f1a02fe3a8fa84217dae32d5ba74e9764c77431c Author: Stephen Frost Date: Wed Jan 19 15:11:53 2011 -0500 PL/PgSQL - Add iterate-over-array support This patch adds support for iterating over an array in PL/PgSQL. Patch Author: Pavel Stehule Thanks, Stephen *** a/doc/src/sgml/plpgsql.sgml --- b/doc/src/sgml/plpgsql.sgml *** *** 2238,2243 END LOOP label ; --- 2238,2268 + + Looping Through Array + + <
Re: [HACKERS] psql: Add \dL to show languages
On Tue, 2011-01-18 at 19:34 -0500, Josh Kupershmidt wrote: > Got that now too. I lost my ~/.emacs file recently, which is mostly > why I'm making whitespace mistakes. Rebuilding slowly though; > (setq-default show-trailing-whitespace t) is what I needed. Aha, I see. > I left the "Call Handler" and "Validator" columns in the verbose > output since I haven't heard otherwise. > > Josh I do not really care either way. The columns are not terribly useful but not pointless either. The patch looks alright now so I will mark it as ready for committer now. Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
Hi Simon, I'm reviewing this patch for CommitFest 2011-01. On Sat, Jan 15, 2011 at 10:02:03PM +, Simon Riggs wrote: > On Tue, 2010-12-14 at 19:48 +, Simon Riggs wrote: > > REPLACE TABLE ying WITH yang > Patch. Needs work. First, I'd like to note that the thread for this patch had *four* "me-too" responses to the use case. That's extremely unusual; the subject is definitely compelling to people. It addresses the bad behavior of natural attempts to atomically swap two tables in the namespace: psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')" psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock # Do it this way, and the next SELECT gets data from the old table. #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' & # Do it this way, and get: ERROR: could not open relation with OID 41380 psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' & psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'. psql -c 'DROP TABLE IF EXISTS t, old_t, new_t' by letting you do this instead: psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')" psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock psql -c 'EXCHANGE TABLE new_t TO t' & psql -c 'SELECT * FROM t' # I get 'new', finally! psql -c 'DROP TABLE IF EXISTS t, new_t' I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread interesting: can we just make the first example work? Even granting that the second syntax may be a useful addition, the existing behavior of the first example is surely worthless, even actively harmful. I tossed together a proof-of-concept patch, attached, that makes the first example DTRT. Do you see any value in going down that road? 
As for your patch itself: > + /* > + * Exchange table contents > + * > + * Swap heaps, toast tables, toast indexes > + * all forks > + * all indexes For indexes, would you basically swap yin<->yang in pg_index.indrelid, update pg_class.relnamespace as needed, and check for naming conflicts (when yin and yang differ in schema)? Is there more? > + * > + * Checks: > + * * table definitions must match Is there a particular reason to require this, or is it just a simplification to avoid updating things to match? > + * * constraints must match Wouldn't CHECK constraints have no need to match? > + * * indexes need not match > + * * outbound FKs don't need to match > + * * inbound FKs will be set to not validated We would need to ensure that inbound FOREIGN KEY constraints still have indexes suitable to implement them. The easiest thing there might be to internally drop and recreate the constraint, so we get all that verification. > + * > + * No Trigger behaviour > + * > + * How is it WAL logged? By locks and underlying catalog updates > + */ I see that the meat of the patch is yet to be written, so this ended up as more of a design review based on your in-patch comments. Hopefully it's of some value. I'll go ahead and mark the patch Returned with Feedback. Thanks, nm *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** *** 970,995 relation_openrv(const RangeVar *relation, LOCKMODE lockmode) { Oid relOid; ! /* !* Check for shared-cache-inval messages before trying to open the !* relation. This is needed to cover the case where the name identifies a !* rel that has been dropped and recreated since the start of our !* transaction: if we don't flush the old syscache entry then we'll latch !* onto that entry and suffer an error when we do RelationIdGetRelation. !* Note that relation_open does not need to do this, since a relation's !* OID never changes. 
!* !* We skip this if asked for NoLock, on the assumption that the caller has !* already ensured some appropriate lock is held. !*/ ! if (lockmode != NoLock) ! AcceptInvalidationMessages(); ! ! /* Look up the appropriate relation using namespace search */ ! relOid = RangeVarGetRelid(relation, false); /* Let relation_open do the rest */ ! return relation_open(relOid, lockmode); } /* --- 970,980 { Oid relOid; ! /* Look up and lock the appropriate relation using namespace search */ ! relOid = RangeVarLockRelid(relation, lockmode, false); /* Let rel
Re: [HACKERS] estimating # of distinct values
On Wed, Jan 19, 2011 at 2:13 PM, Tomas Vondra wrote: > Dne 19.1.2011 20:25, Robert Haas napsal(a): >> On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra wrote: >>> Yes, I was aware of this problem (amount of memory consumed with lots of >>> updates), and I kind of hoped someone will come up with a reasonable >>> solution. >> >> As far as I can see, periodically sampling some or all of the table is >> really the only thing that has a chance of working OK. You could try >> to track changes only when they're not too big, but I think that's >> still going to have nasty performance consequences. > > Yes, sampling all the table is the only way to get reliable ndistinct > estimates. IMHO continuing to focus on worst case results is a dead end. Yes, for some distributions, ndistinct is very hard to estimate well. However, let us not forget that the current ndistinct estimator is very bad but the postgres planner is, on the whole, very good. As far as I can see this is due to two facts: 1) The distribution of values in a table is rarely pathological, and usually follows one of several common patterns. ( IOW, we have good heuristics ) 2) The choice of plan is fairly robust to mis-estimation of ndistinct, because there are only a few things the executor can choose. It doesn't usually matter if a value composes 5% or 20% ( or 99% ) of the table, we still usually need to scan the entire table. Again, in my *very* humble opinion, if the ultimate goal is to improve the planner, we should try to cut down on the number of cases in which a poor estimate of ndistinct gives a really bad plan instead of chasing after marginal gains from a better ndistinct estimator. Maybe ( as I've advocated for before ) this means parameterizing the distribution of values ( ie, find that the values are roughly uniform ), maybe this means estimating the error of our statistics and using the most robust rather than the best plan, or maybe it means something totally different. 
But: ( and the literature seems to support me ) pounding away at the ndistinct estimator seems like a dead end. If you think about it, it's a bit ridiculous to look at the whole table *just* to "estimate" ndistinct - if we go that far why don't we just store the full distribution and be done with it? Best, Nathan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert Haas writes: > On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane wrote: >> Well, on my machine pg_description is about 210K (per database) as of >> HEAD. 90% of its contents are pg_proc entries, though I have no good >> fix on how much of that is for internal-use-only functions. A very >> rough estimate from counting pg_proc and pg_operator entries suggests >> that the answer might be "about a third". So if we do what was said in >> the above-cited thread, ie move existing comments to pg_operator and >> add boilerplate ones to pg_proc, we probably would pay <100K for it. > I guess that's not enormously expensive, but it's not insignificant > either. On my machine, a template database is 5.5MB. The implementation I was thinking about was to have initdb run a SQL command that would do something like INSERT INTO pg_description SELECT oprcode, 'pg_proc'::regclass, 0, 'implementation of ' || oprname FROM pg_operator WHERE theres-not-already-a-description-of-the-oprcode-function So it would be minimal work to either provide or omit the boilerplate descriptions. I think we can postpone the decision till we have a closer fix on the number of entries we're talking about. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Dne 19.1.2011 20:46, Tom Lane napsal(a): > Robert Haas writes: >> ... I guess I'm just saying I'd think really, really hard >> before abandoning the idea of periodic sampling. You have to get an >> awful lot of benefit out of those cross-column stats to make it worth >> paying a run-time cost for them. > > I've been trying to not discourage Tomas from blue-sky speculation, > but I have to say I agree with Robert that the chance of any usable > result from this approach is very very small. Feel free to do some > experimentation to prove us wrong --- but don't put a lot of effort > into it before that. > > regards, tom lane Discourage? Not really - I have not expected this to be a simple thing to implement. And yes, it might turn out to be a dead end eventually. OTOH the approach "it seems really expensive so let's not try to build it" is a certain dead end, so I'm not going to surrender due to such speculations (although those are valid concerns and I'll have to address them in the future). regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Dne 19.1.2011 20:25, Robert Haas napsal(a): > On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra wrote: >> Yes, I was aware of this problem (amount of memory consumed with lots of >> updates), and I kind of hoped someone will come up with a reasonable >> solution. > > As far as I can see, periodically sampling some or all of the table is > really the only thing that has a chance of working OK. You could try > to track changes only when they're not too big, but I think that's > still going to have nasty performance consequences. Yes, sampling all the table is the only way to get reliable ndistinct estimates. I'm not sure what you mean by 'tracking changes' - as I've mentioned in the previous post, this might be solved by updating a local copy. Which requires a constant space (a few kB, see the previous post). Is that acceptable? I don't know, that's up to the user if he wants to pay this price. >> So the algorithm would be something like this: >> >> 1. create copy for all distinct estimators influenced by the INSERT >> 2. update the local copy >> 3. commit and merge the local copies back into the original estimator > > Well, maybe. But DELETEs seem like a problem. And it's still adding > foreground work to every query, which I just have a hard time > believing is ever going to work out Yes, deletes are difficult to handle. My idea was to compute something like ((tuples changed + tuples deleted) / tuples total), and indicate that a rebuild of the estimator is needed if this coefficient changes too much - e.g. log a message or something like that. >> Regarding the crash scenario - if the commit fails, just throw away the >> local estimator copy, it's not needed. I'm not sure how to take care of >> the case when commit succeeds and the write of the merged estimator >> fails, but I think it might be possible to write the estimator to xlog >> or something like that. So it would be replayed during recovery etc. Or >> is it a stupid idea? 
> > It's not stupid, in the sense that that is what you'd need to do if > you want to avoid ever having to rescan the table. But it is another > thing that I think is going to be way too expensive. Way too expensive? All you need to put into the logfile is a copy of the estimator, which is a few kBs. How is that 'way too expensive'? Sure, it might be expensive when the user does a lot of small changes in separate transactions, that's true, but I guess we could minimize the amount of data written to the xlog by doing a diff of the estimators or something like that. >> Yes, as I've mentioned above, the multi-column stats are the reason why >> I started working on this. And yes, it really should work like this: >> >> 1. user realizes there's something wrong as the plans are bad >> 2. after analysis, the user realizes the reason is an imprecise >> estimate due to dependence between columns >> 3. the user enables cross-column stats on the columns and checks >> if it actually solved the issue >> 4. if the cost outweighs the gains, the user drops the stats >> >> Does that sound reasonable? > > Yes. The only caveat I'm trying to insert is that it's hard for me to > see how the proposed approach could ever be cheap enough to be a win. IMHO the question is not 'How expensive is that?' but rather 'How expensive is it compared to the gains?' If the user gets much better estimates and then a much better plan, then this may be a completely acceptable price. > If we need some kind of more expensive kind of ANALYZE that scans the > whole table, and it's optional, sure, why not? But that's a one-time > hit. You run it, it sucks, it's over, and now your queries work. > Odds are good you may never need to touch it again. Now, if we can > convince ourselves that multi-column stats are likely to require > constant updating, then maybe there would be more to recommend the > stream-based approach, although even then it seems dreadfully complex > and expensive to me. 
No, the multi-column statistics do not require constant updating. There are cases where a sampling is perfectly fine, although you may need a bit larger sample. Generally if you can use a multi-dimensional histogram, you don't need to scan the whole table. But the multi-dimensional histograms are not applicable to some cases. Especially to the ZIP code fail case, that was repeatedly discussed. So in case of discrete data, we need a different approach - and the solution I've proposed is based on using ndistinct estimates to get the estimates (actually it's based on one of the papers listed on wiki). > and expensive to me. But I bet these things are pretty static. If > the city and state tend to imply the zip code when there are 10K rows > in the table, they probably still will when there are 100K rows in the > table. If users with org_id = 1 have a higher karma score on average OK, how will you measure the "implicativeness"? We have discussed this in another threa
Re: [HACKERS] pl/python refactoring
On Wed, Jan 19, 2011 at 2:59 PM, Peter Eisentraut wrote: > On ons, 2011-01-19 at 00:52 -0300, Alvaro Herrera wrote: >> Excerpts from Peter Eisentraut's message of mar ene 18 19:22:50 -0300 2011: >> >> > #16: This is probably pointless because pgindent will reformat this. >> >> pgindent used to remove useless braces around single-statement blocks, >> but this behavior was removed years ago because it'd break formatting >> around PG_TRY blocks. > > Good to know. Committed then. I cracked up upon reading your commit message. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 4:18 PM, Tom Lane wrote: > One idea that I think we discussed was to tie cache entries to the > memory context they were demanded in, and mark them unused at the next > context reset/delete. That way they'd be considered unused at the same > points where the current implementation would certainly have discarded > the value. This isn't perfect (because of pfree) but might be good > enough. Yeah, I was thinking that's probably what would have to be done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
"Kevin Grittner" writes: > Tom Lane wrote: >> If we could solve the refcounting problem I think this would be a >> very significant win. > Instead of trying to keep a refcount, how about just evicting from > the buffer as needed based on LRU? Well, unless you know for certain that an item is no longer used, you can't evict it. There are ways you could finesse that --- for instance, you might try to only flush between statements, when certainly no user functions are running --- but the problem is that the cache is likely to contain some large values that you can't adopt such a laissez faire approach with, or you'll run out of memory. One idea that I think we discussed was to tie cache entries to the memory context they were demanded in, and mark them unused at the next context reset/delete. That way they'd be considered unused at the same points where the current implementation would certainly have discarded the value. This isn't perfect (because of pfree) but might be good enough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Couple document fixes
On 19 January 2011 21:10, Tom Lane wrote: > Thom Brown writes: >> I've attached a couple minor fixes to the docs. One relating to >> SECURITY LABEL and the other for pg_class.relpersistence > > Applied, thanks. Cheers Mr Lane. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Couple document fixes
Thom Brown writes: > I've attached a couple minor fixes to the docs. One relating to > SECURITY LABEL and the other for pg_class.relpersistence Applied, thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Tom Lane wrote: > If we could solve the refcounting problem I think this would be a > very significant win. Instead of trying to keep a refcount, how about just evicting from the buffer as needed based on LRU? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Robert Haas writes: > I do remember that discussion. Aside from the problem you mention, it > also seems that maintaining the hash table and doing lookups into it > would have some intrinsic cost. Well, sure, but it's still far cheaper than going out to the toast table (which will require multiple hashtable lookups, not to mention I/O and possible lock blocking). If we could solve the refcounting problem I think this would be a very significant win. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
Dimitri Fontaine writes: > For the GIN indexes, we have 2 methods for building the index and 3 > others to search it to solve the query. You're proposing that the 2 > former methods would be in the opfamily and the 3 later in the opclass. Actually the other way around. An opclass is the subset of an opfamily that is tightly bound to an index. The "build" methods have to be associatable with an index, so they're part of the index's opclass. The "query" methods could be loose in the opfamily. > So we would want the planner to know that in the GIN case an index built > with any opclass of a given opfamily can help answer a query that would > need any opclass of the opfamily. Right? The planner's not the problem here --- what's missing is the rule for the index AM to look up the right support functions to call at runtime. The trick is to associate the proper query support methods with any given query operator (which'd also be loose in the family, probably). The existing schema for pg_amop and pg_amproc is built on the assumption that the amoplefttype/amoprighttype are sufficient for making this association; but that seems to fall down if we would like to allow contrib modules to add new query operators that coincidentally take the same input types as an existing opfamily member. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: patch: remove redundant code from pl_exec.c
* Tom Lane (t...@sss.pgh.pa.us) wrote: > I think we should reject this one. Works for me. > Using a switch there is a bit problematic since in some cases you want > to use "break" to exit the loop. We could replace such breaks by gotos, > but that would be another strike against the argument that you're making > things more readable. I think the switch in exec_stmt_loop is only > workable because it has no cleanup to do, so it can just "return" in > places where a loop break would otherwise be needed. In short: if you > want to make these all look alike, better to go the other way. Ah, yeah, good point. We do use gotos elsewhere for this reason, might consider revisiting those also, if we're trying to do a 'clean-up' patch. In any case, I'll mark this as rejected. Thanks! Stephen
Re: [HACKERS] REVIEW: patch: remove redundant code from pl_exec.c
Stephen Frost writes: > While I can certainly appreciate wanting to remove redundant code, I > don't think this change actually improves the situation. The problem is > more than just that we might want to make a change to 'while' but not > 'for', it's also that it makes it very difficult to follow the code > flow. That was my reaction as well; and I was also concerned that this could have a non-negligible performance price. (At the very least it's adding an additional function call per loop execution, and there could also be a penalty from forcing "rc" to be in memory rather than a register.) I think we should reject this one. > In the end, I'd recommend cleaning up the handling of the exec_stmts() > return code so that all of these pieces follow the same style and look > similar (I'd go with the switch-based approach and remove the if/else > branches). That'll make it easier for anyone coming along later who > does end up needing to change all three. Using a switch there is a bit problematic since in some cases you want to use "break" to exit the loop. We could replace such breaks by gotos, but that would be another strike against the argument that you're making things more readable. I think the switch in exec_stmt_loop is only workable because it has no cleanup to do, so it can just "return" in places where a loop break would otherwise be needed. In short: if you want to make these all look alike, better to go the other way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
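Tom's switch-vs-break objection is a plain C scoping issue: inside a switch, "break" leaves the switch, not the enclosing loop, so exiting the loop from a case arm needs a goto (or a flag, or a return when there is no cleanup). A stripped-down illustration with hypothetical return codes, not the actual pl_exec.c logic:

```c
enum loop_rc
{
	RC_OK,
	RC_EXIT,
	RC_RETURN
};

/*
 * Walk a list of (hypothetical) statement return codes, counting the
 * RC_OK steps.  Inside the switch, "break" only leaves the switch, so
 * leaving the enclosing loop from a case arm takes a goto -- the extra
 * control flow Tom is objecting to.
 */
int
run_codes(const enum loop_rc *codes, int n)
{
	int			steps = 0,
				i;

	for (i = 0; i < n; i++)
	{
		switch (codes[i])
		{
			case RC_OK:
				steps++;
				break;			/* leaves the switch; the loop continues */
			case RC_EXIT:
			case RC_RETURN:
				goto loop_exit; /* a "break" here would not leave the loop */
		}
	}
loop_exit:
	return steps;
}
```

This is why exec_stmt_loop can get away with a switch (it has no cleanup, so it can simply "return" from a case arm), while the loops that do have cleanup cannot.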
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane wrote: >>> Which brings up another point though. I have a personal TODO item to >>> make the comments for operator support functions more consistent: >>> http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us >>> Should we consider removing those comments altogether, instead? > >> I could go either way on that. Most of those comments are pretty >> short, aren't they? How much storage are they really costing us? > > Well, on my machine pg_description is about 210K (per database) as of > HEAD. 90% of its contents are pg_proc entries, though I have no good > fix on how much of that is for internal-use-only functions. A very > rough estimate from counting pg_proc and pg_operator entries suggests > that the answer might be "about a third". So if we do what was said in > the above-cited thread, ie move existing comments to pg_operator and > add boilerplate ones to pg_proc, we probably would pay <100K for it. I guess that's not enormously expensive, but it's not insignificant either. On my machine, a template database is 5.5MB. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 2:58 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane wrote: >>> Robert Haas writes: Yeah. Many-times-repeated detoasting is really bad, and this is not the only place in the backend where we have this problem. :-( > >>> Yeah, there's been some discussion of a more general solution, and I >>> think I even had a trial patch at one point (which turned out not to >>> work terribly well, but maybe somebody will have a better idea someday). > >> I'm pretty doubtful that there's going to be a general solution to >> this problem - I think it's going to require gradual refactoring of >> problem spots. > > Do you remember the previous discussion? One idea that was on the table > was to make the TOAST code maintain a cache of detoasted values, which > could be indexed by the toast pointer OIDs (toast rel OID + value OID), > and then PG_DETOAST_DATUM might give back a pointer into the cache > instead of a fresh value. In principle that could be done in a fairly > centralized way. The hard part is to know when a cache entry is not > actively referenced anymore ... I do remember that discussion. Aside from the problem you mention, it also seems that maintaining the hash table and doing lookups into it would have some intrinsic cost. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert Haas writes: > On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane wrote: >> Which brings up another point though. I have a personal TODO item to >> make the comments for operator support functions more consistent: >> http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us >> Should we consider removing those comments altogether, instead? > I could go either way on that. Most of those comments are pretty > short, aren't they? How much storage are they really costing us? Well, on my machine pg_description is about 210K (per database) as of HEAD. 90% of its contents are pg_proc entries, though I have no good fix on how much of that is for internal-use-only functions. A very rough estimate from counting pg_proc and pg_operator entries suggests that the answer might be "about a third". So if we do what was said in the above-cited thread, ie move existing comments to pg_operator and add boilerplate ones to pg_proc, we probably would pay <100K for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
Tom Lane writes: > I think you missed the point: right now, to use both the core and > intarray operators on an integer[] column, you have to create *two* > GIN indexes, which will have exactly identical contents. I'm looking > for a way to let intarray extend the core opfamily definition so that > one index can serve. That I think I understood, but then I mixed opfamily and opclasses badly. Let's try again. For the GIN indexes, we have 2 methods for building the index and 3 others to search it to solve the query. You're proposing that the 2 former methods would be in the opfamily and the 3 later in the opclass. We'd like to be able to use the same index (which building depends on the opfamily) for solving different kind of queries, for which we can use different traversal and search algorithms, that's the opclass. So we would want the planner to know that in the GIN case an index built with any opclass of a given opfamily can help answer a query that would need any opclass of the opfamily. Right? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python refactoring
On ons, 2011-01-19 at 00:52 -0300, Alvaro Herrera wrote: > Excerpts from Peter Eisentraut's message of mar ene 18 19:22:50 -0300 2011: > > > #16: This is probably pointless because pgindent will reformat this. > > pgindent used to remove useless braces around single-statement blocks, > but this behavior was removed years ago because it'd break formatting > around PG_TRY blocks. Good to know. Committed then. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Robert Haas writes: > On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane wrote: >> Robert Haas writes: >>> Yeah. Many-times-repeated detoasting is really bad, and this is not >>> the only place in the backend where we have this problem. :-( >> Yeah, there's been some discussion of a more general solution, and I >> think I even had a trial patch at one point (which turned out not to >> work terribly well, but maybe somebody will have a better idea someday). > I'm pretty doubtful that there's going to be a general solution to > this problem - I think it's going to require gradual refactoring of > problem spots. Do you remember the previous discussion? One idea that was on the table was to make the TOAST code maintain a cache of detoasted values, which could be indexed by the toast pointer OIDs (toast rel OID + value OID), and then PG_DETOAST_DATUM might give back a pointer into the cache instead of a fresh value. In principle that could be done in a fairly centralized way. The hard part is to know when a cache entry is not actively referenced anymore ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
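To make the discussed cache idea concrete, here is a toy sketch of a detoast cache keyed by (toast rel OID, value OID), with the LRU eviction Kevin suggested. All names are invented, a fixed-size table stands in for dynahash, and it deliberately ignores the unsolved problem Tom raises: an evicted entry might still be referenced by a running expression.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8			/* toy size; a real cache would use dynahash */

typedef struct
{
	uint32_t	toastrelid;		/* toast rel OID: together with valueid, */
	uint32_t	valueid;		/* value OID, this names one toasted datum */
	void	   *detoasted;		/* cached detoasted copy */
	uint64_t	last_used;		/* LRU clock */
	int			used;
} cache_slot;

static cache_slot slots[CACHE_SLOTS];
static uint64_t cache_clock;

/* return the cached detoasted value, or NULL on a miss */
void *
toast_cache_lookup(uint32_t toastrelid, uint32_t valueid)
{
	int			i;

	for (i = 0; i < CACHE_SLOTS; i++)
		if (slots[i].used &&
			slots[i].toastrelid == toastrelid &&
			slots[i].valueid == valueid)
		{
			slots[i].last_used = ++cache_clock;
			return slots[i].detoasted;
		}
	return NULL;
}

/*
 * Insert a value, evicting the least recently used slot when full.
 * This is where the hand-waving lives: evicting (and freeing) an entry
 * is only safe once no caller still holds the returned pointer, which
 * is the refcounting problem left unsolved in the thread above.
 */
void
toast_cache_insert(uint32_t toastrelid, uint32_t valueid, void *detoasted)
{
	int			victim = 0,
				i;

	for (i = 0; i < CACHE_SLOTS; i++)
	{
		if (!slots[i].used)
		{
			victim = i;
			break;
		}
		if (slots[i].last_used < slots[victim].last_used)
			victim = i;
	}
	slots[victim] = (cache_slot) {toastrelid, valueid, detoasted, ++cache_clock, 1};
}
```

PG_DETOAST_DATUM would consult such a cache before going out to the toast table; the memory-context-tied invalidation discussed upthread is one way to decide when an entry is safely unreferenced.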
Re: [HACKERS] estimating # of distinct values
Robert Haas writes: > ... I guess I'm just saying I'd think really, really hard > before abandoning the idea of periodic sampling. You have to get an > awful lot of benefit out of those cross-column stats to make it worth > paying a run-time cost for them. I've been trying to not discourage Tomas from blue-sky speculation, but I have to say I agree with Robert that the chance of any usable result from this approach is very very small. Feel free to do some experimentation to prove us wrong --- but don't put a lot of effort into it before that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken >> wrote: >>> Pages like this one have column comments for the system tables: >>> >>> http://www.psql.it/manuale/8.3/catalog-pg-attribute.html > >> Oh, I see. I don't think we want to go there. We'd need some kind of >> system for keeping the two places in sync. > > I seem to recall some muttering about teaching genbki to extract such > comments from the SGML sources or perhaps the C header files. I tend to > agree though that it would be a lot more work than it's worth. And as > you say, pg_description entries aren't free. > > Which brings up another point though. I have a personal TODO item to > make the comments for operator support functions more consistent: > http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us > Should we consider removing those comments altogether, instead? I could go either way on that. Most of those comments are pretty short, aren't they? How much storage are they really costing us? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] REVIEW: patch: remove redundant code from pl_exec.c
Greetings, * Pavel Stehule (pavel.steh...@gmail.com) wrote: > This patch remove redundant rows from PL/pgSQL executor (-89 lines). While I can certainly appreciate wanting to remove redundant code, I don't think this change actually improves the situation. The problem is more than just that we might want to make a change to 'while' but not 'for', it's also that it makes it very difficult to follow the code flow. If you could have found a way to make it work for all calls to exec_stmts(), it might be a bit better, but you can't without kludging it up further. If you could, then it might be possible to move some of this logic *into* exec_stmts(), but I don't see that being reasonable either. In the end, I'd recommend cleaning up the handling of the exec_stmts() return code so that all of these pieces follow the same style and look similar (I'd go with the switch-based approach and remove the if/else branches). That'll make it easier for anyone coming along later who does end up needing to change all three. > Doesn't change a functionality. I'm not convinced of this either, to be honest... In exec_stmt_while(), there is: for(;;) { [...] if (isnull || !value) break; rc = exec_stmts(estate, stmt->body); [...] } return PLPGSQL_RC_OK; You replaced the last return with: return rc; Which could mean returning an uninitialized rc if the above break; happens. In the end, I guess it's up to the committers on how they feel about this, so here's an updated patch which fixes the above issue and improves the comments/grammar. Passes all regressions and works in my limited testing. commit e6639d83db5b301e184bf158b1591007a3f79526 Author: Stephen Frost Date: Wed Jan 19 14:28:36 2011 -0500 Improve comments in pl_exec.c wrt can_leave_loop() This patch improves some of the comments around can_leave_loop(), and fixes a couple of fall-through cases to make sure they behave the same as before the code de-duplication patch that introduced can_leave_loop(). 
commit f8262adcba662164dbc24d289cb036b3f0aee582 Author: Stephen Frost Date: Wed Jan 19 14:26:27 2011 -0500 remove redundant code from pl_exec.c This patch removes redundant handling of exec_stmts()'s return code by creating a new function to be called after exec_stmts() is run. This new function will then handle the return code, update it if necessary, and return if the loop should continue or not. Patch by Pavel Stehule. Thanks, Stephen *** a/src/pl/plpgsql/src/pl_exec.c --- b/src/pl/plpgsql/src/pl_exec.c *** *** 204,209 static Portal exec_dynquery_with_params(PLpgSQL_execstate *estate, --- 204,211 PLpgSQL_expr *dynquery, List *params, const char *portalname, int cursorOptions); + static bool can_leave_loop(PLpgSQL_execstate *estate, + PLpgSQL_any_loop *stmt, int *rc); /* -- * plpgsql_exec_function Called by the call handler for *** *** 1566,1571 exec_stmt_case(PLpgSQL_execstate *estate, PLpgSQL_stmt_case *stmt) --- 1568,1650 return exec_stmts(estate, stmt->else_stmts); } + /* + * can_leave_loop + * + * Check the result of exec_stmts for the various exec_stmt_loop + * functions (unconditional loop, while loop, for-over-integer loop, + * for-over-portal loop). + * + * Returns true when the outer cycle should be left, otherwise false. + * Will also update the return code (rc) as necessary. + */ + static bool + can_leave_loop(PLpgSQL_execstate *estate, PLpgSQL_any_loop *stmt, int *rc) + { + /* + * Check which return code exec_stmts() returned and handle it + * accordingly. 
+ */ + switch (*rc) + { + case PLPGSQL_RC_OK: + /* Just continue the outer loop on PLPGSQL_RC_OK */ + return false; + + case PLPGSQL_RC_RETURN: + /* Time to break out of the outer loop */ + return true; + + case PLPGSQL_RC_EXIT: + if (estate->exitlabel == NULL) + { + /* unlabelled exit, finish the current loop */ + *rc = PLPGSQL_RC_OK; + } + if (stmt->label != NULL && strcmp(stmt->label, estate->exitlabel) == 0) + { + /* labelled exit, matches the current stmt's label */ + estate->exitlabel = NULL; + *rc = PLPGSQL_RC_OK; + } + + /* + * otherwise, this is a labelled exit that does not match the + * current statement's label, if any: return RC_EXIT so that the + * EXIT continues to propagate up the stack. + */ + return true; + + case PLPGSQL_RC_CONTINUE: + if (estate->exitlabel == NULL) + { + /* anonymous continue, so re-run the loop */ + *rc = PLPGSQL_RC_OK; + } + else if (stmt->label != NULL && + strcmp(stmt->label, estate->exitlabel) == 0) + { + /* label matches named continue, so re-run loop */ + estate->exitlabel = NULL; + *rc = PLPGSQL_RC_OK; +
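The uninitialized-rc hazard described above can be sketched outside of the patch itself. This is a minimal, self-contained illustration (not PostgreSQL code; the return-code values and exec_body() stand-in are purely illustrative): because the loop condition can fail before the body ever runs, rc must be initialized before the loop, or "return rc" would read garbage.

```c
#include <assert.h>

/* Simplified return codes mirroring plpgsql's (values are illustrative). */
enum { RC_OK = 0, RC_EXIT = 1, RC_RETURN = 2 };

/* Hypothetical stand-in for exec_stmts(): pretend the body always asks
 * to return from the function. */
static int exec_body(void) { return RC_RETURN; }

/* Sketch of the refactored exec_stmt_while(): the condition can be false
 * on the very first test, so rc is never assigned inside the loop unless
 * it is initialized up front. */
static int exec_stmt_while_sketch(int cond_is_true)
{
    int rc = RC_OK;          /* the fix: initialize before the loop */

    for (;;)
    {
        if (!cond_is_true)
            break;           /* condition false before body: rc untouched */
        rc = exec_body();
        if (rc != RC_OK)
            break;
        cond_is_true = 0;    /* illustrative: run the body only once */
    }
    return rc;
}
```

With the initializer in place, an immediate break returns RC_OK, matching the old "return PLPGSQL_RC_OK" behavior.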
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Excerpts from Robert Haas's message of Wed Jan 19 15:25:00 -0300 2011: > Oh, I see. I don't think we want to go there. We'd need some kind of > system for keeping the two places in sync. Maybe autogenerate both the .sgml and the postgres.description files from a single source. > And there'd be no easy way > to upgrade the in-database descriptions when we upgraded to a newer > minor release, supposing they'd changed in the meantime. I wouldn't worry about this issue. We don't do many catalog changes in minor releases anyway. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert Haas writes: > On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken > wrote: >> Pages like this one have column comments for the system tables: >> >> http://www.psql.it/manuale/8.3/catalog-pg-attribute.html > Oh, I see. I don't think we want to go there. We'd need some kind of > system for keeping the two places in sync. I seem to recall some muttering about teaching genbki to extract such comments from the SGML sources or perhaps the C header files. I tend to agree though that it would be a lot more work than it's worth. And as you say, pg_description entries aren't free. Which brings up another point though. I have a personal TODO item to make the comments for operator support functions more consistent: http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us Should we consider removing those comments altogether, instead? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra wrote: > Yes, I was aware of this problem (amount of memory consumed with lots of > updates), and I kind of hoped someone will come up with a reasonable > solution. As far as I can see, periodically sampling some or all of the table is really the only thing that has a chance of working OK. You could try to track changes only when they're not too big, but I think that's still going to have nasty performance consequences. > I was thinking about it and I believe at least one of the algorithms > (the "adaptive sampling") might handle "merging" i.e. the backends would > not need to store the list of values, just a private copy of the > estimator and update it. And at the end (after commit), the estimator > would be merged back into the actual one. > > So the algorithm would be something like this: > > 1. create copy for all distinct estimators influenced by the INSERT > 2. update the local copy > 3. commit and merge the local copies back into the original estimator Well, maybe. But DELETEs seem like a problem. And it's still adding foreground work to every query, which I just have a hard time believing is ever going to work out > Regarding the crash scenario - if the commit fails, just throw away the > local estimator copy, it's not needed. I'm not sure how to take care of > the case when commit succeeds and the write of the merged estimator > fails, but I think it might be possible to write the estimator to xlog > or something like that. So it would be replayed during recovery etc. Or > is it a stupid idea? It's not stupid, in the sense that that is what you'd need to do if you want to avoid ever having to rescan the table. But it is another thing that I think is going to be way too expensive. > Yes, as I've mentioned above, the multi-column stats are the reason why > I started working on this. And yes, it really should work like this: > > 1. user realizes there's something wrong as the plans are bad > 2. 
after analysis, the user realizes the reason is an imprecise > estimate due to dependence between columns > 3. the user enables cross-column stats on the columns and checks > if it actually solved the issue > 4. if the cost outweighs the gains, the user drops the stats > > Does that sound reasonable? Yes. The only caveat I'm trying to insert is that it's hard for me to see how the proposed approach could ever be cheap enough to be a win. If we need some kind of more expensive kind of ANALYZE that scans the whole table, and it's optional, sure, why not? But that's a one-time hit. You run it, it sucks, it's over, and now your queries work. Odds are good you may never need to touch it again. Now, if we can convince ourselves that multi-column stats are likely to require constant updating, then maybe there would be more to recommend the stream-based approach, although even then it seems dreadfully complex and expensive to me. But I bet these things are pretty static. If the city and state tend to imply the zip code when there are 10K rows in the table, they probably still will when there are 100K rows in the table. If users with org_id = 1 have a higher karma score on average than users with org_id != 1, that'll probably still be true when there are more users in both classes. Whether the database can understand that without rescanning the table depends on the data representation, of course, but I guess I'm just saying I'd think really, really hard before abandoning the idea of periodic sampling. You have to get an awful lot of benefit out of those cross-column stats to make it worth paying a run-time cost for them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
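The "private copy per backend, merge at commit" scheme Tomas proposes depends on the estimator being mergeable. This is a minimal sketch of that property using a Flajolet-Martin-style bitmap (not the "adaptive sampling" estimator from the thread, and not PostgreSQL code; the hash and type names are illustrative): two sketches merge losslessly with a bitwise OR, so the order of commits and duplicate insertions do not matter.

```c
#include <assert.h>
#include <stdint.h>

/* A tiny distinct-count sketch: bit r is set if some hashed value had
 * its lowest set bit at position r. The distinct-count estimate would
 * be derived from the lowest unset bit; only mergeability matters here. */
typedef struct { uint32_t bitmap; } fm_sketch;

static uint32_t hash_u32(uint32_t x)   /* toy integer mixer (illustrative) */
{
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

static void fm_add(fm_sketch *s, uint32_t value)
{
    uint32_t h = hash_u32(value);
    int r = 0;                         /* position of lowest set bit of h */
    while (r < 31 && !(h & 1)) { h >>= 1; r++; }
    s->bitmap |= (uint32_t) 1 << r;
}

/* Merging a backend's private copy back into the shared estimator is
 * just an OR: order- and duplicate-insensitive. */
static void fm_merge(fm_sketch *into, const fm_sketch *from)
{
    into->bitmap |= from->bitmap;
}
```

Merging the per-backend copies at commit yields exactly the sketch that would have been built by feeding all values into one shared estimator, which is what makes the commit-time merge safe.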
Re: [HACKERS] Extending opfamilies for GIN indexes
Robert Haas writes: > On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane wrote: >> AFAICS that means integrating contrib/intarray into core. Independently >> of whether that's a good idea or not, PG is supposed to be an extensible >> system, so it would be nice to have a solution that supported add-on >> extensions. > Yeah, I'm just wondering if it's worth the effort, especially in view > of a rather large patch queue we seem to have outstanding at the > moment. Oh, maybe we're not on the same page here: I wasn't really proposing to do this right now, it's more of a TODO item. Offhand the only reason to do it now would be if we settled on something that required a layout change in pg_amop/pg_amproc. Since we already have one such change in 9.1, getting the additional change done in the same release would be valuable to reduce the number of distinct cases for pg_dump and other clients to support. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Couple document fixes
Thom Brown wrote:
> relkind in the same table is the same type, but isn't displayed as
> "char" in the docs, and the same applies to many other system tables.
> They would need changing too then.
>
> Examples are:
>
> pg_type.typtype
> pg_proc.provolatile
> pg_attribute.attstorage

That's a good point. Consistency would trump getting a single entry right, for sure. I wonder, though, whether we shouldn't consistently distinguish them. For one thing, I've seen multiple posts where people were reporting "bugs" because of having confused char with "char". FWIW, \d shows:

         Table "pg_catalog.pg_class"
     Column      |   Type    | Modifiers
-----------------+-----------+-----------
 relname         | name      | not null
 relnamespace    | oid       | not null
 reltype         | oid       | not null
 reloftype       | oid       | not null
 relowner        | oid       | not null
 relam           | oid       | not null
 relfilenode     | oid       | not null
 reltablespace   | oid       | not null
 relpages        | integer   | not null
 reltuples       | real      | not null
 reltoastrelid   | oid       | not null
 reltoastidxid   | oid       | not null
 relhasindex     | boolean   | not null
 relisshared     | boolean   | not null
 relpersistence  | "char"    | not null
 relkind         | "char"    | not null
 relnatts        | smallint  | not null
 relchecks       | smallint  | not null
 relhasoids      | boolean   | not null
 relhaspkey      | boolean   | not null
 relhasexclusion | boolean   | not null
 relhasrules     | boolean   | not null
 relhastriggers  | boolean   | not null
 relhassubclass  | boolean   | not null
 relfrozenxid    | xid       | not null
 relacl          | aclitem[] |
 reloptions      | text[]    |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)

Currently we don't seem to distinguish them in very many places in the docs:

$ find -name '*.sgml' | xargs egrep -n '\"char\"'
./doc/src/sgml/textsearch.sgml:1271:setweight(vector tsvector, weight "char") returns tsvector
./doc/src/sgml/datatype.sgml:1116:length might change in a future release. The type "char"
./doc/src/sgml/datatype.sgml:1134: "char"
./doc/src/sgml/release-old.sgml:4406:Add routines for single-byte "char" type(Thomas)
./doc/src/sgml/release-old.sgml:4747:Make "char" type a synonym for "char(1)" (actually implemented as bpchar)(Thomas)
./doc/src/sgml/xfunc.sgml:1794: "char"
./doc/src/sgml/release-8.0.sgml:3389: "char" data type have been removed.
./doc/src/sgml/release-8.0.sgml:4460:"char" data type have been removed.
./doc/src/sgml/release-8.0.sgml:4466:to do arithmetic on a "char" column, you can cast it to
./doc/src/sgml/func.sgml:8462: setweight(tsvector, "char")
./doc/src/sgml/btree-gin.sgml:17: oid, money, "char",

-Kevin
[HACKERS] Use of O_DIRECT only for open_* sync options
Is there a reason we only use O_DIRECT with open_* sync options? xlogdefs.h says:

/*
 * Because O_DIRECT bypasses the kernel buffers, and because we never
 * read those buffers except during crash recovery, it is a win to use
 * it in all cases where we sync on each write(). We could allow O_DIRECT
 * with fsync(), but because skipping the kernel buffer forces writes out
 * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
 * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
 * Also, O_DIRECT is never enough to force data to the drives, it merely
 * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
 */

This seems wrong because fsync() can win if there are two writes before the sync call. Can kernels not issue fsync() if the write was O_DIRECT? If that is the cause, we should document it.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
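The policy the quoted comment describes can be sketched as a small flag-selection function. This is purely illustrative (the enum, flag macros, and function name are hypothetical, not PostgreSQL's actual wal_sync_method machinery; real code would use O_SYNC/O_DSYNC/O_DIRECT from <fcntl.h>): O_DIRECT is paired only with the open_* methods that sync on every write(), while plain fsync() keeps the kernel buffers so one sync call can cover several writes.

```c
#include <assert.h>

/* Illustrative sync methods, loosely modeled on wal_sync_method. */
typedef enum { SM_FSYNC, SM_OPEN_SYNC, SM_OPEN_DSYNC } sync_method;

/* Stand-in flag bits; symbolic values keep the sketch portable. */
#define F_SYNC   0x1
#define F_DSYNC  0x2
#define F_DIRECT 0x4

/* Map a sync method to extra open() flags, following the quoted policy:
 * direct I/O only when every write() is itself synchronous. */
static int wal_open_flags(sync_method m)
{
    switch (m)
    {
        case SM_OPEN_SYNC:  return F_SYNC | F_DIRECT;
        case SM_OPEN_DSYNC: return F_DSYNC | F_DIRECT;
        case SM_FSYNC:
        default:            return 0;   /* sync via fsync() after write */
    }
}
```

Bruce's question amounts to asking whether the SM_FSYNC branch could also return F_DIRECT, given that fsync() can still batch two writes into one flush even when the data bypassed the page cache.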
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Noah Misch writes: > On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote: >> In the meantime, the proposal at hand seems like a bit of a stop-gap, >> which is why I'd prefer to see something with a very minimal code >> footprint. Detoast at assignment would likely need only a few lines >> of code added in a single place. > What qualifies a patch as stop-gap scale? Pavel's patch is ~60 lines. Yeah, but they're spread out all over plpgsql, and seem likely to metastasize to other places --- the additional field that needs to be initialized is the main culprit. I'd like a one-spot patch that will be easy to remove when/if it's no longer needed. > If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing > VARATT_IS_EXTENDED there might be the least-harmful way to avoid it. I thought about that too, but adding an additional set of tests into exec_eval_datum isn't free --- that's a hot spot for plpgsql execution. Doing it in exec_assign_value would be significantly cheaper, first because it's reasonable to assume that assignments are less frequent than reads, and second because there's already a test there to detect pass-by-ref datatypes, as well as a datumCopy() step that could be skipped altogether when we detoast. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
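The cost trade-off Tom describes (assignments are rarer than reads) is easy to see in a toy model. This sketch is not plpgsql code; the plvar struct, detoast() stub, and counter are all hypothetical instrumentation to contrast per-read detoasting with detoast-at-assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a plpgsql variable holding a possibly-toasted datum. */
typedef struct {
    int  value;        /* stands in for the datum */
    bool toasted;      /* still in compressed/toasted form? */
} plvar;

static int detoast_calls = 0;            /* instrumentation for the sketch */

static int detoast(int v) { detoast_calls++; return v; }

/* Per-read detoasting: every evaluation pays the cost again. */
static int read_naive(const plvar *var)
{
    return var->toasted ? detoast(var->value) : var->value;
}

/* Detoast at assignment: pay once, and every later read is cheap. */
static void assign(plvar *var, int value, bool toasted)
{
    var->value = toasted ? detoast(value) : value;
    var->toasted = false;
}
```

A loop that reads a toasted variable a thousand times triggers a thousand detoasts in the naive model, but exactly one when the value is detoasted at assignment, which is why a few lines in exec_assign_value could capture most of the win.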
Re: [HACKERS] Extending opfamilies for GIN indexes
On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane wrote: >>> I think you missed the point: right now, to use both the core and >>> intarray operators on an integer[] column, you have to create *two* >>> GIN indexes, which will have exactly identical contents. I'm looking >>> for a way to let intarray extend the core opfamily definition so that >>> one index can serve. > >> Maybe this is a dumb question, but why not just put whatever stuff >> intarray[] adds directly into the core opfamily? > > AFAICS that means integrating contrib/intarray into core. Independently > of whether that's a good idea or not, PG is supposed to be an extensible > system, so it would be nice to have a solution that supported add-on > extensions. Yeah, I'm just wondering if it's worth the effort, especially in view of a rather large patch queue we seem to have outstanding at the moment. > The subtext here is that GIN, unlike the other index AMs, uses a > representation that seems pretty amenable to supporting a wide variety > of query types with a single index. contrib/intarray's "query_int" > operators are not at all like the subset-inclusion-testing operators > that the core opclass supports, and it's not very hard to think of > additional cases that could be of interest to somebody (example: find > all arrays that contain some/all entries within a given integer range). > I think we're going to come up against similar situations over and over > until we find a solution. Interesting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
Robert Haas writes: > On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane wrote: >> I think you missed the point: right now, to use both the core and >> intarray operators on an integer[] column, you have to create *two* >> GIN indexes, which will have exactly identical contents. I'm looking >> for a way to let intarray extend the core opfamily definition so that >> one index can serve. > Maybe this is a dumb question, but why not just put whatever stuff > intarray[] adds directly into the core opfamily? AFAICS that means integrating contrib/intarray into core. Independently of whether that's a good idea or not, PG is supposed to be an extensible system, so it would be nice to have a solution that supported add-on extensions. The subtext here is that GIN, unlike the other index AMs, uses a representation that seems pretty amenable to supporting a wide variety of query types with a single index. contrib/intarray's "query_int" operators are not at all like the subset-inclusion-testing operators that the core opclass supports, and it's not very hard to think of additional cases that could be of interest to somebody (example: find all arrays that contain some/all entries within a given integer range). I think we're going to come up against similar situations over and over until we find a solution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Couple document fixes
On 19 January 2011 18:11, Kevin Grittner wrote: > Thom Brown wrote: > >> I've attached a couple minor fixes to the docs. One relating to >> SECURITY LABEL and the other for pg_class.relpersistence > > relpersistence should be "char", not char. > Oddly enough, there is a difference. > > -Kevin relkind in the same table is the same type, but isn't displayed as "char" in the docs, and the same applies to many other system tables. They would need changing too then. Examples are: pg_type.typtype pg_proc.provolatile pg_attribute.attstorage -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken wrote: > Pages like this one have column comments for the system tables: > > http://www.psql.it/manuale/8.3/catalog-pg-attribute.html Oh, I see. I don't think we want to go there. We'd need some kind of system for keeping the two places in sync. And there'd be no easy way to upgrade the in-database descriptions when we upgraded to a newer minor release, supposing they'd changed in the meantime. And some of the descriptions are quite long, so they wouldn't fit nicely in the amount of space you typically have available when you run \d+. And it would enlarge the size of an empty database by however much was required to store all those comments, which could be an issue for PostgreSQL instances that have many small databases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Couple document fixes
Thom Brown wrote: > I've attached a couple minor fixes to the docs. One relating to > SECURITY LABEL and the other for pg_class.relpersistence relpersistence should be "char", not char. Oddly enough, there is a difference. -Kevin
[HACKERS] Couple document fixes
Hi, I've attached a couple minor fixes to the docs. One relating to SECURITY LABEL and the other for pg_class.relpersistence -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 doc_fixes.patch Description: Binary data
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote: > Robert Haas writes: > > On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule > > wrote: > >> opinion isn't strong in this topic. One or twenty useless detoasting > >> isn't really significant in almost use cases (problem is thousands > >> detoasting). > > > Yeah. Many-times-repeated detoasting is really bad, and this is not > > the only place in the backend where we have this problem. :-( > > Yeah, there's been some discussion of a more general solution, and I > think I even had a trial patch at one point (which turned out not to > work terribly well, but maybe somebody will have a better idea someday). > In the meantime, the proposal at hand seems like a bit of a stop-gap, > which is why I'd prefer to see something with a very minimal code > footprint. Detoast at assignment would likely need only a few lines > of code added in a single place. What qualifies a patch as stop-gap scale? Pavel's patch is ~60 lines. If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing VARATT_IS_EXTENDED there might be the least-harmful way to avoid it. Saving a few more lines by moving the work to exec_assign_value probably does not justify the associated performance regressions Pavel has cited. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane wrote: > Dimitri Fontaine writes: >> Tom Lane writes: >>> Oh, wait a minute: there's a bad restriction there, namely that a >>> contrib module could only add "loose" operators that had different >>> declared input types from the ones known to the core opclass. > >> I would have though that such contrib would then need to offer their own >> opfamily and opclasses, and users would have to use the specific opclass >> manually like they do e.g. for text_pattern_ops. Can't it work that way? > > I think you missed the point: right now, to use both the core and > intarray operators on an integer[] column, you have to create *two* > GIN indexes, which will have exactly identical contents. I'm looking > for a way to let intarray extend the core opfamily definition so that > one index can serve. Maybe this is a dumb question, but why not just put whatever stuff intarray[] adds directly into the core opfamily? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule >> wrote: >>> opinion isn't strong in this topic. One or twenty useless detoasting >>> isn't really significant in almost use cases (problem is thousands >>> detoasting). > >> Yeah. Many-times-repeated detoasting is really bad, and this is not >> the only place in the backend where we have this problem. :-( > > Yeah, there's been some discussion of a more general solution, and I > think I even had a trial patch at one point (which turned out not to > work terribly well, but maybe somebody will have a better idea someday). I'm pretty doubtful that there's going to be a general solution to this problem - I think it's going to require gradual refactoring of problem spots. > In the meantime, the proposal at hand seems like a bit of a stop-gap, > which is why I'd prefer to see something with a very minimal code > footprint. Detoast at assignment would likely need only a few lines > of code added in a single place. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extending opfamilies for GIN indexes
Dimitri Fontaine writes: > Tom Lane writes: >> Oh, wait a minute: there's a bad restriction there, namely that a >> contrib module could only add "loose" operators that had different >> declared input types from the ones known to the core opclass. > I would have though that such contrib would then need to offer their own > opfamily and opclasses, and users would have to use the specific opclass > manually like they do e.g. for text_pattern_ops. Can't it work that way? I think you missed the point: right now, to use both the core and intarray operators on an integer[] column, you have to create *two* GIN indexes, which will have exactly identical contents. I'm looking for a way to let intarray extend the core opfamily definition so that one index can serve. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Wed, Jan 19, 2011 at 11:52 AM, Tom Lane wrote: > Robert Haas writes: >> ... So what we >> want to do is write a percentage of them, in a way that guarantees >> that they'll all eventually get written if people continue to access >> the same data. > > The word "guarantee" seems quite inappropriate here, since as far as I > can see this approach provides no such guarantee --- even after many > cycles you'd never be really certain all the bits were set. > > What I asked for upthread was that we continue to have some > deterministic, practical way to force all hint bits in a table to be > set. This is not *remotely* responding to that request. It's still not > deterministic, and even if it were, vacuuming a large table 20 times > isn't a very practical solution. I get the impression you haven't spent as much time reading my email as I spent writing it. Perhaps I'm wrong, but in any case the code doesn't do what you're suggesting. In the most recently posted version of this patch, which is v2, if VACUUM hits a page that is hint-bit-dirty, it always writes it. Full stop. The "20 times" bit applies to a SELECT * FROM table, which is a rather different case. As I write this, I realize that there is a small fly in the ointment here, which is that neither VACUUM nor SELECT force out all the pages they modify to disk. So there is some small amount of remaining nondeterminism, even if you VACUUM, because VACUUM will leave the last few pages it dirties in shared_buffers, and whether those hint bits hit the disk will depend on a decision made at the time they're evicted, not at the time they were dirtied. Possibly I could fix that by making SetBufferCommitInfoNeedsSave() set BM_DIRTY during vacuum and BM_HINT_BITS at other times. That would nail the lid shut pretty tight. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
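The "write a percentage of them, with a guarantee of eventual completion" idea above can be sketched deterministically. This is not the patch's actual v2 logic; it is one hypothetical variant in which each scan flushes at most a fixed number of hint-bit-dirty pages, resuming at a cursor, so repeated scans of the same data provably flush every page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 100

/* Hypothetical per-table state: which pages are hint-bit-dirty, plus a
 * cursor so that successive scans make forward progress. */
typedef struct {
    bool hint_dirty[NPAGES];
    int  cursor;               /* where the next scan resumes writing */
} table_state;

/* One scan of the table: write at most max_writes hint-bit-dirty pages,
 * starting where the previous scan left off. With n dirty pages, all of
 * them are flushed after ceil(n / max_writes) scans. */
static int scan_table(table_state *t, int max_writes)
{
    int start = t->cursor;
    int written = 0;

    for (int i = 0; i < NPAGES && written < max_writes; i++)
    {
        int page = (start + i) % NPAGES;

        if (t->hint_dirty[page])
        {
            t->hint_dirty[page] = false;       /* "write" the page */
            written++;
            t->cursor = (page + 1) % NPAGES;   /* resume here next time */
        }
    }
    return written;
}
```

With 100 dirty pages and 5 writes allowed per scan, 20 scans flush everything, mirroring the "SELECT * FROM table 20 times" behavior discussed in the thread while bounding the write amplification of any single scan.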