[HACKERS] CREATE FUNCTION .. LEAKPROOF docs
I think the docs for the LEAKPROOF option in create_function.sgml ought to mention RLS as well as security barrier views. Also the current text is no longer strictly correct in light of commit dcbf5948e12aa60b4d6ab65b6445897dfc971e01. Suggested rewording attached. Regards, Dean

diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml
index c5beb16..cc2098c 100644
--- a/doc/src/sgml/ref/create_function.sgml
+++ b/doc/src/sgml/ref/create_function.sgml
@@ -350,9 +350,18 @@ CREATE [ OR REPLACE ] FUNCTION
       effects.  It reveals no information about its arguments other than by
       its return value.  For example, a function which throws an error message
       for some argument values but not others, or which includes the argument
-      values in any error message, is not leakproof.  The query planner may
-      push leakproof functions (but not others) into views created with the
-      <literal>security_barrier</literal> option.  See
+      values in any error message, is not leakproof.  This affects how the
+      system executes queries against views created with the
+      <literal>security_barrier</literal> option or tables with row level
+      security enabled.  The system will enforce conditions from security
+      policies and security barrier views before any user-supplied conditions
+      from the query itself that contain non-leakproof functions, in order to
+      prevent the inadvertent exposure of data.  Functions and operators
+      marked as leakproof are assumed to be trustworthy, and may be executed
+      before conditions from security policies and security barrier views.
+      In addition, functions which do not take arguments or which are not
+      passed any arguments from the security barrier view or table do not have
+      to be marked as leakproof to be executed before security conditions.  See
       <xref linkend="sql-createview"> and <xref linkend="rules-privileges">.
       This option can only be set by the superuser.
      </para>

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
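For illustration, a minimal C sketch (hypothetical; not part of Dean's patch) of a function that is not leakproof in the sense described: its error path reveals the argument value to any caller, including one whose qual would otherwise be filtered out by a security barrier view or a row security policy.

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(leaky_check);

Datum
leaky_check(PG_FUNCTION_ARGS)
{
	int32		val = PG_GETARG_INT32(0);

	if (val < 0)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 /* leaks the argument value in the message text */
				 errmsg("unexpected negative value: %d", val)));

	PG_RETURN_BOOL(val > 100);
}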
Re: [HACKERS] 9.5 release notes
On 11 June 2015 at 05:15, Bruce Momjian br...@momjian.us wrote: I have committed the first draft of the 9.5 release notes. You can view the output here: http://momjian.us/pgsql_docs/release-9-5.html I think it's worth mentioning dcbf5948e12aa60b4d6ab65b6445897dfc971e01, probably under General Performance. It's an optimisation, and also a user-visible change to the way LEAKPROOF works. Perhaps something like "Allow pushdown of non-leakproof functions into security barrier views if the function is not passed any arguments from the view. Previously only functions marked as LEAKPROOF could be pushed down into security barrier views." Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock() seems too aggressive for machines with many sockets
The whole thing turns out to be based on wrong baseline data, taken with a pgbench client running from a remote machine. It all started out from an investigation against 9.3. Curiously enough, the s_lock() problem that existed in 9.3 has a very similar effect on throughput as a network bottleneck has on 9.5. Sorry for the noise, Jan On 06/10/2015 09:18 AM, Jan Wieck wrote: Hi, I think I may have found one of the problems PostgreSQL has on machines with many NUMA nodes. I am not yet sure what exactly happens on the NUMA bus, but there seems to be a tipping point at which the spinlock concurrency wreaks havoc and the performance of the database collapses. On a machine with 8 sockets, 64 cores, Hyperthreaded 128 threads total, a pgbench -S peaks with 50-60 clients around 85,000 TPS. The throughput then takes a very sharp dive and reaches around 20,000 TPS at 120 clients. It never recovers from there. The attached patch demonstrates that less aggressive spinning and (much) more often delaying improves the performance on this type of machine. The 8 socket machine in question scales to over 350,000 TPS. The patch is meant to demonstrate this effect only. It has a negative performance impact on smaller machines and client counts < #cores, so the real solution will probably look much different. But I thought it would be good to share this and start the discussion about reevaluating the spinlock code before PGCon. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
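For context, a rough sketch of the "less aggressive spinning, (much) more often delaying" shape Jan describes. This is illustrative only, not his patch, and the real s_lock.c logic differs; the idea is simply to give up the spin quickly and back off exponentially so that many waiters stop hammering the same cache line across NUMA nodes:

#include "postgres.h"
#include "storage/s_lock.h"
#include "miscadmin.h"

static void
spin_with_backoff(volatile slock_t *lock)
{
	int			spins = 0;
	long		delay_us = 10;	/* initial sleep, in microseconds */

	while (TAS_SPIN(lock))
	{
		if (++spins >= 10)		/* spin far fewer times before sleeping */
		{
			pg_usleep(delay_us);
			delay_us = Min(delay_us * 2, 1000);	/* exponential backoff, capped at 1ms */
			spins = 0;
		}
	}
}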
Re: [HACKERS] 9.5 release notes
* Bruce Momjian (br...@momjian.us) wrote: On Sat, Jun 13, 2015 at 11:30:30PM -0400, Bruce Momjian wrote: Note that I guess MauMau is a nickname. I think we are fine consistently putting Japanese last names first _if_ we always capitalize the last name, e.g. FUJITA Etsuro --- is that a good idea? Does everyone like that? Does any other country want that? OK, new idea. What about, instead of having the last name be all-caps, we have the last name start with an uppercase letter, then use smallcaps for the rest of the last name: https://en.wikipedia.org/wiki/Small_caps That way, the last name will not appear too large, but will be clear as something different from other names. Peter, I assume small-caps is possible. I realize this might come across as overkill, but, well, we actually have this database system for anyone who has an account on postgresql.org, the wiki, been to one of the conferences that uses the PG auth system, etc. Perhaps we just need to add a "display as" field or similar to that system? Or have a check-box for each individual to indicate if they'd prefer to use 'First Last' or 'LAST First'. We already track first and last names as different fields. We also have a unique username field and we might be able to come up with a system along the lines of 'user:username' which is then replaced by whatever is in the "display as" field in the database. My thought would be for that to be a one-time thing which is then reviewed and then the release notes are published as they are today. I don't think we'd want to dynamically change the release notes based on this on the actual webpage, for various reasons. My 2c on the whole thing is that it should be up to the individual and not based on country, as I can imagine individuals who would be annoyed if their name started showing up in the release notes as 'FROST Stephen' simply because they moved to Japan (no, I don't have any intentions of doing so, just saying). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Collection of memory leaks for ECPG driver
On Sat, Jun 13, 2015 at 6:25 PM, Michael Meskes mes...@postgresql.org wrote: On Sat, Jun 13, 2015 at 12:02:40AM -0400, Tom Lane wrote: But having said that, I would not be in a hurry to remove any existing if-guards for the case. For one thing, it makes the code look more similar to backend code that uses palloc/pfree, where we do *not* allow pfree(NULL). There's also the point that changing longstanding code creates back-patching hazards, so unless there's a clear gain it's best not to. Makes sense, but there is no point in adding those if-guards to old code that doesn't have them either, right? In any case, whatever the final decision made here, I just wanted to point out that there is still a leak in connect.c. See the attached patch, which does not check for a NULL pointer before ecpg_free because other code paths in the routine patched don't do so. So you get something locally consistent ;) -- Michael

diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
index 55c5680..e45d17f 100644
--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -321,7 +321,10 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 	}
 
 	if ((this = (struct connection *) ecpg_alloc(sizeof(struct connection), lineno)) == NULL)
+	{
+		ecpg_free(dbname);
 		return false;
+	}
 
 	if (dbname != NULL)
 	{

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 release notes
On Sat, Jun 13, 2015 at 11:30:30PM -0400, Bruce Momjian wrote: Note that I guess MauMau is a nickname. I think we are fine consistently putting Japanese last names first _if_ we always capitalize the last name, e.g. FUJITA Etsuro --- is that a good idea? Does everyone like that? Does any other country want that? OK, new idea. What about, instead of having the last name be all-caps, we have the last name start with an uppercase letter, then use smallcaps for the rest of the last name: https://en.wikipedia.org/wiki/Small_caps That way, the last name will not appear too large, but will be clear as something different from other names. Peter, I assume small-caps is possible. -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git push hook to check for outdated timestamps
On Fri, Jun 12, 2015 at 4:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 06/12/2015 09:31 AM, Robert Haas wrote: Could we update our git hook to refuse a push of a new commit whose timestamp is more than, say, 24 hours in the past? Our commit history has some timestamps in it now that are over a month off, and it's really easy to do, because when you rebase a commit, it keeps the old timestamp. If you then merge or cherry-pick that into an official branch rather than patch + commit, you end up with this problem unless you are careful to fix it by hand. It would be nice to prevent further mistakes of this sort, as they create confusion. I think 24 hours is probably fairly generous, Yeah ... if we're going to try to enforce a linear-looking history, ISTM the requirement ought to be "newer than the latest commit on the same branch". Perhaps that would be unduly hard to implement though. From a quick look at our existing script, I think that's doable, but I'd have to do some more detailed verification before I'm sure. And we'd have to figure out some way to deal with a push with multiple commits in it, but it should certainly be doable if the first one is. Would we in that case want to enforce linearity *and* recent-ness, or just linearity? As in, do we care about the commit time even if it doesn't change the order? FWIW, our git_changelog script tries to avoid this problem by paying attention to CommitDate not Date. But I agree that it's confusing when those fields are far apart. That brings it back to the enforcing - would we want to enforce both those? (And to confuse it even more, Date gets renamed to AuthorDate when you do a full log. But AFAIK it's the same date, they just change the name of it) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] 9.5 release notes
On 2015-06-14 03:02, Bruce Momjian wrote: On Sat, Jun 13, 2015 at 05:47:59AM -0300, Alvaro Herrera wrote: Petr Jelinek wrote: Hi,

+     <listitem>
+      <para>
+        Add <type>JSONB</> functions <function>jsonb_set()</> and
+        <function>jsonb_pretty</> (Dmitry Dolgov, Andrew Dunstan)
+      </para>
+     </listitem>

I believe I should be 3rd author for this one as I rewrote large parts of this functionality as part of the review. Also, it looks like we could spell your last name with an accent, assuming the i-acute character in Latin1 is fine. Ah, you are correct. I found a few commits that did have that accent. All _seven_ of Petr's 9.5 commit mentions updated to add the accent. :-) Thanks. Yes, i-acute is correct, thanks. I don't really mind my name being without the accent either though :) -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Entities created in one query not available in another in extended protocol
Tom, I agree this is entirely a client-side issue. Regardless, as Simon says it would be useful to have some documentation for client implementors. Sehrope, thanks for the JDBC link! I was actually thinking of going about it another way in Npgsql:

1. Send messages normally until the first Execute message is sent.
2. From that point on, socket writes should simply be non-blocking. As long as buffers aren't full, there's no issue, we continue writing. The moment a non-blocking write exits because it would block, we transfer control to the user, who can now read data from queries (the ADO.NET API allows for multiple resultsets).
3. When the user finishes processing the resultsets, control is transferred back to Npgsql which continues sending messages (back to step 1).

This approach has the advantage of not caring about buffer sizes or trying to assume how many bytes are sent by the server: we simply write as much as we can without blocking, then switch to reading until we've exhausted outstanding data, and back to writing. The main issue I'm concerned about is SSL/TLS, which is a layer on top of the sockets and which might not work well with non-blocking sockets... Any comments? Shay On Sat, Jun 13, 2015 at 5:08 AM, Simon Riggs si...@2ndquadrant.com wrote: On 12 June 2015 at 20:06, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 11 June 2015 at 22:12, Shay Rojansky r...@roji.org wrote: Just in case it's interesting to you... The reason we implemented things this way is in order to avoid a deadlock situation - if we send two queries as P1/D1/B1/E1/P2/D2/B2/E2, and the first query has a large resultset, PostgreSQL may block writing the resultset, since Npgsql isn't reading it at that point. Npgsql on its part may get stuck writing the second query (if it's big enough) since PostgreSQL isn't reading on its end (thanks to Emil Lenngren for pointing this out originally). That part does sound like a problem that we have no good answer to. Sounds worth starting a new thread on that. I do not accept that the backend needs to deal with that; it's the responsibility of the client side to manage buffering properly if it is trying to overlap sending the next query with receipt of data from a previous one. See commit 2a3f6e368 for a related issue in libpq. Then it's our responsibility to define what "manage buffering properly" means and document it. People should be able to talk to us without risk of deadlock. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
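For readers unfamiliar with the pattern, here is a rough sketch in plain C with poll(2) of the write-until-blocked-then-read approach described above. It is illustrative only: Npgsql itself is C#, the names here are invented, and the socket is assumed to already be in non-blocking mode.

#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Send outlen bytes of queued query messages, draining server results
 * whenever the server stops reading; returns 0 on success, -1 on error. */
static int
pump(int sock, const char *out, size_t outlen)
{
	size_t	sent = 0;
	char	buf[8192];

	while (sent < outlen)
	{
		struct pollfd pfd = {0};

		pfd.fd = sock;
		pfd.events = POLLIN | POLLOUT;
		if (poll(&pfd, 1, -1) < 0)
			return -1;

		if (pfd.revents & POLLIN)
		{
			/* The server is sending results; drain them so its send
			 * buffer (and therefore our write path) can make progress. */
			ssize_t n = read(sock, buf, sizeof(buf));

			if (n <= 0)
				return -1;
			/* hand buf[0..n) to the result parser in real code */
		}

		if (pfd.revents & POLLOUT)
		{
			ssize_t n = write(sock, out + sent, outlen - sent);

			if (n > 0)
				sent += n;
			else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
				return -1;
		}
	}
	return 0;
}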
[HACKERS] Tab completion for TABLESAMPLE
Hi, looks like I omitted psql tab completion from the TABLESAMPLE patch. The attached patch adds it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index b9f5acc..5801a06 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -737,6 +737,11 @@ static const SchemaQuery Query_for_list_of_matviews = {
 "   FROM pg_catalog.pg_event_trigger "\
 "  WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s'"
 
+#define Query_for_list_of_tablesample_methods \
+" SELECT pg_catalog.quote_ident(tsmname) "\
+"   FROM pg_catalog.pg_tablesample_method "\
+"  WHERE substring(pg_catalog.quote_ident(tsmname),1,%d)='%s'"
+
 /*
  * This is a list of all things in Pgsql, which can show up after CREATE or
  * DROP; and there is also a query to get a list of them.
@@ -3578,6 +3583,10 @@ psql_completion(const char *text, int start, int end)
 			 prev2_wd[0] == '\0')
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_relations, NULL);
 
+/* TABLESAMPLE */
+	else if (pg_strcasecmp(prev_wd, "TABLESAMPLE") == 0)
+		COMPLETE_WITH_QUERY(Query_for_list_of_tablesample_methods);
+
 /* TRUNCATE */
 	else if (pg_strcasecmp(prev_wd, "TRUNCATE") == 0)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Problem with CREATE TABLE ... (LIKE ... INCLUDING INDEXES)
On 14 June 2015 at 04:25, Michael Paquier michael.paqu...@gmail.com wrote: On Sun, Jun 14, 2015 at 11:38 AM, Thom Brown t...@linux.com wrote: As you can see, 3 indexes are missing, which happen to be ones that would duplicate the column definition of another index. Is this intentional? If so, shouldn't it be documented behaviour? Looking at the code (transformIndexConstraints in parse_utilcmd.c), this is intentional behavior:

	/*
	 * Scan the index list and remove any redundant index specifications. This
	 * can happen if, for instance, the user writes UNIQUE PRIMARY KEY. A
	 * strict reading of SQL would suggest raising an error instead, but that
	 * strikes me as too anal-retentive. - tgl 2001-02-14
	 *
	 * XXX in ALTER TABLE case, it'd be nice to look for duplicate
	 * pre-existing indexes, too.
	 */

Per this commit:

commit: c7d2ce7bc6eb02eac0c10fae9caf2936a71ad25c
author: Tom Lane t...@sss.pgh.pa.us
date: Wed, 14 Feb 2001 23:32:38 +
    Repair problems with duplicate index names generated when CREATE TABLE
    specifies redundant UNIQUE conditions.

Perhaps a mention in the docs in the page of CREATE TABLE would be welcome. Something like "Redundant index definitions are ignored with INCLUDING INDEXES." Thoughts? The commit refers to duplicate index names, and only for UNIQUE indexes. This behaviour is beyond that. And how does it determine which index to copy? In my example, I placed an index in a different tablespace. That could be on a drive with very different read/write characteristics than the default tablespace (seek latency/sequential read rate/write speed etc.) and possibly with different GUC parameters, but there's no way for us to determine if this is the case, so Postgres can easily remove the more performant one. -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote: Rewriter

Parsing occurs as currently. During query rewrite, specifically at the bottom of the per-relation loop in fireRIRrules(), we will modify the query tree: each relation RTE containing a colstore will be replaced with a JoinExpr containing the relation as left child and the colstore as right child (1). The colstore RTE will be of a new RTEKind. For each such change, all Var nodes that point to attnums stored in the colstore will be modified so that they reference the RTE of the colstore instead (2). FWIW, I think this is a pretty bad place to tackle this. For one I think we shouldn't add more stuff using the rewriter unless it's clearly the best interface. For another, doing things in the rewriter will make optimizing things much harder - the planner will have to reconstruct knowledge of which of the joins are column store joins and such. Why do you want to do things there? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Andres Freund and...@anarazel.de writes: On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote: Parsing occurs as currently. During query rewrite, specifically at the bottom of the per-relation loop in fireRIRrules(), we will modify the query tree: each relation RTE containing a colstore will be replaced with a JoinExpr containing the relation as left child and the colstore as right child (1). The colstore RTE will be of a new RTEKind. For each such change, all Var nodes that point to attnums stored in the colstore will be modified so that they reference the RTE of the colstore instead (2). FWIW, I think this is a pretty bad place to tackle this. For one I think we shouldn't add more stuff using the rewriter unless it's clearly the best interface. For another, doing things in the rewriter will make optimizing things much harder - the planner will have to reconstruct knowledge of which of the joins are column store joins and such. As a comparison point, one of my Salesforce colleagues just put in a somewhat similar though single-purpose thing, to expand what originally is a simple table reference into a join (against a system catalog that's nowhere mentioned in the original query). In our case, we wanted to force a scan on a large table to have a constraint on the leading primary key column; if the query has such a constraint already, then fine, else create one by joining to a catalog that lists the allowed values of that column. We started out by trying to do it in the rewriter, and that didn't work well at all. We ended up actually doing it at createplan.c time, which is conceptually ugly, but there was no good place to do it earlier without duplicating a lot of indexqual analysis. But the thing that made that painful was that the transformation was optional, and indeed might happen or not happen for a given query depending on the selected plan shape. AFAICT the transformation Alvaro is proposing is unconditional, so it might be all right to do it in the rewriter. As you say, if the planner needs to reconstruct what happened, that would be a strike against this way, but it's not clear from here whether any additional info is needed beyond the already-suggested extra RTEKind. Another model that could be followed is expansion of inheritance-tree references, which happens early in the planner. In that case the planner does keep additional information about what it did (the appendrel data structures), so that could be a good model if this code needs to do likewise. The existing join-alias-var flattening logic in the planner might be of interest as well for the variable-substitution business, which I suspect is the main reason Alvaro is proposing doing it in the rewriter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Won't this cause issues to MergeAppend optimizations? Like what? Well, as I understand, MergeAppend needs to know the sort order of the child node, right? But that's available only on the relation RTE, not on the colstore-join RTE. Uh, what? Sort order is a property of a path, not an RTE. And we have always understood which join types preserve sort order. And if there are such issues, why do you think you wouldn't be expected to solve them? Precisely. If I simply reject having column stores in partitioned tables, then I don't *need* to solve them. You misunderstood the thrust of my comment, which basically is that I doubt anyone will think that rejecting that combination is an acceptable implementation restriction. It might be all right if it doesn't work very well in v0, but not if the implementation is designed so that it can never be fixed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Problem with CREATE TABLE ... (LIKE ... INCLUDING INDEXES)
Thom Brown t...@linux.com writes: The commit refers to duplicate index names, and only for UNIQUE indexes. This behaviour is beyond that. And how does it determine which index to copy? In my example, I placed an index in a different tablespace. That could be on a drive with very different read/write characteristics than the default tablespace (seek latency/sequential read rate/write speed etc.) and possibly with different GUC parameters, but there's no way for us to determine if this is the case, so Postgres can easily remove the more performant one. TBH, I have no particular concern for this argument. If you created duplicate indexes you did a dumb thing anyway; you should not be expecting that the system's response to that situation will be remarkably intelligent. As the comment indicates, the code in question is really only meant to deal with a specific kind of redundancy we'd observed in real-world CREATE TABLE commands. It's probably accidental that it gets applied in CREATE TABLE LIKE cases, but it doesn't bother me that it is. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_resetsysid
Hi, This is an attempt to revive the resetsysid tool submission that didn't go anywhere last year [1]. In that thread there were mainly two complaints: one, that pg_resetxlog would preferably be a tool for more advanced stuff, and similar utilities that are not as dangerous should have separate binaries using some of the functionality of pg_resetxlog as a library; the other was with the string to uint64 parsing, which is apparently not so easy to do portably. About the second one there was also a separate discussion between Andres and Tom [2]. Here is a series of patches that implements both the original feature and solves the issues mentioned above.

0001 - This patch splits the code of pg_resetxlog into a common.c/h API and pg_resetxlog.c itself, which does the program logic using the API. The only logical change is that the XLogSegNoOffsetToRecPtr call is moved into the main() function, which I think is ok as the main() does all the other ControlFile changes. All other changes should be just cosmetic in order to not rely on global static variables.

0002 - Adds the pg_resetsysid utility which changes the system id to a newly generated one.

0003 - Adds a -s option to pg_resetxlog to change the system id to the one specified - this is separate from the other one as it can be potentially more dangerous. It also adds some defines that try to portably map pg_strtoint64 and pg_strtouint64 to the underlying implementations (strtol/strtoll/_strtoi64 and unsigned variants of those). It will leave pg_strtoint64/pg_strtouint64 undefined if the platform does not support the functionality. Tom was unsure if there are such platforms on our supported list anymore, which is why I also added 0004, which adds a fall-back implementation of the two functions based on very slightly modified code from OpenBSD. This is here mostly in case the buildfarm gets red after applying 0003 or we have reports from users that things got broken.

[1] http://www.postgresql.org/message-id/539b97fc.8040...@2ndquadrant.com
[2] http://www.postgresql.org/message-id/20140603144654.gm24...@awork2.anarazel.de

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From f587fd47c9e43f5c08cf840978ff21b64f39d24d Mon Sep 17 00:00:00 2001
From: Petr Jelinek pjmo...@pjmodos.net
Date: Sun, 14 Jun 2015 16:23:39 +0200
Subject: [PATCH 4/4] Add fallback implementations of pg_strto(u)int64.
---
 src/include/c.h          |   6 ++
 src/port/Makefile        |   2 +-
 src/port/pgstrtoint64.c  | 169 +++
 src/port/pgstrtouint64.c | 129
 4 files changed, 305 insertions(+), 1 deletion(-)
 create mode 100644 src/port/pgstrtoint64.c
 create mode 100644 src/port/pgstrtouint64.c

diff --git a/src/include/c.h b/src/include/c.h
index 537b4d0..a01b455 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1102,6 +1102,9 @@ extern int fdatasync(int fildes);
 #define pg_strtoint64 strtoll
 #elif defined(WIN32)
 #define pg_strtoint64 _strtoi64
+#else
+extern int64 pg_strtoint64(const char *nptr, char **endptr, int base);
+#define PGSTRTOINT64_IMPL 1
 #endif
 
 /* Define portable pg_strtouint64() */
@@ -1111,6 +1114,9 @@ extern int fdatasync(int fildes);
 #define pg_strtouint64 strtoull
 #elif defined(WIN32)
 #define pg_strtouint64 _strtoui64
+#else
+extern uint64 pg_strtouint64(const char *nptr, char **endptr, int base);
+#define PGSTRTOUINT64_IMPL 1
 #endif
 
 /*
diff --git a/src/port/Makefile b/src/port/Makefile
index bc9b63a..91e4a2b 100644
--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -32,7 +32,7 @@ LIBS += $(PTHREAD_LIBS)
 
 OBJS = $(LIBOBJS) $(PG_CRC32C_OBJS) chklocale.o erand48.o inet_net_ntop.o \
 	noblock.o path.o pgcheckdir.o pgmkdirp.o pgsleep.o \
-	pgstrcasecmp.o pqsignal.o \
+	pgstrcasecmp.o pgstrtoint64.o pgstrtouint64.o pqsignal.o \
 	qsort.o qsort_arg.o quotes.o sprompt.o tar.o thread.o
 
 # foo_srv.o and foo.o are both built from foo.c, but only foo.o has -DFRONTEND
diff --git a/src/port/pgstrtoint64.c b/src/port/pgstrtoint64.c
new file mode 100644
index 0000000..e968df0
--- /dev/null
+++ b/src/port/pgstrtoint64.c
@@ -0,0 +1,169 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgstrtoint64.c
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	src/port/pgstrtoint64.c
+ *
+ * Based on OpenBSD strtoll() implementation.
+ * Differences from vanilla implementation:
+ * - renamed to pg_strtoint64
+ * - returns int64 instead of long long
+ * - internal params defined as int64 instead of long long
+ * - uses PG_INT64_MIN/PG_INT64_MAX
+ *
+ * The OpenBSD copyright terms follow.
+ */
+
+/*	$OpenBSD: strtoll.c,v 1.7 2013/03/28 18:09:38 martynas Exp $ */
+/*-
+ * Copyright (c) 1992 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
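As a usage illustration for the portable mapping added in 0003, here is a hypothetical sketch (parse_sysid and its error policy are invented for this example; it assumes the pg_strtouint64 define from c.h and standard frontend includes):

#include "postgres_fe.h"
#include <errno.h>

static bool
parse_sysid(const char *arg, uint64 *sysid)
{
	char	   *endptr;

	errno = 0;
	*sysid = pg_strtouint64(arg, &endptr, 10);

	/* reject empty input, trailing garbage, and out-of-range values */
	if (endptr == arg || *endptr != '\0' || errno == ERANGE)
		return false;
	return true;
}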
Re: [HACKERS] On columnar storage
Hi, On 06/13/15 00:07, Michael Nolan wrote: On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: We hope to have a chance to discuss this during the upcoming developer unconference in Ottawa. Here are some preliminary ideas to shed some light on what we're trying to do. I've been trying to figure out a plan to enable native column stores (CS or colstore) for Postgres. Motivations: * avoid the 32 TB limit for tables * avoid the 1600 column limit for tables * increased performance Are you looking to avoid all hardware-based limits, or would using a 64 bit row pointer be possible? That would give you 2^64 or 1.8 E19 unique rows over whatever granularity/uniqueness you use (per table, per database, etc.) -- Mike Nolan. I don't think the number of tuples is the main problem here, it's the number of pages a single relation can have. Looking at the numbers of rows as a direct function of TID size is misleading, because the TID is split into two fixed parts - page number (32b) and tuple number (16b). For the record, 2^48 is 281,474,976,710,656 which ought to be enough for anybody, but we waste a large part of that because we assume there might be up to 2^16 tuples per page, although the actual limit is way lower (~290 for 8kB pages, and ~1200 for 32kB pages). So we can only have ~4 billion pages, which is where the 32TB limit comes from (with 32kB pages it's 128TB). Longer TIDs are one straightforward way to work around this limit, assuming you add the bits to the 'page number' field. Adding 16 bits (thus using 64-bit pointers) would increase the limit 2^16-times to about 2048 petabytes (with 8kB pages). But that of course comes with a cost, because you have to keep those larger TIDs in indexes etc. Another option might be to split the 48 bits differently, by moving 5 bits to the page number part of TID (so that we expect ~2048 tuples per page at most). That'd increase the limit to 1PB (4PB with 32kB pages). The column store approach is somehow orthogonal to this, because it splits the table vertically into multiple pieces, each stored in a separate relfilenode and thus using a separate sequence of page numbers. And of course, the usual 'horizontal' partitioning has a very similar effect (separate filenodes). regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
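To make the arithmetic above easy to check, a small self-contained C program (illustrative only) that reproduces those limits from the bit splits:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint64_t	blksz = 8192;	/* 8kB pages */

	/* today: 32-bit page number, 16-bit tuple number -> 32 TB */
	printf("32-bit pages: %llu TB\n",
		   (unsigned long long) ((((uint64_t) 1 << 32) * blksz) >> 40));

	/* move 5 bits from offset to page number (11 bits = ~2048 tuples/page) -> 1 PB */
	printf("37-bit pages: %llu PB\n",
		   (unsigned long long) ((((uint64_t) 1 << 37) * blksz) >> 50));

	/* widen the TID by 16 bits (48-bit page number) -> 2048 PB */
	printf("48-bit pages: %llu PB\n",
		   (unsigned long long) ((((uint64_t) 1 << 48) * blksz) >> 50));

	return 0;
}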
Re: [HACKERS] On columnar storage
Robert Haas wrote: On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I've been trying to figure out a plan to enable native column stores (CS or colstore) for Postgres. Motivations: * avoid the 32 TB limit for tables * avoid the 1600 column limit for tables * increased performance To me, it feels like there are two different features here that would be better separated. First, there's the idea of having a table that gets auto-joined to other tables whenever you access it, so that the user sees one really wide table but really the data is segregated by column groups under the hood. That's a neat idea. Thanks. (It also seems pretty tricky to implement.) Second, there's the idea of a way of storing tuples that is different from PostgreSQL's usual mechanism - i.e. a storage manager API. I understand your concerns about going through the FDW API so maybe that's not the right way to do it, but it seems to me that in the end you are going to end up with something awfully similar to that + a local relfilenode + WAL logging support. I'm not clear on why you want to make the column store API totally different from the FDW API; there may be a reason, but I don't immediately see it. I just don't see that the FDW API is such a good fit for what I'm trying to do. Anything using the FDW API needs to implement its own visibility checking, for instance. I want to avoid that, because it's its own complex problem. Also, it doesn't look like the FDW API supports things that I want to do (neither my proposed LateColumnMaterialization nor my proposed BitmapColumnScan). I would have to extend the FDW API, then contort my stuff so that it fits in the existing API; then I will need to make sure that existing FDWs are not broken by the changes I would propose elsewhere. Round peg, square hole is all I see here. All in all, this seems too much additional work, just to make two things that are really addressing different problems go through the same code. You're correct about local WAL logging. We will need a solution to that problem. I was hoping to defer that until we had something like Alexander Korotkov's proposed pluggable WAL stuff. Each of these two features is independently useful. If you just had the first feature, you could use the existing table format as your columnar store. I'm sure it's possible to do a lot better in some cases, but there could easily be cases where that's a really big win, because the existing table format has far more sophisticated indexing capabilities than any columnar store is likely to have in an early version. Yeah, sure, it's pretty likely that the first experimental colstore implementation will just be based on existing infrastructure. The second capability, of course, opens up all sorts of interesting possibilities, like compressed read-only storage or index-organized tables. And it would also let you have an all-columnar table, similar to what Citus's cstore_fdw does, which doesn't seem like it would be supported in your design, and could be a pretty big win for certain kinds of tables. Well, I would like to know about those use cases that Citus stuff is good at, so that we can make sure they work reasonably under my proposed design. Maybe I have to require vacuuming so that the all-visible bits are set, so that a column scan can skip visibility checking for most of the underlying heap tuples to produce a large aggregation report. That seems pretty reasonable to me. BTW, I'm not sure if it's a good idea to call all of this stuff "cstore".
The name is intuitive, but cstore_fdw has enough traction already that it may create some confusion. I'm not thinking of calling anything user-visible with the name "cstore". It's just a development term. Surely we're not reserving names from what is used in third party code. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 release notes
Bruce Momjian br...@momjian.us writes: OK, new idea. What about, instead of having the last name be all-caps, we have the last name start with an uppercase letter, then use smallcaps for the rest of the last name: https://en.wikipedia.org/wiki/Small_caps That way, the last name will not appear too large, but will be clear as something different from other names. Peter, I assume small-caps is possible. FWIW, I vote strongly against having any contributor names in caps in the release notes. It would be visually distracting, and it would make the name look like the most important thing in the entry, while in point of fact it's the *least* important. (Maybe not to the contributor, but certainly to anybody else.) For pretty much the same reason, I'm not in favor of small caps either. Even assuming we can do that consistently (which I bet we can't; we do not have all that much control over how web browsers render HTML), it would be calling attention to itself, which is exactly not the result I think we should be after. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
On 06/14/15 17:22, Alvaro Herrera wrote: Robert Haas wrote: [...] Second, there's the idea of a way of storing tuples that is different from PostgreSQL's usual mechanism - i.e. a storage manager API. I understand your concerns about going through the FDW API so maybe that's not the right way to do it, but it seems to me that in the end you are going to end up with something awfully similar to that + a local relfilenode + WAL logging support. I'm not clear on why you want to make the column store API totally different from the FDW API; there may be a reason, but I don't immediately see it. I just don't see that the FDW API is such a good fit for what I'm trying to do. Anything using the FDW API needs to implement its own visibility checking, for instance. I want to avoid that, because it's its own complex problem. Also, it doesn't look like the FDW API supports things that I want to do (neither my proposed LateColumnMaterialization nor my proposed BitmapColumnScan). I would have to extend the FDW API, then contort my stuff so that it fits in the existing API; then I will need to make sure that existing FDWs are not broken by the changes I would propose elsewhere. Round peg, square hole is all I see here. All in all, this seems too much additional work, just to make two things that are really addressing different problems go through the same code. You're correct about local WAL logging. We will need a solution to that problem. I was hoping to defer that until we had something like Alexander Korotkov's proposed pluggable WAL stuff. Probably worth mentioning we don't expect the column store API to be used only by extensions, but even by code from within the core which can use the current WAL infrastructure etc. -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Andres Freund wrote: On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote: Rewriter

Parsing occurs as currently. During query rewrite, specifically at the bottom of the per-relation loop in fireRIRrules(), we will modify the query tree: each relation RTE containing a colstore will be replaced with a JoinExpr containing the relation as left child and the colstore as right child (1). The colstore RTE will be of a new RTEKind. For each such change, all Var nodes that point to attnums stored in the colstore will be modified so that they reference the RTE of the colstore instead (2). FWIW, I think this is a pretty bad place to tackle this. For one I think we shouldn't add more stuff using the rewriter unless it's clearly the best interface. For another, doing things in the rewriter will make optimizing things much harder - the planner will have to reconstruct knowledge of which of the joins are column store joins and such. What do you mean "reconstruct"? That bit will be explicit -- I'm thinking the planner will have specialized bits to deal with such nodes, i.e. it will know there's a one-to-one relationship between colstore tuples and heap tuples, so it will know how to move nodes around. Or at least, that's how I'm hoping it will be ... Why do you want to do things there? Because I see no better place. Isn't the rewriter the place where we modify the query tree, mostly? Another choice I considered was subquery_planner: in the spot where expand_inherited_tables() is called, add a new call to expand the RTEs that correspond to tables containing stores. But I had second thoughts because that function says it's safe because it only adds baserels, not joins, and I'm adding joins. I guess I could use a spot earlier than wherever it is that joins are checked, but this planner code is recursive anyway and for this I must do a single pass. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 release notes
On Sun, Jun 14, 2015 at 01:57:08PM -0300, Alvaro Herrera wrote: Bruce Momjian wrote: On Sat, Jun 13, 2015 at 05:47:59AM -0300, Alvaro Herrera wrote: Also, it looks like we could spell your last name with an accent, assuming the i-acute character in Latin1 is fine. Ah, you are correct. I found a few commits that did have that accent. All _seven_ of Petr's 9.5 commit mentions updated to add the accent. :-) Thanks. ... though two of them now say Petr Petr. Thanks, fixed. -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Tom Lane wrote: I wrote: Another model that could be followed is expansion of inheritance-tree references, which happens early in the planner. Actually ... if you intend to allow column storage to work with inherited tables (and if you don't, you'd better have a darn good reason why not), I think you probably want to do this join insertion *after* inheritance expansion, so you can join child column stores only to the appropriate child heap table, and not to the entire inheritance tree. Won't this cause issues to MergeAppend optimizations? I guess we could have the join be of a new type that's transparent to such optimizations but, you see, that seems complicated enough that I want to avoid that ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git humor
On 06/14/2015 02:07 PM, Bruce Momjian wrote: If you ever had trouble understanding a git manual page, this humorous random git man page generator is full of fun: http://git-man-page-generator.lokaltog.net/ Try a few generate runs to find one that seems logical. :-) This looked good to me: git-dodge-tree: "dodge some local trees outside a few fscked applied tips" cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] The real reason why TAP testing isn't ready for prime time
Buildfarm member hamster has failed a pretty significant fraction of its recent runs in the BinInstallCheck step: http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hamster&br=HEAD Since other critters aren't equally distressed, it seems likely that this is just an out-of-disk-space type of problem. But maybe it's trying to tell us that there's a genuine platform-specific bug there. In any case, I challenge anyone to figure out what's happening from the information available from the buildfarm logs. I don't know whether this is just that the buildfarm script isn't collecting data that it should be. But my experiences with the TAP test scripts haven't been very positive. When they fail, it takes a lot of digging to find out why. Basically, that entire mechanism sucks as far as debuggability is concerned. I think there is a good argument for turning this off in the buildfarm until there is a better way of identifying and solving problems. It is not helping us that hamster is red half the time for undiscoverable reasons. That just conditions people to ignore it, and it may well be masking real problems that the machine could be finding if it weren't failing at this step. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
Josh Berkus wrote: You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Bleh. Do we provide SQL-level access in 9.4? If not, I think that's a requirement before release. Telling users to monitor a setting using a restricted-permission command-line utility which produces a version-specific text file they have to parse is not going to win us a lot of fans. I found that I had written a very quick accessor function to multixact shared state data a while ago. This might be useful for monitoring purposes. What do people think of including this for 9.5? It needs a small change to add the newly added oldestOffset (plus a little cleanup and docs). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

From 2134f7928cad8c8d1a1e2d752c9443fb44a92a7e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Sun, 14 Jun 2015 11:48:58 -0300
Subject: [PATCH] pg_get_multixact_range

---
 src/backend/access/transam/multixact.c | 31 +++
 src/include/catalog/pg_proc.h          |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 516a89f..dedded3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3274,3 +3274,34 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
 
 	SRF_RETURN_DONE(funccxt);
 }
+
+#include "access/htup_details.h"
+
+Datum
+pg_get_multixact_range(PG_FUNCTION_ARGS);
+
+Datum
+pg_get_multixact_range(PG_FUNCTION_ARGS)
+{
+	TupleDesc	tupdesc;
+	HeapTuple	htup;
+	Datum		values[3];
+	bool		nulls[3];
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+	tupdesc = BlessTupleDesc(tupdesc);
+
+	LWLockAcquire(MultiXactGenLock, LW_SHARED);
+	values[0] = MultiXactState->oldestMultiXactId;
+	values[1] = MultiXactState->nextMXact;
+	values[2] = MultiXactState->nextOffset;
+	LWLockRelease(MultiXactGenLock);
+
+	nulls[0] = nulls[1] = nulls[2] = false;
+
+	htup = heap_form_tuple(tupdesc, values, nulls);
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6b3d194..f87d3d0 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3079,6 +3079,8 @@ DATA(insert OID = 1065 (  pg_prepared_xact PGNSP PGUID 12 1 1000 0 0 f f f f t t
 DESCR("view two-phase transactions");
 DATA(insert OID = 3819 (  pg_get_multixact_members PGNSP PGUID 12 1 1000 0 0 f f f f t t v 1 0 2249 "28" "{28,28,25}" "{i,o,o}" "{multixid,xid,mode}" _null_ _null_ pg_get_multixact_members _null_ _null_ _null_ ));
 DESCR("view members of a multixactid");
+DATA(insert OID = 3401 (  pg_get_multixact_range PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 2249 "" "{28,28,28}" "{o,o,o}" "{oldestmulti,nextmulti,nextoffset}" _null_ _null_ pg_get_multixact_range _null_ _null_ _null_ ));
+DESCR("get range of live multixacts");
 DATA(insert OID = 3581 (  pg_xact_commit_timestamp PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 1184 "28" _null_ _null_ _null_ _null_ _null_ pg_xact_commit_timestamp _null_ _null_ _null_ ));
 DESCR("get commit timestamp of a transaction");
 
-- 
2.1.4

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Entities created in one query not available in another in extended protocol
On Sun, Jun 14, 2015 at 6:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Shay Rojansky r...@roji.org writes: [ rely on non-blocking sockets to avoid deadlock ] Yeah, that's pretty much the approach libpq has taken: write (or read) when you can, but press on when you can't. Good to hear. The main issue I'm concerned about is SSL/TLS, which is a layer on top of the sockets and which might not work well with non-blocking sockets... We have not had word of any such problem with libpq. It's possible that the intersection of SSL users with non-blocking-mode users is nil, but I kinda doubt that. You do need to interpret openssl's return codes correctly ... I don't think there's a problem with non-blocking I/O and SSL per se, the question is about the .NET TLS/SSL implementation Npgsql uses - so it's really totally unrelated to PostgreSQL... Shay
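To make the point about return codes concrete, a minimal sketch of a non-blocking write through OpenSSL's C API (illustrative; not libpq's or Npgsql's actual code):

#include <openssl/ssl.h>

/* Returns bytes written, 0 if the caller should wait for the socket to
 * become readable/writable and retry the same call, or -1 on error. */
static int
ssl_write_nonblocking(SSL *ssl, const void *buf, int len)
{
	int			n = SSL_write(ssl, buf, len);

	if (n > 0)
		return n;

	switch (SSL_get_error(ssl, n))
	{
		case SSL_ERROR_WANT_READ:	/* e.g. renegotiation: wait for readability */
		case SSL_ERROR_WANT_WRITE:	/* wait for writability, then retry */
			return 0;
		default:
			return -1;				/* genuine failure */
	}
}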
[HACKERS] Git humor
If you ever had trouble understanding a git manual page, this humorous random git man page generator is full of fun: http://git-man-page-generator.lokaltog.net/ Try a few generate runs to find one that seems logical. :-) -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Actually ... if you intend to allow column storage to work with inherited tables (and if you don't, you'd better have a darn good reason why not), I think you probably want to do this join insertion *after* inheritance expansion, so you can join child column stores only to the appropriate child heap table, and not to the entire inheritance tree. Won't this cause issues to MergeAppend optimizations? Like what? Well, as I understand, MergeAppend needs to know the sort order of the child node, right? But that's available only on the relation RTE, not on the colstore-join RTE. Though now that I think about it, maybe I can push that info from the relation RTE to the colstore-join RTE, since I know the ordering will be the same. And if there are such issues, why do you think you wouldn't be expected to solve them? Precisely. If I simply reject having column stores in partitioned tables, then I don't *need* to solve them. Later in the project, when some planner hacker decides to join, I can ask them for advice on how to tackle them. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.5 feature count
I have run a script to count the number of <listitem> items in the major release notes of each major version of Postgres back to 7.4:

	7.4	280
	8.0	238
	8.1	187
	8.2	230
	8.3	237
	8.4	330
	9.0	252
	9.1	213
	9.2	250
	9.3	187
	9.4	217
	9.5	176

The 9.5 number will only change a little by 9.5 final. -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 release notes
Bruce Momjian wrote: On Sat, Jun 13, 2015 at 05:47:59AM -0300, Alvaro Herrera wrote: Also, it looks like we could spell your last name with an accent, assuming the i-acute character in Latin1 is fine. Ah, you are correct. I found a few commits that did have that accent. All _seven_ of Petr's 9.5 commit mentions updated to add the accent. :-) Thanks. ... though two of them now say Petr Petr. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
I wrote: Another model that could be followed is expansion of inheritance-tree references, which happens early in the planner. Actually ... if you intend to allow column storage to work with inherited tables (and if you don't, you'd better have a darn good reason why not), I think you probably want to do this join insertion *after* inheritance expansion, so you can join child column stores only to the appropriate child heap table, and not to the entire inheritance tree. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Alvaro Herrera alvhe...@2ndquadrant.com writes: Another choice I considered was subquery_planner: in the spot where expand_inherited_tables() is called, add a new call to expand the RTEs that correspond to tables containing stores. But I had second thoughts because that function says it's safe because it only adds baserels, not joins, and I'm adding joins. I think that comment is only justifying why we don't need to redetermine hasJoinRTEs/hasLateralRTEs/hasOuterJoins afterwards. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory Accounting v11
Jeff Davis pg...@j-davis.com writes: This patch tracks memory usage (at the block level) for all memory contexts. Individual palloc()s aren't tracked; only the new blocks allocated to the memory context with malloc(). ... My general answer to the performance concerns is that they aren't a reason to block this patch, unless someone has a suggestion about how to fix them. Adding one field to a struct and a few arithmetic operations for each malloc() or realloc() seems reasonable to me. Maybe, but this here is a pretty weak argument: The current state, where HashAgg just blows up the memory, is just not reasonable, and we need to track the memory to fix that problem. Meh. HashAgg could track its memory usage without loading the entire system with a penalty. Moreover, this is about fourth or fifth down the list of the implementation problems with spilling hash aggregation to disk. It would be good to see credible solutions for the bigger issues before we buy into adding overhead for a mechanism with no uses in core. Others have also mentioned that we might want to use this mechanism to track memory for other operators, like Sort or HashJoin, which might be simpler and more accurate. That would imply redefining the meaning of work_mem for those operations; it would suddenly include a lot more overhead space than it used to, causing them to spill to disk much more quickly than before, unless one changes the work_mem setting to compensate. Not sure that people would like such a change. An even bigger problem is that it would pretty much break the logic around LACKMEM() in tuplesort.c, which supposes that spilling individual tuples to disk is a reliable and linear way to decrease the accounted-for memory. Individual pfree's would typically not decrease the accounted total in this implementation. Depending on what you have in mind for the spill-to-disk behavior, this issue could be a fail for HashAgg as well, which is another reason not to commit this patch in advance of seeing the use-case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 release notes
On Sat, Jun 13, 2015 at 6:24 PM, Bruce Momjian br...@momjian.us wrote: I think we should really address this. Attached patch adds a new release note item for it. It also adds to the documentation that explains why users should prefer varchar(n)/text to character(n); the lack of abbreviated key support now becomes a huge disadvantage for character(n), whereas in previous versions the disadvantages were fairly minor. In passing, I updated the existing sort item to reflect that only varchar(n), text, and numeric benefit from the abbreviation optimization (not character types more generally + numeric), and added a note on the effectiveness of the abbreviation optimization alone. This all also seems like overkill to me. Users just don't care about this level of detail, and are easily confused by it. Really? The pre-check thing wasn't too complex for Magnus to have a couple of bullet points on it *specifically* in his high level NYC talk on Postgres 9.5 features [1]. I don't think it's hard to understand at all. Also, it's simply incorrect to describe abbreviation as: "Improve the speed of sorting character and numeric fields." Character fields presumably include character(n), and as I pointed out character(n) lacks abbreviation support. That needs to be documented as an additional disadvantage of character(n) vs text/varchar(n) in the documentation where the trade-off is already discussed (i.e. not in the release notes), because it makes a *huge* difference. And ISTM that it should also be clear from the release notes themselves. We seemingly always have <type></type> tags when types are named in release notes. [1] http://www.hagander.net/talks/postgresql95.pdf -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git humor
On June 14, 2015 02:07:16 PM Bruce Momjian wrote: If you ever had trouble understanding a git manual page, this humorous random git man page generator is full of fun: http://git-man-page-generator.lokaltog.net/ Try a few generate runs to find one that seems logical. :-) To my untrained eye those look an awful lot like they're lifted from random pgsql-hackers threads. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Memory Accounting v11
This patch tracks memory usage (at the block level) for all memory contexts. Individual palloc()s aren't tracked; only the new blocks allocated to the memory context with malloc(). It also adds a new function, MemoryContextMemAllocated() which can either retrieve the total allocated for the context, or it can recurse and sum up the allocations for all subcontexts as well. This is intended to be used by HashAgg in an upcoming patch that will bound the memory and spill to disk. Previous discussion here: http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop Previous concerns: * There was a slowdown reported of around 1-3% (depending on the exact version of the patch) on an IBM power machine when doing an index rebuild. The results were fairly noisy for me, but it seemed to hold up. See http://www.postgresql.org/message-id/CA +Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajp3cro9db...@mail.gmail.com * Adding a struct field to MemoryContextData may be bad for the CPU caching behavior, and may be the cause of the above slowdown. * Previous versions of the patch updated the parent contexts' allocations as well, which avoided the need to recurse when querying the total allocation. That seemed to penalize cases that didn't need to track the allocation. We discussed trying to opt-in to this behavior, but it seemed more awkward than helpful. Now, the patch only updates the allocation of the current context, and querying means recursing through the child contexts. * There was a concern that, if MemoryContextMemAllocated needs to recurse to the child contexts, it will be too slow for HashAgg of array_agg, which creates a child context per group. That was solved with http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b419865a814abbca12bdd6eef6a3d5ed67f432e1 My general answer to the performance concerns is that they aren't a reason to block this patch, unless someone has a suggestion about how to fix them. Adding one field to a struct and a few arithmetic operations for each malloc() or realloc() seems reasonable to me. The current state, where HashAgg just blows up the memory, is just not reasonable, and we need to track the memory to fix that problem. Others have also mentioned that we might want to use this mechanism to track memory for other operators, like Sort or HashJoin, which might be simpler and more accurate. 
Regards, Jeff Davis

*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***************
*** 500,505 **** AllocSetContextCreate(MemoryContext parent,
--- 500,508 ----
  				 errdetail("Failed while creating memory context \"%s\".", name)));
  	}
+ 
+ 	((MemoryContext) set)->mem_allocated += blksize;
+ 
  	block->aset = set;
  	block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  	block->endptr = ((char *) block) + blksize;
***************
*** 590,595 **** AllocSetReset(MemoryContext context)
--- 593,600 ----
  	else
  	{
  		/* Normal case, release the block */
+ 		context->mem_allocated -= block->endptr - ((char*) block);
+ 
  #ifdef CLOBBER_FREED_MEMORY
  		wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***************
*** 632,637 **** AllocSetDelete(MemoryContext context)
--- 637,644 ----
  	{
  		AllocBlock	next = block->next;
+ 		context->mem_allocated -= block->endptr - ((char *) block);
+ 
  #ifdef CLOBBER_FREED_MEMORY
  		wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***************
*** 672,677 **** AllocSetAlloc(MemoryContext context, Size size)
--- 679,687 ----
  		block = (AllocBlock) malloc(blksize);
  		if (block == NULL)
  			return NULL;
+ 
+ 		context->mem_allocated += blksize;
+ 
  		block->aset = set;
  		block->freeptr = block->endptr = ((char *) block) + blksize;
***************
*** 861,866 **** AllocSetAlloc(MemoryContext context, Size size)
--- 871,878 ----
  		if (block == NULL)
  			return NULL;
+ 		context->mem_allocated += blksize;
+ 
  		block->aset = set;
  		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  		block->endptr = ((char *) block) + blksize;
***************
*** 964,969 **** AllocSetFree(MemoryContext context, void *pointer)
--- 976,984 ----
  			set->blocks = block->next;
  		else
  			prevblock->next = block->next;
+ 
+ 		context->mem_allocated -= block->endptr - ((char*) block);
+ 
  #ifdef CLOBBER_FREED_MEMORY
  		wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***************
*** 1077,1082 **** AllocSetRealloc(MemoryContext context, void *pointer, Size size)
--- 1092,1098 ----
  		AllocBlock	prevblock = NULL;
  		Size		chksize;
  		Size		blksize;
+ 		Size		oldblksize;
  
  		while (block != NULL)
  		{
***************
*** 1094,1102 **** AllocSetRealloc(MemoryContext context, void *pointer, Size size)
--- 1110,1123 ----
  		/* Do the realloc */
  		chksize = MAXALIGN(size);
  		blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+ 		oldblksize = block->endptr - ((char *)block);
+ 
  		block = (AllocBlock) realloc(block, blksize);
  		if
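The patch above breaks off mid-hunk. For orientation, the MemoryContextMemAllocated() function described in the post would look roughly like the following sketch. It is based on the description, not the patch's actual code; mem_allocated is the field the patch adds, while firstchild and nextchild are the existing child links in MemoryContextData.

    int64
    MemoryContextMemAllocated(MemoryContext context, bool recurse)
    {
        /* mem_allocated is the block-level total the patch maintains */
        int64       total = context->mem_allocated;

        if (recurse)
        {
            MemoryContext child;

            /* walk the existing child links and sum their totals */
            for (child = context->firstchild; child != NULL; child = child->nextchild)
                total += MemoryContextMemAllocated(child, true);
        }

        return total;
    }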
Re: [HACKERS] On columnar storage
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Amit Kapila wrote: Will the column store obey snapshot model similar to current heap tuples, if so will it derive the transaction information from heap tuple? Yes, visibility will be tied to the heap tuple -- a value is accessed only when its corresponding heap row has already been determined to be visible. One interesting point that raises from this is about vacuum: when are we able to remove a value from the store? I have some not-completely-formed ideas about this. Hm. This seems not terribly ambitious --- mightn't a column store extension wish to store tables *entirely* in the column store, rather than tying them to a perhaps-vestigial heap table? Well, yes, it might, but that opens a huge can of worms. What heapam offers is not just tuple storage, but a lot of functionality on top of that -- in particular, tuple locking and visibility. I am certainly not considering re-implementing any of that. We might eventually go there, but we will *additionally* need different implementations of those things, and I'm pretty sure that will be painful, so I'm trying to stay away from that. Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I can't help thinking that this could tie in with the storage level API that I was waving my arms about last year. Or maybe not --- the goals are substantially different --- but I think we ought to reflect on that rather than just doing a narrow hack for column stores used in the particular way you're describing here. I can't seem to remember this proposal you mention. Care to be more specific? Perhaps a link to archives is enough. I never got to the point of having a concrete proposal, but there was a discussion about it at last year's PGCon unconference; were you there? No, regrettably I wasn't there. Anyway the idea was to try to cut a clearer line between heap storage and the upper levels of the system, particularly the catalog/DDL code that we have so much of. Based on Salesforce's experience so far, it's near impossible to get rid of HeapTuple as the lingua franca for representing rows in the upper system levels, so we've not really tried; but it would be nice if the DDL code weren't so much in bed with heap-specific knowledge, like the wired-into-many-places assumption that row insert and update actions require index updates but deletions don't. Agreed on both counts. As far as catalog code goes, removing direct mapping from HeapTuple to C structs would require a huge rewrite of tons of code. Unless we're considering rewriting small pieces of specific catalog handling at a time, it seems unlikely that we will have columns from system catalogs in column stores. (It seems reasonable that as soon as we have column stores, we can have particular catalog code to work with columnar storage, but I don't think there's much need for that currently.) I agree with your second point also --- it might be good to have a layer in between, and it seems not completely unreasonable. It would require touching lots of places but not hugely transforming things. (I think it's not in the scope of the things I'm currently after, though.) We're also not very happy with the general assumption that a TID is an adequate row identifier (as our storage engine does not have TIDs), so I'm a bit disappointed to see you doubling down on that restriction rather than trying to lift it. Well, in the general design, there is room for different tuple identifiers. I'm just not implementing it for the first version. 
Now much of this pain only comes into play if one is trying to change the underlying storage format for system catalogs, which I gather is not considered in your proposal. But if you want one format for catalogs and another for user tables then you have issues like how do you guarantee atomic commit and crash safety across multiple storage engines. That way lies a mess, especially if you're trying to keep the engines at arms' length which is what a pluggable architecture implies. MySQL is a cautionary example we should be keeping in mind while thinking about this. Right. I don't want a separate storage engine that needs to reimplement transactions (as is the case in MySQL), or visibility rules. I don't want to have a different format for tables or catalogs; both would still be based on the current heapam API. I simply want to extend the API so that I can have some columns in a separate place. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
On 2015-06-14 17:36, Tomas Vondra wrote: On 06/14/15 17:22, Alvaro Herrera wrote: Robert Haas wrote: ... Second, there's the idea of a way of storing tuples that is different from PostgreSQL's usual mechanism - i.e. a storage manager API. I understand your concerns about going through the FDW API so maybe that's not the right way to do it, but it seems to me that in the end you are going to end up with something awfully similar to that + a local relfilenode + WAL logging support. I'm not clear on why you want to make the column store API totally different from the FDW API; there may be a reason, but I don't immediately see it. I just don't see that the FDW API is such a good fit for what I'm trying to do. Anything using the FDW API needs to implement its own visibility checking, for instance. I want to avoid that, because it's its own complex problem. Also, it doesn't look like the FDW API supports things that I want to do (neither my proposed LateColumnMaterialization nor my proposed BitmapColumnScan). I would have to extend the FDW API, then contort my stuff so that it fits in the existing API; then I will need to make sure that existing FDWs are not broken by the changes I would propose elsewhere. Round peg, square hole is all I see here. All in all, this seems too much additional work, just to make two things that are really addressing different problems go through the same code. You're correct about local WAL logging. We will need a solution to that problem. I was hoping to defer that until we had something like Alexander Korotkov's proposed pluggable WAL stuff. Probably worth mentioning that we don't expect the column store API to be used only by extensions, but also by code within core, which can use the current WAL infrastructure etc. I think it would even be OK if the initial implementation had restrictions such that extensions can't use it. I think we want an in-core implementation first; support for extensions can come later. But it's still worth having an API, since a single implementation probably won't fit every purpose (see indexes). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Actually ... if you intend to allow column storage to work with inherited tables (and if you don't, you'd better have a darn good reason why not), I think you probably want to do this join insertion *after* inheritance expansion, so you can join child column stores only to the appropriate child heap table, and not to the entire inheritance tree. Won't this cause issues to MergeAppend optimizations? Like what? And if there are such issues, why do you think you wouldn't be expected to solve them? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Won't this cause issues to MergeAppend optimizations? Like what? Well, as I understand, MergeAppend needs to know the sort order of the child node, right? But that's available only on the relation RTE, not on the colstore-join RTE. Uh, what? Sort order is a property of a path, not an RTE. Evidently need to do more digging .. but that makes plenty of sense. And we have always understood which join types preserve sort order. That's obvious now that you say it. You misunderstood the thrust of my comment, which basically is that I doubt anyone will think that rejecting that combination is an acceptable implementation restriction. It might be all right if it doesn't work very well in v0, but not if the implementation is designed so that it can never be fixed. Gotcha. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory Accounting v11
Hi, On 06/14/15 22:21, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: This patch tracks memory usage (at the block level) for all memory contexts. Individual palloc()s aren't tracked; only the new blocks allocated to the memory context with malloc(). ... My general answer to the performance concerns is that they aren't a reason to block this patch, unless someone has a suggestion about how to fix them. Adding one field to a struct and a few arithmetic operations for each malloc() or realloc() seems reasonable to me. Maybe, but this here is a pretty weak argument: The current state, where HashAgg just blows up the memory, is just not reasonable, and we need to track the memory to fix that problem. Meh. HashAgg could track its memory usage without loading the entire system with a penalty. +1 to a solution like that, although I don't think that's doable without digging the info from memory contexts somehow. Moreover, this is about fourth or fifth down the list of the implementation problems with spilling hash aggregation to disk. It would be good to see credible solutions for the bigger issues before we buy into adding overhead for a mechanism with no uses in core. I don't think so. If we don't have an acceptable solution for tracking memory in hashagg, there's not really much point in doing any of the following steps. That's why Jeff is tackling this problem first. Also, Jeff already posted a PoC of the memory-bounded hashagg, although it only worked for aggregates with fixed-size state, and not that great for cases like array_agg where the state grows. Maybe the improvements to aggregate functions proposed by David Rowley last week might help in those cases, as that'd allow spilling those states to disk. So while the approach proposed by Jeff may turn out not to be the right one, I think memory accounting needs to be solved first (even if it's not committed until the whole feature is ready). Others have also mentioned that we might want to use this mechanism to track memory for other operators, like Sort or HashJoin, which might be simpler and more accurate. That would imply redefining the meaning of work_mem for those operations; it would suddenly include a lot more overhead space than it used to, causing them to spill to disk much more quickly than before, unless one changes the work_mem setting to compensate. Not sure that people would like such a change. Don't we tweak the work_mem meaning over time anyway? Maybe not to such extent, but for example the hashjoin 9.5 improvements certainly change this by packing the tuples more densely, sizing the buckets differently etc. It's true the changes are in the other direction (i.e. batching less frequently) though. OTOH this would make the accounting more accurate (e.g. eliminating differences between cases with different amounts of overhead because of different tuple width, for example). Wouldn't that be a good thing, even if people need to tweak the work_mem? Isn't that kinda acceptable when upgrading to the next major version? An even bigger problem is that it would pretty much break the logic around LACKMEM() in tuplesort.c, which supposes that spilling individual tuples to disk is a reliable and linear way to decrease the accounted-for memory. Individual pfree's would typically not decrease the accounted total in this implementation. Depending on what you have in mind for the spill-to-disk behavior, this issue could be a fail for HashAgg as well, which is another reason not to commit this patch in advance of seeing the use-case. 
That's true, but I think the plan was always to wait for the whole feature, and only then commit all the pieces. -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] creating extension including dependencies
Hi, I am getting tired of installing required extensions manually. I was wondering if we might want to add an option to CREATE EXTENSION that would allow automatic creation of the extensions required by the extension being installed by the user. I also wrote a prototype patch that implements this. I call it a prototype mainly because the syntax (CREATE EXTENSION ... RECURSIVE) could be improved; I originally wanted to do something like INCLUDING DEPENDENCIES, but that would need a new (unreserved) keyword and I don't think it's worth it, plus it's wordy. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From 7d5db7faae0b30a70ceae020638fa7779810339d Mon Sep 17 00:00:00 2001
From: Petr Jelinek pjmo...@pjmodos.net
Date: Sat, 13 Jun 2015 19:49:10 +0200
Subject: [PATCH] CREATE EXTENSION RECURSIVE

---
 doc/src/sgml/ref/create_extension.sgml              | 11
 src/backend/commands/extension.c                    | 61
 src/backend/parser/gram.y                           |  4 ++
 src/test/modules/test_extensions/.gitignore         |  3 ++
 src/test/modules/test_extensions/Makefile           | 22
 .../test_extensions/expected/test_extensions.out    | 16 ++
 .../test_extensions/sql/test_extensions.sql         | 11
 .../modules/test_extensions/test_ext1--1.0.sql      |  0
 src/test/modules/test_extensions/test_ext1.control  |  4 ++
 .../modules/test_extensions/test_ext2--1.0.sql      |  0
 src/test/modules/test_extensions/test_ext2.control  |  4 ++
 .../modules/test_extensions/test_ext3--1.0.sql      |  0
 src/test/modules/test_extensions/test_ext3.control  |  3 ++
 .../test_extensions/test_ext_cyclic1--1.0.sql       |  0
 .../test_extensions/test_ext_cyclic1.control        |  4 ++
 .../test_extensions/test_ext_cyclic2--1.0.sql       |  0
 .../test_extensions/test_ext_cyclic2.control        |  4 ++
 17 files changed, 143 insertions(+), 4 deletions(-)
 create mode 100644 src/test/modules/test_extensions/.gitignore
 create mode 100644 src/test/modules/test_extensions/Makefile
 create mode 100644 src/test/modules/test_extensions/expected/test_extensions.out
 create mode 100644 src/test/modules/test_extensions/sql/test_extensions.sql
 create mode 100644 src/test/modules/test_extensions/test_ext1--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext1.control
 create mode 100644 src/test/modules/test_extensions/test_ext2--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext2.control
 create mode 100644 src/test/modules/test_extensions/test_ext3--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext3.control
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic1--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic1.control
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic2--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic2.control

diff --git a/doc/src/sgml/ref/create_extension.sgml b/doc/src/sgml/ref/create_extension.sgml
index a1e7e4f8..d151fd6 100644
--- a/doc/src/sgml/ref/create_extension.sgml
+++ b/doc/src/sgml/ref/create_extension.sgml
@@ -25,6 +25,7 @@ CREATE EXTENSION [ IF NOT EXISTS ] <replaceable class="parameter">extension_name</replaceable>
     [ WITH ] [ SCHEMA <replaceable class="parameter">schema_name</replaceable> ]
              [ VERSION <replaceable class="parameter">version</replaceable> ]
              [ FROM <replaceable class="parameter">old_version</replaceable> ]
+             [ RECURSIVE ]
 </synopsis>
 </refsynopsisdiv>
@@ -139,6 +140,16 @@
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term><literal>RECURSIVE</literal></term>
+    <listitem>
+     <para>
+      Try to install extension including the required dependencies
+      recursively.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </refsect1>

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 5cc74d0..4395519 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -46,6 +46,7 @@
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "storage/fd.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
@@ -1176,10 +1177,13 @@ CreateExtension(CreateExtensionStmt *stmt)
 	DefElem    *d_schema = NULL;
 	DefElem    *d_new_version = NULL;
 	DefElem    *d_old_version = NULL;
+	DefElem    *d_recursive = NULL;
 	char	   *schemaName;
 	Oid			schemaOid;
 	char	   *versionName;
 	char	   *oldVersionName;
+	bool		recursive = false;
+	List	   *recursive_parents = NIL;
 	Oid			extowner = GetUserId();
 	ExtensionControlFile *pcontrol;
 	ExtensionControlFile *control;
@@ -1263,6 +1267,14 @@ CreateExtension(CreateExtensionStmt *stmt)
 				 errmsg("conflicting or redundant options")));
 		d_old_version = defel;
 	}
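The rest of the patch does not appear above. Judging from the test_ext_cyclic test modules in its file list, the recursive_parents list presumably exists to detect dependency cycles during recursive installation; a sketch of what such a guard might look like (only the variable name comes from the patch, the function itself is an assumption):

    /*
     * Hypothetical cycle check: recursive_parents holds the names of the
     * extensions currently being installed up the recursion chain.
     */
    static void
    check_extension_dependency_cycle(const char *extname, List *recursive_parents)
    {
        ListCell   *lc;

        foreach(lc, recursive_parents)
        {
            /* installing an extension that is already on the chain? */
            if (strcmp((const char *) lfirst(lc), extname) == 0)
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_RECURSION),
                         errmsg("cyclic dependency detected for extension \"%s\"",
                                extname)));
        }
    }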
[HACKERS] Sharing aggregate states between different aggregate functions
Simon and I have been going over some ideas about how to improve aggregate performance by cutting down on the duplicate work that's done when two aggregate functions are used where one knows how to satisfy all the requirements of the other. To cut a long story short, all our ideas will require some additions or modifications to CREATE AGGREGATE and also pg_dump support. Tom came up with a simpler idea that gets us some of the way, without all that pg_dump stuff. http://www.postgresql.org/message-id/30851.1433860...@sss.pgh.pa.us This basically allows an aggregate's state to be shared between other aggregate functions when both aggregates' transition functions (and a few other things) match. There are quite a number of aggregates in our standard set which will benefit from this optimisation. Please find attached a patch which implements this idea. The performance improvements are as follows:

create table t1 as select x.x::numeric from generate_series(1,100) x(x);

-- standard case.
select sum(x),avg(x) from t1;

Master:
Time: 350.303 ms
Time: 353.716 ms
Time: 349.703 ms

Patched:
Time: 227.687 ms
Time: 222.563 ms
Time: 224.691 ms

-- extreme case.
select stddev_samp(x),stddev(x),variance(x),var_samp(x),var_pop(x),stddev_pop(x) from t1;

Master:
Time: 1464.461 ms
Time: 1462.343 ms
Time: 1450.232 ms

Patched:
Time: 346.473 ms
Time: 348.445 ms
Time: 351.365 ms

Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services sharing_agg_states_2c3d4a9_2015-06-15.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
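For readers wondering what "a few other things" covers: two aggregate calls can only share one transition state if they would build it identically. A sketch of that compatibility test follows; the catalog concepts (transition function, transition type, initial value) are real, but the function itself is illustrative and not taken from the patch.

    /*
     * Illustrative only: two aggregates may share one transition state if
     * they use the same transition function, transition type and initial
     * value, and are fed the same input expressions.
     */
    static bool
    agg_trans_state_shareable(Oid transfn1, Oid transtype1,
                              const char *initval1, List *args1,
                              Oid transfn2, Oid transtype2,
                              const char *initval2, List *args2)
    {
        if (transfn1 != transfn2 || transtype1 != transtype2)
            return false;

        /* initial values: both NULL, or both the same text representation */
        if ((initval1 == NULL) != (initval2 == NULL))
            return false;
        if (initval1 && strcmp(initval1, initval2) != 0)
            return false;

        /* same arguments in the same order; equal() is the node comparator */
        return equal(args1, args2);
    }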
Re: [HACKERS] Patch to improve a few appendStringInfo* calls
On 29 May 2015 at 12:51, Peter Eisentraut pete...@gmx.net wrote: On 5/12/15 4:33 AM, David Rowley wrote: Shortly after I sent the previous patch I did a few more searches and also found some more things that are not quite right. Most of these are to use the binary append method when the length of the string is already known. For these cases it might be better to invent additional functions such as stringinfo_to_text() and appendStringInfoStringInfo() instead of repeating the pattern of referring to data and length separately. You're probably right. It would be nicer to see some sort of wrapper functions that cleaned these up a bit. I really think that's something for another patch though; this patch just intends to put things the way they're meant to be in the least invasive way possible. What you're proposing is changing the way it's meant to work, which will cause much more code churn. I've attached a re-based patch. Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services appendStringInfo_fixes_fa48767_2015-06-15.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
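For context, the kind of call site the patch fixes, sketched with stand-in variables (buf and src are placeholders for whatever StringInfos the real call sites use; both functions are existing lib/stringinfo.h API):

    /* goes through the vsnprintf formatter, and a strlen of the result */
    static void
    append_via_formatter(StringInfo buf, const StringInfoData *src)
    {
        appendStringInfo(buf, "%s", src->data);
    }

    /* when the length is already known, this is essentially a memcpy */
    static void
    append_known_length(StringInfo buf, const StringInfoData *src)
    {
        appendBinaryStringInfo(buf, src->data, src->len);
    }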
Re: [HACKERS] Memory Accounting v11
Hi, On 06/14/15 21:43, Jeff Davis wrote: This patch tracks memory usage (at the block level) for all memory contexts. Individual palloc()s aren't tracked; only the new blocks allocated to the memory context with malloc(). I see it's adding the new field as int64 - wouldn't a Size be more appropriate, considering that's what we use in mctx.h and aset.c? It also adds a new function, MemoryContextMemAllocated() which can either retrieve the total allocated for the context, or it can recurse and sum up the allocations for all subcontexts as well. This is intended to be used by HashAgg in an upcoming patch that will bound the memory and spill to disk. Previous discussion here: http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop Previous concerns: * There was a slowdown reported of around 1-3% (depending on the exact version of the patch) on an IBM power machine when doing an index rebuild. The results were fairly noisy for me, but it seemed to hold up. See http://www.postgresql.org/message-id/CA +Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajp3cro9db...@mail.gmail.com * Adding a struct field to MemoryContextData may be bad for the CPU caching behavior, and may be the cause of the above slowdown. * Previous versions of the patch updated the parent contexts' allocations as well, which avoided the need to recurse when querying the total allocation. That seemed to penalize cases that didn't need to track the allocation. We discussed trying to opt-in to this behavior, but it seemed more awkward than helpful. Now, the patch only updates the allocation of the current context, and querying means recursing through the child contexts. I don't think the opt-in idea itself was awkward, it was rather about the particular APIs that we came up with, especially when combined with the 'context inheritance' thing. I still think the opt-in approach and updating accounting for the parent contexts was the best one, because it (a) minimizes impact in cases that don't use the accounting, and (b) makes finding the current amount of memory cheap ... * There was a concern that, if MemoryContextMemAllocated needs to recurse to the child contexts, it will be too slow for HashAgg of array_agg, which creates a child context per group. That was solved with http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b419865a814abbca12bdd6eef6a3d5ed67f432e1 I wouldn't say this was solved - we have fixed one particular example of such aggregate implementation, because it was causing OOM issues with many groups, but there may be other custom aggregates using the same pattern. Granted, built-in aggregates are probably more critical than aggregates provided by extensions, but I wouldn't dare to mark this solved ... My general answer to the performance concerns is that they aren't a reason to block this patch, unless someone has a suggestion about how to fix them. Adding one field to a struct and a few arithmetic operations for each malloc() or realloc() seems reasonable to me. I'm not buying this, sorry. While I agree that we should not expect the memory accounting to be entirely free, we should be very careful about the overhead especially if we're dropping the opt-in and thus imposing the overhead on everyone. But performance concerns are not a reason to block this patch approach seems wrong. With any other patch a 3% regression would be considered a serious issue IMNSHO. The current state, where HashAgg just blows up the memory, is just not reasonable, and we need to track the memory to fix that problem. 
Others have also mentioned that we might want to use this mechanism to track memory for other operators, like Sort or HashJoin, which might be simpler and more accurate. Dropping the memory accounting implementations and keeping just this new solution would be nice, only if we agree the performance impact to be acceptable. We already have accounting solution for each of those places, so I don't think the unification alone outweighs the regression. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The real reason why TAP testing isn't ready for prime time
On Mon, Jun 15, 2015 at 3:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Buildfarm member hamster has failed a pretty significant fraction of its recent runs in the BinInstallCheck step: http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hamster&br=HEAD Since other critters aren't equally distressed, it seems likely that this is just an out-of-disk-space type of problem. But maybe it's trying to tell us that there's a genuine platform-specific bug there. In any case, I challenge anyone to figure out what's happening from the information available from the buildfarm logs. I don't know whether this is just that the buildfarm script isn't collecting data that it should be. But my experiences with the TAP test scripts haven't been very positive. When they fail, it takes a lot of digging to find out why. Basically, that entire mechanism sucks as far as debuggability is concerned. Indeed. I think that one step in the right direction would be to replace all the calls to system and system_or_bail with a wrapper routine that calls IPC::Run, able to capture the logs and store them in each test's base path. The same applies to the pg_rewind tests. I think there is a good argument for turning this off in the buildfarm until there is a better way of identifying and solving problems. It is not helping us that hamster is red half the time for undiscoverable reasons. That just conditions people to ignore it, and it may well be masking real problems that the machine could be finding if it weren't failing at this step. hamster is legendarily slow and has a slow disc, which improves its chances of catching race conditions, and it is the only slow buildfarm machine enabling the TAP tests (by comparison, dangomushi has never failed with the TAP tests), hence I would prefer to think that the problem is not specific to ArchLinux ARM. In this case the failure seems to be related to the timing with which test servers stop and start even when the -w switch is used with pg_ctl, particularly as PGPORT is set to the same value for all the servers... Still, for the time being I don't mind disabling them and just did so now. I will try to investigate further on the machine itself. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On columnar storage
On Sun, Jun 14, 2015 at 10:30 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: Are you looking to avoid all hardware-based limits, or would using a 64 bit row pointer be possible? That would give you 2^64 or 1.8E19 unique rows over whatever granularity/uniqueness you use (per table, per database, etc.) -- Mike Nolan. I don't think the number of tuples is the main problem here, it's the number of pages a single relation can have. Looking at the number of rows as a direct function of TID size is misleading, because the TID is split into two fixed parts - page number (32b) and tuple number (16b). For the record, 2^48 is 281,474,976,710,656 which ought to be enough for anybody, but we waste a large part of that because we assume there might be up to 2^16 tuples per page, although the actual limit is way lower (~290 for 8kB pages, and ~1200 for 32kB pages). So we can only have ~4 billion pages, which is where the 32TB limit comes from (with 32kB pages it's 128TB). Longer TIDs are one straightforward way to work around this limit, assuming you add the bits to the 'page number' field. Adding 16 bits (thus using 64-bit pointers) would increase the limit 2^16-times to about 2048 petabytes (with 8kB pages). But that of course comes with a cost, because you have to keep those larger TIDs in indexes etc. Another option might be to split the 48 bits differently, by moving 5 bits to the page number part of TID (so that we expect ~2048 tuples per page at most). That'd increase the limit to 1PB (4PB with 32kB pages). The column store approach is somehow orthogonal to this, because it splits the table vertically into multiple pieces, each stored in a separate relfilenode and thus using a separate sequence of page numbers. And of course, the usual 'horizontal' partitioning has a very similar effect (separate filenodes). regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services Thanks for the reply. It's been a while since my last data structures course (1971), but I do remember a few things. I have never personally needed more than 1500 columns in a table, but can see how some might. Likewise, the 32TB limit hasn't affected me yet, either. I doubt either ever will. Solving either or both of those seems like it may at some point require a larger bit space for (at least some) TIDs, which is why I was wondering if a goal here is to eliminate all (practical) limits. It probably doesn't make sense to force all users to use that large bit space (with the associated space and performance penalties). If there's a way to do this, then you are all truly wizards. (This all reminds me of how the IPv4 bit space was parcelled up into Class A, B, C and D addresses, at a time when people thought 32 bits would last us forever. Maybe 128 bits actually will.) -- Mike Nolan
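A quick back-of-the-envelope check of the arithmetic in this exchange, in plain C and independent of any PostgreSQL headers; the 32/16 split mirrors the block number and offset fields of the on-disk tuple identifier:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t    pages = UINT64_C(1) << 32;  /* 32-bit page numbers */
        uint64_t    blcksz = 8192;              /* default page size */

        /* 2^32 pages * 8 kB/page = 2^45 bytes = 32 TB per relation */
        printf("max relation size: %llu TB\n",
               (unsigned long long) ((pages * blcksz) >> 40));

        /* the 16-bit tuple number can address 65536 slots per page,
         * far beyond the ~290 tuples that actually fit on 8 kB */
        printf("tuple slots addressable per page: %u\n", 1u << 16);
        return 0;
    }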
[HACKERS] query execution time faster with geqo on than off: bug?
I've encountered a query with 11 joins whose execution time (i.e., the time not taken up by planning) is significantly faster with geqo on rather than off. This is surprising to me and seems like it might be a bug in the planner, so I am posting it here rather than to -performance. The query is below, along with EXPLAIN ANALYZE results with geqo on and off. The server version is 9.4.4. The various geqo options are all set to the default. join_collapse_limit is set to 12 (the query is much slower with it set to the default of 8). Let me know what other information might be helpful in debugging this further. Thanks!

QUERY:

SELECT ex.ex, ex.lv, ex.tt, ex.td, dnsrc.ex AS trex,
       ((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) +
         uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2) AS trq
FROM ex
INNER JOIN lv ON (lv.lv = ex.lv)
INNER JOIN dn ON (dn.ex = ex.ex)
INNER JOIN mn ON (mn.mn = dn.mn)
INNER JOIN ap ON (ap.ap = mn.ap)
INNER JOIN dn AS dn2 ON (dn2.mn = dn.mn)
INNER JOIN dn AS dn3 ON (dn3.ex = dn2.ex)
INNER JOIN dn AS dnsrc ON (dnsrc.mn = dn3.mn)
INNER JOIN mn AS mnsrc ON (mnsrc.mn = dnsrc.mn)
INNER JOIN ap AS apsrc ON (apsrc.ap = mnsrc.ap)
INNER JOIN ex AS exsrc ON (exsrc.ex = dnsrc.ex)
INNER JOIN lv AS lvsrc ON (lvsrc.lv = exsrc.lv)
WHERE dn.ex != dn2.ex AND dn3.ex != dnsrc.ex AND mn.ap != mnsrc.ap
  AND dn.ex != dnsrc.ex
  AND lcvc(lv.lc, lv.vc) IN ('zul-000')
  AND lcvc(lvsrc.lc, lvsrc.vc) IN ('gle-000')
  AND exsrc.tt IN ('doras')
GROUP BY ex.ex, dnsrc.ex
ORDER BY trq desc
LIMIT 2000;

EXPLAIN ANALYZE with geqo on:

QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=3603.82..3603.82 rows=2 width=57) (actual time=623.322..623.326 rows=31 loops=1)
   ->  Sort  (cost=3603.82..3603.82 rows=2 width=57) (actual time=623.321..623.321 rows=31 loops=1)
         Sort Key: (((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) + uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2))
         Sort Method: quicksort  Memory: 27kB
         ->  HashAggregate  (cost=3603.60..3603.82 rows=2 width=57) (actual time=622.074..623.283 rows=31 loops=1)
               Group Key: ex.ex, dnsrc.ex
               ->  Nested Loop  (cost=3075.23..3603.59 rows=2 width=57) (actual time=8.023..620.761 rows=766 loops=1)
                     ->  Nested Loop  (cost=3075.17..3603.43 rows=2 width=55) (actual time=8.009..619.417 rows=766 loops=1)
                           Join Filter: (mn.ap <> mnsrc.ap)
                           Rows Removed by Join Filter: 3793
                           ->  Hash Join  (cost=3075.08..3603.21 rows=2 width=63) (actual time=7.984..610.206 rows=4559 loops=1)
                                 Hash Cond: (dn.ex = ex.ex)
                                 ->  Nested Loop  (cost=3.04..510.58 rows=20564 width=26) (actual time=0.918..509.574 rows=1047967 loops=1)
                                       Join Filter: (dn.ex <> dnsrc.ex)
                                       Rows Removed by Join Filter: 5396
                                       ->  Nested Loop  (cost=2.92..277.03 rows=118 width=22) (actual time=0.913..113.830 rows=40525 loops=1)
                                             ->  Nested Loop  (cost=2.87..267.76 rows=118 width=16) (actual time=0.899..60.170 rows=40525 loops=1)
                                                   ->  Hash Join  (cost=2.76..266.08 rows=1 width=12) (actual time=0.878..5.057 rows=1254 loops=1)
                                                         Hash Cond: (exsrc.lv = lvsrc.lv)
                                                         ->  Nested Loop  (cost=0.43..263.08 rows=871 width=16) (actual time=0.169..4.626 rows=1516 loops=1)
                                                               ->  Nested Loop  (cost=0.34..164.79 rows=871 width=20) (actual time=0.138..1.180 rows=1516 loops=1)
                                                                     ->  Nested Loop  (cost=0.23..155.09 rows=5 width=12) (actual time=0.097..0.306 rows=43 loops=1)
                                                                           ->  Index Scan using ex_tt_idx on ex exsrc  (cost=0.11..2.57 rows=2 width=8) (actual time=0.056..0.061 rows=5 loops=1)
                                                                                 Index Cond: (tt = 'doras'::text)
                                                                           ->  Index Scan using dn_ex_idx on dn dnsrc  (cost=0.11..75.55 rows=71 width=8) (actual time=0.027..0.046 rows=9 loops=5)
                                                                                 Index Cond: (ex = exsrc.ex)
                                                                     ->  Index Only Scan using dn_mn_ex_key on dn dn3  (cost=0.11..1.13 rows=81 width=8) (actual time=0.008..0.015 rows=35 loops=43)
                                                                           Index Cond: (mn = dnsrc.mn)
                                                                           Filter: (ex <> dnsrc.ex)
                                                                           Rows Removed by Filter: 1
                                                                           Heap Fetches: 0
                                                               ->  Index Scan using mn_pkey on mn mnsrc  (cost=0.09..0.10 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1516)
Re: [HACKERS] Moving Pivotal's Greenplum work upstream
Has anything actually happened on the Greenplum release at this point? All I can find are the same old press releases. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory Accounting v11
On Sun, 2015-06-14 at 16:21 -0400, Tom Lane wrote: Meh. HashAgg could track its memory usage without loading the entire system with a penalty. When I tried doing it outside of the MemoryContexts, it seemed to get awkward quickly. I'm open to suggestion if you can point me in the right direction. Maybe I can peek at the sizes of chunks holding state values and group keys, and combine that with the hash table size-estimating code? Moreover, this is about fourth or fifth down the list of the implementation problems with spilling hash aggregation to disk. It would be good to see credible solutions for the bigger issues before we buy into adding overhead for a mechanism with no uses in core. I had several iterations of a full implementation of the spill-to-disk HashAgg patch[1]. Tomas Vondra has some constructive review comments, but all of them seemed solvable. What do you see as a major unsolved issue? If I recall, you were concerned about things like array_agg, where an individual state could get larger than work_mem. That's a valid concern, but it's not the problem I was trying to solve. Regards, Jeff Davis [1] http://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating extension including dependencies
+1 Does it work at runtime too? On 15 June 2015 at 0:51, Petr Jelinek p...@2ndquadrant.com wrote: Hi, I am getting tired of installing required extensions manually. I was wondering if we might want to add an option to CREATE EXTENSION that would allow automatic creation of the extensions required by the extension being installed by the user. I also wrote a prototype patch that implements this. I call it a prototype mainly because the syntax (CREATE EXTENSION ... RECURSIVE) could be improved; I originally wanted to do something like INCLUDING DEPENDENCIES, but that would need a new (unreserved) keyword and I don't think it's worth it, plus it's wordy. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Time to get rid of PQnoPasswordSupplied?
Hi all

I frequently see users confused by one of our more common and less clear error messages:

    fe_sendauth: no password supplied

What this really means is that the server requested a password for md5 or cleartext authentication but no password was supplied to the client and it cannot prompt for one in this context. I'd like to get rid of it. It's clear others have wanted to in the past, since it's a backward compat #define in libpq-fe.h:

    /* Error when no password was given. */
    /* Note: depending on this is deprecated; use PQconnectionNeedsPassword(). */
    #define PQnoPasswordSupplied	"fe_sendauth: no password supplied\n"

but given the git blame for it:

    4f9bf7fc (Tom Lane      2007-12-09 19:01:40 +0000 493) /* Note: depending on this is deprecated; use PQconnectionNeedsPassword(). */
    88fd162e (Bruce Momjian 2004-10-16 03:10:17 +0000 494) #define PQnoPasswordSupplied	"fe_sendauth: no password supplied\n"

I'm wondering if it's time for this to go away, so we can have a decent error message for this common error. It's been deprecated for eight years. How about:

    The server requested a password but no password was supplied to the client

? I'm sure it can be better than that, but ... well, it's not "fe_sendauth: no password supplied". -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
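Whatever the string becomes, the supported way for client code to detect the condition already exists, as the deprecation note above says. A minimal example (the connection string is a placeholder):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("host=localhost dbname=test");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            /* true when the server demanded a password and none was given */
            if (PQconnectionNeedsPassword(conn))
                fprintf(stderr, "server requires a password; prompt the user\n");
            else
                fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        }

        PQfinish(conn);
        return 0;
    }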
Re: [HACKERS] Moving Pivotal's Greenplum work upstream
On 06/14/2015 11:44 PM, Craig Ringer wrote: Has anything actually happened on the Greenplum release at this point? All I can find are the same old press releases. Nothing has happened. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The Future of Aggregation
On 12 June 2015 at 23:57, David Rowley david.row...@2ndquadrant.com wrote: On 11 June 2015 at 01:39, Kevin Grittner kgri...@ymail.com wrote: One question that arose in my mind running this was whether we might be able to combine sum(x) with count(*) if x was NOT NULL, even though the arguments don't match. It might not be worth the gymnastics of recognizing the special case, and I certainly wouldn't recommend looking at that optimization in a first pass; but it might be worth jotting down on a list somewhere. I think it's worth looking into that at some stage. I think I might have some of the code that would be required for the NULL checking over here - http://www.postgresql.org/message-id/CAApHDvqRB-iFBy68=dcgqs46arep7aun2pou4ktwl8kx9yo...@mail.gmail.com I'm just not so sure what the logic would be to decide when we could apply this. The only properties I can see that may be along the right lines are pg_proc.pronargs for int8inc and int8inc_any. Actually, a possible realistic solution to this just came to me. Add another property to pg_aggregate which stores the OID of an alternative aggregate function which can be used when all arguments to the transition function can be proved to be not null. This of course would be set to InvalidOid in most cases. But at least one good use case would be COUNT(not_null_expr), which could use COUNT(*). This likely could not be applied if DISTINCT or ORDER BY were used. I'm not proposing that we go and do this. More analysis would have to be done to see if the trade-offs between extra code and extra planning time produce a net win overall. The aggregate code has already gotten quite a bit more complex over the past couple of releases. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
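To sketch what that pg_aggregate-driven substitution might look like in the planner (every name below is invented; nothing like this exists in core):

    /*
     * Hypothetical: pick the alternative aggregate when every argument is
     * provably non-NULL.  get_agg_notnull_variant() would read the proposed
     * pg_aggregate column; args_provably_not_null() would use the planner's
     * NOT NULL proofs.  Both are invented for this sketch.
     */
    static Oid
    choose_agg_variant(PlannerInfo *root, Aggref *agg)
    {
        Oid         alt = get_agg_notnull_variant(agg->aggfnoid);

        if (!OidIsValid(alt))
            return agg->aggfnoid;       /* no variant registered */

        /* DISTINCT / ORDER BY change the semantics, so don't substitute */
        if (agg->aggdistinct != NIL || agg->aggorder != NIL)
            return agg->aggfnoid;

        if (args_provably_not_null(root, agg->args))
            return alt;

        return agg->aggfnoid;
    }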