Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
On 05/30/2014 12:53 AM, John Lumby wrote: Date: Thu, 29 May 2014 18:00:28 -0300 Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch From: klaussfre...@gmail.com To: hlinnakan...@vmware.com CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/29/2014 11:34 PM, Claudio Freire wrote: On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/29/2014 04:12 PM, John Lumby wrote: On 05/28/2014 11:52 PM, John Lumby wrote:

The patch seems to assume that you can put the aiocb struct in shared memory, initiate an asynchronous I/O request from one process, and wait for its completion from another process. I'm pretty surprised if that works on any platform.

It works on linux. Actually this ability allows the asyncio implementation to reduce complexity in one respect (yes, I know it looks complex enough): it makes waiting for completion of an in-progress IO simpler than for the existing synchronous IO case, since librt takes care of the waiting. Specifically, there is no need for extra wait-for-io control blocks such as in bufmgr's WaitIO().

No, it doesn't work. See attached test program.

Thanks for checking, and thanks for coming up with that test program. However, yes, it really does work -- always (on linux). Your test program is doing things in the wrong order - it calls aio_suspend *before* aio_error. However, the rule is, call aio_suspend *after* aio_error and *only* if aio_error returns EINPROGRESS.

I see no such rule in the man pages of any of the functions involved. And it wouldn't matter anyway; the behavior is exactly the same if you call aio_error() first.

See the code changes to fd.c function FileCompleteaio() to see how we have done it. And I am attaching a corrected version of your test program which runs just fine.
As Claudio mentioned earlier, the way FileCompleteaio() uses aio_suspend is just a complicated way of polling. You might as well replace the aio_suspend() calls with pg_usleep(). It kinda seems to work sometimes, because of the way it's implemented in glibc. The aiocb struct has a field for the result value and errno, and when the I/O is finished, the worker thread fills them in. aio_error() and aio_return() just return the values of those fields, so calling aio_error() or aio_return() does in fact happen to work from a different process. aio_suspend(), however, is implemented by sleeping on a process-local mutex, which does not work from a different process. Even if it worked on Linux today, it would be a bad idea to rely on it from a portability point of view. No, the only sane way to make this work is that the process that initiates an I/O request is responsible for completing it. If another process needs to wait for an async I/O to complete, we must use some other means to do the waiting. Like the io_in_progress_lock that we already have, for the same purpose.

But calls to it are timed out after 10us, effectively turning the thing into polling mode.

We don't want polling... And even if we did, calling aio_suspend() in a way that's known to be broken, in a loop, is a pretty crappy way of polling.

Well, as mentioned earlier, it is not broken. Whether it is efficient I am not sure. I have looked at the mutex in aio_suspend that you mentioned, and I am not quite convinced that, if the caller is not the original aio_read process, it renders the suspend() into an instant timeout. I will see if I can verify that.

I don't see the point of pursuing this design further. Surely we don't want to use polling here, and you're relying on undefined behavior anyway. I'm pretty sure aio_return/aio_error won't work from a different process on all platforms, even if it happens to work on Linux.
Even on Linux, it might stop working if the underlying implementation changes from the glibc pthread emulation to something kernel-based. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: Typo fixes in Solution.pm, part of MSVC scripts
On Fri, May 30, 2014 at 4:31 AM, Robert Haas robertmh...@gmail.com wrote: I suspect it would be good, then, to separate this into two patches, one for back-patch and one not.

Attached are patches for master and 9.3. 9.2 and older versions are not impacted.
Regards,
--
Michael

diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 9c4e9f7..473185f 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -756,8 +756,8 @@ sub new
 	$self->{solutionFileVersion}        = '12.00';
 	$self->{vcver}                      = '12.00';
 	$self->{visualStudioName}           = 'Visual Studio 2013';
-	$self->{VisualStudioVersion}        = '12.0.21005.1',
-	$self->{MinimumVisualStudioVersion} = '10.0.40219.1',
+	$self->{VisualStudioVersion}        = '12.0.21005.1';
+	$self->{MinimumVisualStudioVersion} = '10.0.40219.1';

 	return $self;
 }
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 283d399..c0d7f38 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -747,13 +747,13 @@ sub new
 	my $self = $classname->SUPER::_new(@_);
 	bless($self, $classname);

-	$self->{solutionFileVersion} = '12.00';
-	$self->{vcver}= '12.00';
-	$self->{visualStudioName} = 'Visual Studio 2013';
-	$self->{VisualStudioVersion} = '12.0.21005.1',
-	$self->{MinimumVisualStudioVersion} = '10.0.40219.1',
+	$self->{solutionFileVersion}        = '12.00';
+	$self->{vcver}                      = '12.00';
+	$self->{visualStudioName}           = 'Visual Studio 2013';
+	$self->{VisualStudioVersion}        = '12.0.21005.1';
+	$self->{MinimumVisualStudioVersion} = '10.0.40219.1';

-	return $self;
+	return $self;
 }

 sub GetAdditionalHeaders
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: Typo fixes in Solution.pm, part of MSVC scripts
On 05/30/2014 10:25 AM, Michael Paquier wrote: On Fri, May 30, 2014 at 4:31 AM, Robert Haas robertmh...@gmail.com wrote: I suspect it would be good, then, to separate this into two patches, one for back-patch and one not. Attached are patches for master and 9.3. 9.2 and older versions are not impacted. Thanks, applied. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix xpath() to return namespace definitions
Hi all,

While developing some XML processing queries, I stumbled on an old bug mentioned in http://wiki.postgresql.org/wiki/Todo#XML: "Fix nested or repeated xpath()s that apparently mess up namespaces". The source of the bug is that libxml2's xmlNodeDump won't output XML namespace definitions that are declared in the node's parents. As per https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858, the behavior is intentional.

This patch uses the function xmlCopyNode, which copies a node including its namespace definitions as required (without unused namespaces in the node or its children). When the copy is dumped, the resulting XML is complete with its namespaces. Calling xmlCopyNode needs additional memory to execute, but reimplementing its routine to handle namespace definitions would introduce much complexity to the code.

Note: This is my very first postgresql patch.
--
Ali Akbar

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..93e335c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3602,19 +3602,28 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
 	if (cur->type == XML_ELEMENT_NODE)
 	{
 		xmlBufferPtr buf;
+		xmlNodePtr cur_copy;

 		buf = xmlBufferCreate();
+
+		/* the result of xmlNodeDump won't contain namespace definitions,
+		 * but libxml2 has xmlCopyNode that duplicates a node, along
+		 * with its required namespace definitions.
+		 */
+		cur_copy = xmlCopyNode(cur, 1);
 		PG_TRY();
 		{
-			xmlNodeDump(buf, NULL, cur, 0, 1);
+			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
 			result = xmlBuffer_to_xmltype(buf);
 		}
 		PG_CATCH();
 		{
+			xmlFreeNode(cur_copy);
 			xmlBufferFree(buf);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
+		xmlFreeNode(cur_copy);
 		xmlBufferFree(buf);
 	}
 	else
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..a6d26f7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -584,6 +584,12 @@ SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1">loc
  {1,2}
 (1 row)

+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2"/></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ xpath
+-------
+ {"<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+(1 row)
+
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
  xpath
 -
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..c7bcf91 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -498,6 +498,12 @@ LINE 1: SELECT xpath('//loc:piece/@id', '<local:data xmlns:local=ht...
                                          ^
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2"/></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xpath('//loc:piece', '<local:data xmlns:local=http:/...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
 ERROR:  unsupported XML feature
 LINE 1: SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>'...
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..241a5d6 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -174,6 +174,7 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
 SELECT xpath('', '<!-- error -->');
 SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2"/></local:data>');
 SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2"/></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2"/></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
 SELECT xpath('//text()', '<root>&lt;</root>');
 SELECT xpath('//@value', '<root value="&lt;"/>');
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4
On 2014-05-11 23:30:37 +0200, Andres Freund wrote: On 2014-05-11 12:24:30 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-05-10 23:21:34 -0700, Peter Geoghegan wrote: On Fri, May 9, 2014 at 1:50 PM, Andres Freund and...@2ndquadrant.com wrote: And adding a proper unsigned type doesn't sound like a small amount of work. Perhaps not, but it's overdue. We ought to have one. Maybe. But there's so many things to decide around it that I don't think it's a good prerequisite for not showing essentially corrupted values in a supported scenario. It's a lot harder than it sounds at first; see past discussions about how we could shoehorn one into the numeric type hierarchy. And think about how C expressions that mix signed and unsigned inputs tend to give surprising results :-( Yea. I really don't like to take on such a major project to solve a minor problem. What I am thinking about right now is to expose a 'pg_blocknumber' type. That only does very basic operations and implicitly casts to int64. That's probably a much more handleable problem and it also might give us some more experience with unsigned types. Comments? As a note towards that: e.g. pageinspect deals with blocknumbers and uses int4 for that. That makes accessing the higher blocks really awkward... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4
On 12 May 2014 08:15, Andres Freund and...@2ndquadrant.com wrote: But I concur that in practice, if you're dealing with 16TB tables, it's time to partition. Well, we need to improve our partitioning for that to be viable for all relations. Not having usable foreign and unique keys makes it a pita in some cases.

As discussed, declarative partitioning is on the roadmap for this next release, so I would say let's just document that table sizes above 16TB don't report correctly and move on.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4
On 2014-05-30 12:00:47 +0100, Simon Riggs wrote: On 12 May 2014 08:15, Andres Freund and...@2ndquadrant.com wrote: But I concur that in practice, if you're dealing with 16TB tables, it's time to partition. Well, we need to improve our partitioning for that to be viable for all relations. Not having usable foreign and unique keys makes it a pita in some cases. As discussed, declarative partitioning is on the roadmap for this next release, so I would say let's just document that table sizes above 16TB don't report correctly and move on.

I doubt we'll fix all the snags - like foreign keys, unique keys, etc - that partitioning has in this release... Introducing a blocknumber type seems easy and mechanical enough.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater
On Fri, May 30, 2014 at 12:45 AM, Andrew Dunstan and...@dunslane.net wrote: On 05/29/2014 05:41 PM, Josh Kupershmidt wrote: On Thu, May 29, 2014 at 4:06 PM, Andrew Dunstan and...@dunslane.net wrote: Almost all my critters run in VMs (all but jacana and bowerbird). They're not running on OpenVZ, are they? `ifconfig` on shearwater says: [...] and it seems this all-zeros MAC address is a common (mis-?)configuration on OpenVZ:

https://github.com/robertdavidgraham/masscan/issues/43
http://stackoverflow.com/questions/5838225/how-do-i-get-a-default-gridgain-node-in-openvz-discover-other-nodes-on-the-same
http://forum.openvz.org/index.php?t=msg&goto=8117

Yeah, that's a bit ugly. Mine are on qemu, one (sitella) is on Xen.

That's likely the difference between real virtualization and containerization. It'd be interesting to see how it behaves on FreeBSD jails or LXC-containers.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] jsonb access operators inefficiency
Hi!

The jsonb operators ->text, ->>text, ->int and ->>int use inefficient methods to access the needed field, proportional to O(N/2). The attached patch provides O(log(N)) access for the text operators and O(1) for the integer ones. Functions with fast access already exist in the current code and are used, for example, in the contains operation. The attached patch uses those functions instead of a sequential loop over the object/array.

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ jsonb_access.patch.gz Description: Unix tar archive -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout
On 05/28/2014 06:41 PM, Andres Freund wrote: Hi, Pushed a fix for it. I am pretty sure it will, but could you still test that it fixes your problem? Thanks! The fix seems to work (I am no longer getting the timeout on slot creation) Thanks Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
On Fri, May 30, 2014 at 4:15 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: We don't want polling... And even if we did, calling aio_suspend() in a way that's known to be broken, in a loop, is a pretty crappy way of polling. Well, as mentioned earlier, it is not broken. Whether it is efficient I am not sure. I have looked at the mutex in aio_suspend that you mentioned and I am not quite convinced that, if the caller is not the original aio_read process, it renders the suspend() into an instant timeout. I will see if I can verify that. I don't see the point of pursuing this design further. Surely we don't want to use polling here, and you're relying on undefined behavior anyway. I'm pretty sure aio_return/aio_error won't work from a different process on all platforms, even if it happens to work on Linux. Even on Linux, it might stop working if the underlying implementation changes from the glibc pthread emulation to something kernel-based.

I'll try to do some measuring of performance with:

a) git head
b) git head + patch as-is
c) git head + patch without aio_suspend in foreign processes (just re-read)
d) git head + patch with a lwlock (or whatever works) instead of aio_suspend

a-c will be the fastest, d might take some while. I'll let you know of the results as I get them.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4
Andres Freund wrote: On 2014-05-11 23:30:37 +0200, Andres Freund wrote: Yea. I really don't like to take on such a major project to solve a minor problem. What I am thinking about right now is to expose a 'pg_blocknumber' type. That only does very basic operations and implicitly casts to int64. That's probably a much more handleable problem and it also might give us some more experience with unsigned types. Comments? As a note towards that: e.g. pageinspect deals with blocknumbers and uses int4 for that. That makes accessing the higher blocks really awkward...

Sounds like pg_blocknumber could be a good idea, but I wouldn't add the implicit casts to bigint (int64). We might enlarge BlockNumber to unsigned 64 bits in the future -- we will regret the implicit casts to signed int64 then. Anyway, it's not like implicit casts will get us any extra functionality. We can just add + and - operators, and that should cover 99% of what people would initially want ... Now, adding casts FROM int32 and int64 to pg_blocknumber doesn't sound bad. (Maybe have them error out if the value is negative?)

-- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST bug.
The point I'm making is that the scenario your test case exposes is not an infinite loop of picksplits, but an infinite loop of choose calls.

Thank you, now I see what I missed before. After some brainstorming, it's possible to use '\0' only as an end-of-string marker. The idea is: when we split an allthesame tuple into upper/lower, we copy the node label to the upper tuple instead of setting it to '\0'. The node labels of the lower tuple will be unchanged but should be ignored for the reconstructedValue - and they are ignored because of the allTheSame flag. See attached patch. Unfortunately, it will break on-disk compatibility. Although it will not cause a panic/error/coredump, the allTheSame approach will not find tuples. Users should recreate spgist indexes over text columns.

It's certainly possible that we could/should change what checkAllTheSame is doing --- on re-reading the comment, I'm not really sure that the scenario it's concerned about can happen. However, that will not fix

I rewrote the patch to fix the missed case - an allTheSame result of picksplit while tooBig is set. I believe this patch is still needed because it could make the tree more efficient, as was demonstrated for the quad tree. This patch doesn't break on-disk compatibility, although index recreation is recommended.

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ spgist_textproc.patch.gz Description: Unix tar archive spgist_allthesame.patch.2.gz Description: Unix tar archive -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb access operators inefficiency
On 05/30/2014 09:35 AM, Teodor Sigaev wrote: Hi! The jsonb operators ->text, ->>text, ->int and ->>int use inefficient methods to access the needed field, proportional to O(N/2). The attached patch provides O(log(N)) access for the text operators and O(1) for the integer ones. Functions with fast access already exist in the current code and are used, for example, in the contains operation. The attached patch uses those functions instead of a sequential loop over the object/array.

Teodor, thanks for this. My only worry is this sort of code, which in a quick search I didn't find used elsewhere:

-	(void) JsonbToCString(jtext, &tjb->root, -1);
-	result = cstring_to_text_with_len(jtext->data, jtext->len);
+	appendStringInfoSpaces(jtext, VARHDRSZ);
+	(void) JsonbToCString(jtext, v->val.binary.data, -1);
+
+	result = (text *) jtext->data;
+	SET_VARSIZE(result, jtext->len);

If we're going to construct varlena objects inside a StringInfo, maybe we need a proper API for it. Isn't there a danger that the data member of the StringInfo won't be properly aligned to allow us to do this? In any case, we should get most of the benefit of your patch without this optimization.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery testing for beta
On 05/29/2014 07:39 PM, Jeff Janes wrote: It also implicitly tested the xlog parallel write slots thing, as that is common code to all recovery.

During development, I hit a lot of bugs in that patch by setting wal_buffers to 32kB (the minimum). That causes more backends to wait for each other, exposing deadlocks.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Fri, May 30, 2014 at 3:59 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: If transaction A commits synchronously with commit LSN 1, and transaction B commits asynchronously with commit LSN 2, B cannot become visible before A. And we cannot acknowledge B as committed to the client until it's visible to other transactions. That means that B will have to wait for A's commit record to be flushed to disk, before it can return, even though it was an asynchronous commit.

I thought that's what happens now. What's more of a concern is synchronous replication. We currently have a hack that makes transactions committed locally invisible to other transactions, even though they've committed and synced to disk, until the slave responds that it's received the transaction. (I think this is bogus personally, it just shifts the failure modes around. If we wanted to do it properly we would have to do two-phase commit.)

I guess it still works because we don't support having synchronous replication for just some transactions and not others. It would be nice to support that, but I think it would mean making it work like local synchronous commit. It would only affect how long the commit blocks, not when other transactions see the committed data.

-- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SP-GiST bug.
On Fri, May 30, 2014 at 06:55:18PM +0400, Teodor Sigaev wrote: The point I'm making is that the scenario your test case exposes is not an infinite loop of picksplits, but an infinite loop of choose calls. Thank you, now I see what I missed before. After some brainstorming, it's possible to use '\0' only as end of string marker. The idea is: when we split allthesame tuple to upper/lower we copy node label to upper tuple instead of set it to '\0'. Node labels of lower tuple will be unchanged but should be ignored for reconstructionValue - and they are ignored because of allTheSame flag. See attached patch. Unfortunately, it will break on-disk compatibility. Although it will not cause a panic/error/coredump allTheSame approach will not find tuples. Users should recreate spgist indexes over text column. If we bump the system catalog version, pg_upgrade can mark those indexes as invalid and output a script to reindex them. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 05/30/2014 06:09 PM, Greg Stark wrote: On Fri, May 30, 2014 at 3:59 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: If transaction A commits synchronously with commit LSN 1, and transaction B commits asynchronously with commit LSN 2, B cannot become visible before A. And we cannot acknowledge B as committed to the client until it's visible to other transactions. That means that B will have to wait for A's commit record to be flushed to disk, before it can return, even though it was an asynchronous commit. I thought that's what happens now. What's more of a concern is synchronousl replication. We currently have a hack that makes transactions committed locally invisible to other transactions even though they've committed and synced to disk until the slave responds that it's received the transaction. (I think this is bogus personally, it just shifts the failure modes around. If we wanted to do it properly we would have to do two-phase commit.) Yeah. To recap, the failure mode is that if the master crashes and restarts, the transaction becomes visible in the master even though it was never replicated. I guess it still works because we don't support having synchronous replication for just some transactions and not others. It would be nice to support that but I think it would mean making it work like local synchronous commit. It would only affect how long the commit blocks, not when other transactions see the committed data. Actually, we do support that, see synchronous_commit=local. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
Hi, On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote: So, here's a first version of the patch. Still very much WIP. Cool. One thorny issue came up in discussions with other hackers on this in PGCon: When a transaction is committed asynchronously, it becomes visible to other backends before the commit WAL record is flushed. With CSN-based snapshots, the order that transactions become visible is always based on the LSNs of the WAL records. This is a problem when there is a mix of synchronous and asynchronous commits: If transaction A commits synchronously with commit LSN 1, and transaction B commits asynchronously with commit LSN 2, B cannot become visible before A. And we cannot acknowledge B as committed to the client until it's visible to other transactions. That means that B will have to wait for A's commit record to be flushed to disk, before it can return, even though it was an asynchronous commit. I personally think that's annoying, but we can live with it. The most common usage of synchronous_commit=off is to run a lot of transactions in that mode, setting it in postgresql.conf. And it wouldn't completely defeat the purpose of mixing synchronous and asynchronous commits either: an asynchronous commit still only needs to wait for any already-logged synchronous commits to be flushed to disk, not the commit record of the asynchronous transaction itself. I have a hard time believing that users won't hate us for such a regression. It's pretty common to mix both sorts of transactions and this will - by my guesstimate - dramatically reduce throughput for the async backends. * Logical decoding is broken. I hacked on it enough that it looks roughly sane and it compiles, but didn't spend more time to debug. I think we can live with it not working for the first few iterations. I'll look into it once the patch has stabilized a bit. 
* I expanded pg_clog to 64-bits per XID, but people suggested keeping pg_clog as is, with two bits per commit, and adding a new SLRU for the commit LSNs beside it. Probably will need to do something like that to avoid bloating the clog. It also influences how on-disk compatibility is dealt with. So: How are you planning to deal with on-disk compatibility? * Add some kind of backend-private caching of clog, to make it faster to access. The visibility checks are now hitting the clog a lot more heavily than before, as you need to check the clog even if the hint bits are set, if the XID falls between xmin and xmax of the snapshot. That'll hurt a lot in concurrent scenarios :/. Have you measured how 'wide' xmax-xmin usually is? I wonder if we could just copy a range of values from the clog when we start scanning Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pivot aggregation with a patched intarray
Hello,

(sorry for this long post)

I have patched intarray with 3 additional functions in order to count [distinct] event IDs into arrays, where the array position corresponds to the integer value (mimicking column-oriented storage). I need to use an array for the event IDs as I don't know how many of them exist, and the list may increase slowly over time. The first results are quite promising.

Although this approach is only usable for a narrow set of use cases (*), I wonder if I should look at other places in the source code to achieve a better implementation, and if there are discussions or plans to enhance Postgres with some support for this kind of storage (vectors of int).

(*) The array sizes should ideally not exceed a few tens, and the number of events should be unknown; otherwise using one column per event would be faster.

use case
========
c: category
s: session
e: event int4 (small range, mostly less than 20)

 c s e
 - - -
 1 1 1
 1 1 1
 1 1 3
 1 2 1
 2 2 3
 3 1 5

WITH SES AS (
  SELECT s, c,
         icount_to_array(e) ecount,
         iarray_flag(e)     edistinct
  FROM foo
  GROUP BY s, c)
SELECT c,
       iarray_sum(ecount)    ecount,
       iarray_sum(edistinct) edistinct
FROM SES
GROUP BY c

 c  ecount       edistinct
 -  ------       ---------
 1  [3,0,1]      [2,0,1]
 2  [0,0,1]      [0,0,1]
 3  [0,0,0,0,1]  [0,0,0,0,1]

e.g.: the event 1 was met 3 times in the category 1, in 2 distinct sessions.

source code
===========

The 3 aggregates use the following functions:

isumv:         isumv([1,1,1],[0,0,1,3]) => [1,1,2,3]
iarray_addat:  iarray_addat([3,3,3],2)  => [3,4,3]
iarray_setat:  iarray_setat([0,0],2)    => [0,1]
               iarray_setat([0,1],2)    => [0,1]

I've added them at the end of _int_op.c from intarray (attached). Missing in the code: checks for integer overflow and that the array lower bounds are all 1. These are my first C lines ever, and I've never learned the language.
Hence I'm open to criticism ;-) I've started with this great posting by Craig Ringer: http://stackoverflow.com/questions/16992339/why-is-postgresql-array-access-so-much-faster-in-c-than-in-pl-pgsql?answertab=votes#tab-top The rest is mostly copy and paste from other parts of intarray. iarray_setat just sets a bitmap position to 1, possibly enlarging the array when required. It is a trivial modification of iarray_addat. It should be possible to implement this more efficiently with bit[] or varbit, but this lies beyond my C capabilities.

performance advantage of icount_to_array
===

As this aggregate allows reducing the cardinality of GROUP BYs, it often performs better than a vertical approach. With the horizontal storage, the result can be stored in smaller tables with far fewer rows, hence bringing more advantages when it comes to indexes. e.g.:

select icount_to_array(user_id) from foo
{256655,0,0,1075,0,1,91154,36,0 (...)
Aggregate (cost=36930.44..36930.45 rows=1 width=4) (actual time=333.378..333.378 rows=1 loops=1)
  -> Seq Scan on foo (cost=0.00..32279.95 rows=1860195 width=4) (actual time=0.010..131.117 rows=1860179 loops=1)
Total runtime: 333.420 ms

select user_id, count(*) from foo group by user_id order by 1
Sort (cost=41582.75..41582.87 rows=48 width=4) (actual time=492.638..492.638 rows=79 loops=1)
  Sort Key: user_id
  Sort Method: quicksort Memory: 28kB
  -> HashAggregate (cost=41580.93..41581.41 rows=48 width=4) (actual time=492.606..492.615 rows=79 loops=1)
    -> Seq Scan on foo (cost=0.00..32279.95 rows=1860195 width=4) (actual time=0.010..128.800 rows=1860179 loops=1)
Total runtime: 492.699 ms
1;256656
4;1075
7;91157
8;36
17;1455583
(...)

It will swap later on, which results in a significant difference in some cases. 
create temp table ev AS
SELECT * FROM (
  generate_series(1,1000) s
  cross join generate_series(1,500) c
  cross join generate_series(1,15) e
  cross join generate_series(1,5) repeat
)

explain analyze select s, c, icount_to_array(e) from ev group by s,c
HashAggregate (cost=830575.54..830975.54 rows=4 width=12) (actual time=19188.384..19379.154 rows=50 loops=1)
  -> Seq Scan on ev (cost=0.00..561487.31 rows=35878431 width=12) (actual time=0.046..4849.977 rows=3750 loops=1)
Total runtime: 19399.151 ms

explain analyze select s, c, e, count(*) from ev group by s,c,e
GroupAggregate (cost=5589186.88..6073545.71 rows=3587844 width=12) (actual time=63290.168..89336.265 rows=750 loops=1)
  -> Sort (cost=5589186.88..5678882.96 rows=35878431 width=12) (actual time=63290.156..83981.772 rows=3750 loops=1)
    Sort Key: s, c, e
    Sort Method: external merge Disk: 806424kB
    -> Seq Scan on ev (cost=0.00..561487.31 rows=35878431 width=12) (actual time=0.039..4957.481 rows=3750 loops=1)
Total runtime: 89680.844 ms

Aggregates definition:
==
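The semantics of the three helper functions described above (isumv, iarray_addat, iarray_setat) can be sketched in plain C, operating on ordinary int arrays rather than PostgreSQL ArrayType values. This is an illustration of the behavior, not the patch code; the fixed-size model also skips the enlarge-on-demand step that the real functions perform.

```c
#include <assert.h>

/* isumv: element-wise sum of two count vectors; the shorter input is
 * treated as zero-padded, so isumv([1,1,1],[0,0,1,3]) = [1,1,2,3]. */
static int isumv(const int *a, int alen, const int *b, int blen, int *out)
{
    int len = alen > blen ? alen : blen;
    for (int i = 0; i < len; i++)
        out[i] = (i < alen ? a[i] : 0) + (i < blen ? b[i] : 0);
    return len;
}

/* iarray_addat: increment the counter at 1-based position pos,
 * so iarray_addat([3,3,3],2) = [3,4,3]. */
static void iarray_addat(int *a, int pos)
{
    a[pos - 1]++;
}

/* iarray_setat: set the 1-based position pos to 1 (a bitmap-style flag),
 * so iarray_setat([0,0],2) = [0,1]; setting it again is a no-op. */
static void iarray_setat(int *a, int pos)
{
    a[pos - 1] = 1;
}
```

icount_to_array then amounts to iarray_addat applied per input row, and iarray_flag to iarray_setat; iarray_sum folds partial vectors together with isumv.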
Re: [HACKERS] SP-GiST bug.
If we bump the system catalog version, pg_upgrade can mark those indexes as invalid and output a script to reindex them. Between 9.3 and 9.4 the catalog version is changed anyway. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] jsonb access operators inefficiency
On 05/30/2014 11:08 AM, Andrew Dunstan wrote: On 05/30/2014 09:35 AM, Teodor Sigaev wrote: Hi! The jsonb operators -> (text), ->> (text), -> (int) and ->> (int) use inefficient methods to access the needed field, proportional to O(N/2). The attached patch gives the text operators O(log(N)) and the integer ones O(1). Functions with fast access already exist in the current code and are used in the contains operation, for example. The attached patch uses those functions instead of a sequential loop over the object/array. Teodor, thanks for this. My only worry is this sort of code, which in a quick search I didn't find used elsewhere:

- (void) JsonbToCString(jtext, &tjb->root, -1);
- result = cstring_to_text_with_len(jtext->data, jtext->len);
+ appendStringInfoSpaces(jtext, VARHDRSZ);
+ (void) JsonbToCString(jtext, v->val.binary.data, -1);
+
+ result = (text *) jtext->data;
+ SET_VARSIZE(result, jtext->len);

If we're going to construct varlena objects inside a StringInfo, maybe we need a proper API for it. Isn't there a danger that the data member of the StringInfo won't be properly aligned to allow us to do this? In any case, we should get most of the benefit of your patch without this optimization. I see that palloc.h says: The result of palloc() is always word-aligned so maybe my alignment fear is misplaced. So my remaining question is whether this is OK stylistically. cheers andrew
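The complexity improvement described here, sequential O(N/2) access versus O(log N), is easy to model outside PostgreSQL: jsonb keeps object keys sorted, so a field lookup can binary-search where the old operator code scanned. A stand-alone sketch with an invented Pair layout (not the jsonb on-disk format, and not the patch's actual lookup code):

```c
#include <assert.h>
#include <string.h>

/* Toy model of object-field lookup: entries sorted by key. */
typedef struct { const char *key; int value; } Pair;

/* O(N) scan, like the old operator implementation; returns -1 if absent. */
static int find_linear(const Pair *p, int n, const char *key)
{
    for (int i = 0; i < n; i++)
        if (strcmp(p[i].key, key) == 0)
            return p[i].value;
    return -1;
}

/* O(log N) binary search over the sorted pairs, like the fast-access
 * path the patch reuses from the contains operation. */
static int find_binary(const Pair *p, int n, const char *key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(p[mid].key, key);
        if (cmp == 0)
            return p[mid].value;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

Both functions return the same results; only the number of key comparisons differs, which is where the reported speedup on wide objects comes from.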
Re: [HACKERS] jsonb access operators inefficiency
If we're going to construct varlena objects inside a StringInfo, maybe we need a proper API for it. Isn't there a danger that the data member of the StringInfo won't be properly aligned to allow us to do this? In any case, we should get most of the benefit of your patch without this optimization. I believe that StringInfo->data is palloc'ed, which means it's MAXALIGNed. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] jsonb access operators inefficiency
I see that palloc.h says: The result of palloc() is always word-aligned

void *
repalloc(void *pointer, Size size)
{
    ...
    /*
     * Try to detect bogus pointers handed to us, poorly though we can.
     * Presumably, a pointer that isn't MAXALIGNED isn't pointing at an
     * allocated chunk.
     */
    Assert(pointer != NULL);
    Assert(pointer == (void *) MAXALIGN(pointer));
    ...

so maybe my alignment fear is misplaced. So my remaining question is whether this is OK stylistically. Something like this?

initStringInfoVarlena()/makeStringInfoVarlena()
{
    initStringInfo()
    appendStringInfoSpaces(jtext, VARHDRSZ);
}

char *
formStringInfoVarlena()
{
    SET_VARSIZE(jtext->data, jtext->len);
    return jtext->data;
}

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
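The initStringInfoVarlena/formStringInfoVarlena idea sketched above can be modeled outside the backend with a plain growable buffer: reserve room for the header up front, append the payload, then patch the total size in at the end. VARHDRSZ and the SET_VARSIZE step are mocked here; the real PostgreSQL macros also encode flag bits, and this illustration ignores MAXALIGN and TOAST concerns entirely.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VARHDRSZ 4  /* mocked: size of the varlena length header */

typedef struct { char *data; size_t len, cap; } Buf;

/* "initStringInfoVarlena": start the buffer with a header-sized hole. */
static void buf_init_varlena(Buf *b)
{
    b->cap = 64;
    b->data = malloc(b->cap);
    b->len = VARHDRSZ;               /* leave the header hole */
    memset(b->data, 0, VARHDRSZ);
}

/* Append payload bytes, growing the buffer as needed. */
static void buf_append(Buf *b, const char *s, size_t n)
{
    while (b->len + n > b->cap)
        b->data = realloc(b->data, b->cap *= 2);
    memcpy(b->data + b->len, s, n);
    b->len += n;
}

/* "formStringInfoVarlena": store the total length (header included)
 * into the hole and hand the buffer back as the finished datum. */
static char *buf_form_varlena(Buf *b)
{
    uint32_t sz = (uint32_t) b->len;
    memcpy(b->data, &sz, sizeof(sz));
    return b->data;
}
```

The point of the pattern is that the datum is built in place with no final copy, which is exactly the copy the patch's appendStringInfoSpaces/SET_VARSIZE trick avoids.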
Re: [HACKERS] jsonb access operators inefficiency
The patch really improves access performance to jsonb. On the delicious bookmarks I got 5 times better performance. Now jsonb outperforms json on simple access (slide 12 of the pgcon presentation) by 103 times! Oleg

On Fri, May 30, 2014 at 9:35 AM, Teodor Sigaev teo...@sigaev.ru wrote: Hi! The jsonb operators -> (text), ->> (text), -> (int) and ->> (int) use inefficient methods to access the needed field, proportional to O(N/2). The attached patch gives the text operators O(log(N)) and the integer ones O(1). Functions with fast access already exist in the current code and are used in the contains operation, for example. The attached patch uses those functions instead of a sequential loop over the object/array. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] jsonb access operators inefficiency
On 05/30/2014 01:30 PM, Oleg Bartunov wrote: The patch really improves access performance to jsonb. On the delicious bookmarks I got 5 times better performance.Now jsonb outperforms json on simple access (slide 12 of pgcon presentation) by 103 times ! (Oleg, please try not to top-post) The question is whether the speedup comes from the reduction in lookup times or the reduction in string copying. I have a strong suspicion that it's mostly the first, not the second. cheers andrew
Re: [HACKERS] jsonb access operators inefficiency
Andrew Dunstan and...@dunslane.net writes: If we're going to construct varlena objects inside a StringInfo, maybe we need a proper API for it. Isn't there a danger that data member of the StringInfo won't be properly aligned to allow us to do this? In any case, we should get most of the benefit of your patch without this optimization. As noted, the data buffer is maxaligned; but nonetheless I agree that this is a serious stylistic abuse, and it's not buying much compared to the rest of the patch. I'd stick with the cstring_to_text_with_len coding. At the other end of the process, why are we using PG_GETARG_TEXT_P and not PG_GETARG_TEXT_PP to avoid a detoast cycle on short-header inputs? The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR, so it's already prepared for unaligned input. regards, tom lane
Re: [HACKERS] jsonb access operators inefficiency
The question is whether the speedup comes from the reduction in lookup times or the reduction in string copying. I have a strong suspicion that it's mostly the first, not the second. Agreed about Oleg's test, but some users put huge values into json... Actually, I don't insist; ITSM that's an easy way to prevent extra copying. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] Proposal for CSN based snapshots
On 05/30/2014 06:27 PM, Andres Freund wrote: On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote: One thorny issue came up in discussions with other hackers on this in PGCon: When a transaction is committed asynchronously, it becomes visible to other backends before the commit WAL record is flushed. With CSN-based snapshots, the order that transactions become visible is always based on the LSNs of the WAL records. This is a problem when there is a mix of synchronous and asynchronous commits: If transaction A commits synchronously with commit LSN 1, and transaction B commits asynchronously with commit LSN 2, B cannot become visible before A. And we cannot acknowledge B as committed to the client until it's visible to other transactions. That means that B will have to wait for A's commit record to be flushed to disk, before it can return, even though it was an asynchronous commit. I personally think that's annoying, but we can live with it. The most common usage of synchronous_commit=off is to run a lot of transactions in that mode, setting it in postgresql.conf. And it wouldn't completely defeat the purpose of mixing synchronous and asynchronous commits either: an asynchronous commit still only needs to wait for any already-logged synchronous commits to be flushed to disk, not the commit record of the asynchronous transaction itself. I have a hard time believing that users won't hate us for such a regression. It's pretty common to mix both sorts of transactions and this will - by my guesstimate - dramatically reduce throughput for the async backends. Yeah, it probably would. Not sure how many people would care. For an asynchronous commit, we could store the current WAL flush location as the commit LSN, instead of the location of the commit record. That would break the property that LSN == commit order, but that property is fundamentally incompatible with having async commits become visible without flushing previous transactions. 
Or we could even make it configurable; it would be fairly easy to support both behaviors. * Logical decoding is broken. I hacked on it enough that it looks roughly sane and it compiles, but didn't spend more time to debug. I think we can live with it not working for the first few iterations. I'll look into it once the patch has stabilized a bit. Thanks! * I expanded pg_clog to 64-bits per XID, but people suggested keeping pg_clog as is, with two bits per commit, and adding a new SLRU for the commit LSNs beside it. Probably will need to do something like that to avoid bloating the clog. It also influences how on-disk compatibility is dealt with. So: How are you planning to deal with on-disk compatibility? * Add some kind of backend-private caching of clog, to make it faster to access. The visibility checks are now hitting the clog a lot more heavily than before, as you need to check the clog even if the hint bits are set, if the XID falls between xmin and xmax of the snapshot. That'll hurt a lot in concurrent scenarios :/. Have you measured how 'wide' xmax-xmin usually is? That depends entirely on the workload. The worst case is a mix of a long-running transaction and a lot of short transactions. It could grow to millions of transactions or more in that case. I wonder if we could just copy a range of values from the clog when we start scanning. I don't think that's practical, if the xmin-xmax gap is wide. Perhaps we should take the bull by the horns and make clog faster to look up. If we e.g. mmapped the clog file into backend-private address space, we could avoid all the locking overhead of an SLRU. On platforms with atomic 64-bit instructions, you could read the clog with just a memory barrier. Even on other architectures, you'd only need a spinlock. - Heikki
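The "copy a range of clog values when the snapshot is taken" idea discussed above can be sketched with a toy clog. Everything here is hypothetical (the real clog is a locked SLRU with two bits per XID, not a byte array): the point is that one bulk copy at snapshot time makes later in-window status lookups free of shared access, and a transaction that commits after the copy is still seen as in-progress, which is what snapshot semantics require anyway.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;
enum { XID_IN_PROGRESS = 0, XID_COMMITTED = 1, XID_ABORTED = 2 };

#define SHARED_XIDS 1024
static uint8_t shared_clog[SHARED_XIDS];   /* stand-in for the SLRU */

/* Snapshot-local copy of commit status for the window [xmin, xmax). */
typedef struct {
    TransactionId xmin, xmax;
    uint8_t status[SHARED_XIDS];
} SnapshotClogCache;

static void cache_fill(SnapshotClogCache *c, TransactionId xmin,
                       TransactionId xmax)
{
    c->xmin = xmin;
    c->xmax = xmax;
    /* one bulk copy, notionally under a single lock acquisition */
    memcpy(c->status, shared_clog + xmin, xmax - xmin);
}

static int xid_status(const SnapshotClogCache *c, TransactionId xid)
{
    if (xid >= c->xmin && xid < c->xmax)
        return c->status[xid - c->xmin]; /* no shared access needed */
    return shared_clog[xid];             /* fallback: hit the clog */
}
```

As Heikki notes, the weakness is a wide xmin-xmax gap: the copy itself then becomes large, which is why the mmap-the-clog alternative is floated instead.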
Re: [HACKERS] Proposal for CSN based snapshots
(forgot to answer this question) On 05/30/2014 06:27 PM, Andres Freund wrote: On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote: * I expanded pg_clog to 64-bits per XID, but people suggested keeping pg_clog as is, with two bits per commit, and adding a new SLRU for the commit LSNs beside it. Probably will need to do something like that to avoid bloating the clog. It also influences how on-disk compatibility is dealt with. So: How are you planning to deal with on-disk compatibility? Have pg_upgrade read the old clog and write it out in the new format. - Heikki
Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python
Asif Naeem anaeem...@gmail.com writes: On Fri, May 30, 2014 at 9:49 PM, reiner peterke zedaa...@drizzle.com wrote: Since upgrading my mac from os x 10.8 to 10.9, I can no longer build postgres with '--with-python'. Latest PG 9.4 source code seems to build fine with the --with-python option on my OS X 10.9.3 box i.e. [ please don't top-post in PG mailing lists ] I can reproduce the failure as described, not only in 9.4 but the back branches too; I would've noticed sooner except I don't build plpython routinely. It looks to me like Apple broke something in the build toolchain. If you add -v as suggested, what you see is

$ make PROFILE=-v
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -v -bundle -multiply_defined suppress -o plpython2.so plpy_cursorobject.o plpy_elog.o plpy_exec.o plpy_main.o plpy_planobject.o plpy_plpymodule.o plpy_procedure.o plpy_resultobject.o plpy_spi.o plpy_subxactobject.o plpy_typeio.o plpy_util.o -L../../../src/port -L../../../src/common -Wl,-dead_strip_dylibs -v -F/System/Library/Frameworks -framework Python -bundle_loader ../../../src/backend/postgres
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.2.0
Thread model: posix
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld -demangle -dynamic -arch x86_64 -bundle -bundle_loader ../../../src/backend/postgres -macosx_version_min 10.9.0 -multiply_defined suppress -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk -o plpython2.so -L../../../src/port -L../../../src/common plpy_cursorobject.o plpy_elog.o plpy_exec.o plpy_main.o plpy_planobject.o plpy_plpymodule.o plpy_procedure.o plpy_resultobject.o plpy_spi.o plpy_subxactobject.o plpy_typeio.o plpy_util.o -dead_strip_dylibs -framework Python -lSystem 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/5.1/lib/darwin/libclang_rt.osx.a -F/System/Library/Frameworks
ld: framework not found Python
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [plpython2.so] Error 1

The problem is that the -syslibroot option modifies where to look for frameworks; man ld quoth:

-syslibroot rootdir
    Prepend rootdir to all search paths when searching for libraries or frameworks.

If you do the ld call by hand without the -syslibroot option, it works. AFAICS it could never have worked with such an option, so I'm thinking this is some new misbehavior in the latest version of Xcode. regards, tom lane
Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python
On 05/30/2014 11:52 AM, Tom Lane wrote: Asif Naeem anaeem...@gmail.com writes: On Fri, May 30, 2014 at 9:49 PM, reiner peterke zedaa...@drizzle.com wrote: Since upgrading my mac from os x 10.8 to 10.9, i can no long build postgres with '--with-python’. Latest PG 9.4 source code seems building fine with --with-python option on my OS X 10.9.3 box i.e. [ please don't top-post in PG mailing lists ] I can reproduce the failure as described, not only in 9.4 but the back branches too; I would've noticed sooner except I don't build plpython routinely. It looks to me like Apple broke something in the build toolchain. If you add -v as suggested, what you see is If you do the ld call by hand without the -syslibroot option, it works. AFAICS it could never have worked with such an option, so I'm thinking this is some new misbehavior in the latest version of Xcode. There is and the SO thread that goes into detail on this is here: http://stackoverflow.com/questions/19555395/python-framework-is-missing-from-os-x-10-9-sdk-why-also-workaround The Apple document referenced in above is: https://developer.apple.com/library/ios/technotes/tn2328/_index.html regards, tom lane -- Adrian Klaver adrian.kla...@aklaver.com
Re: [HACKERS] SP-GiST bug.
On Fri, May 30, 2014 at 08:19:07PM +0400, Teodor Sigaev wrote: If we bump the system catalog version, pg_upgrade can mark those indexes as invalid and output a script to reindex them. Between 9.3 - 9.4 catalog version is changed anyway. Right. I was thinking of beta users between beta1 and beta2 as well. We would need to trigger the index invalidation on the catalog version used for the commit that changes the index format, not the major version like 9.4. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python
Adrian Klaver adrian.kla...@aklaver.com writes: On 05/30/2014 11:52 AM, Tom Lane wrote: If you do the ld call by hand without the -syslibroot option, it works. AFAICS it could never have worked with such an option, so I'm thinking this is some new misbehavior in the latest version of Xcode. There is and the SO thread that goes into detail on this is here: http://stackoverflow.com/questions/19555395/python-framework-is-missing-from-os-x-10-9-sdk-why-also-workaround The Apple document referenced in above is: https://developer.apple.com/library/ios/technotes/tn2328/_index.html Fun. So after all these years of catering to Apple's preferred weirdness in this regard, they reverse course and tell us to do it like everywhere else. I experimented with just diking out the python_framework case in configure, and that *almost* works; but for some reason distutils doesn't admit to having a shared library, so you also have to override that test. The attached patch fixes this and also removes a long-obsolete comment claiming that we don't work with Python 2.5 on OSX (see prairiedog for evidence to the contrary). I've tested this successfully on my 10.9.3 laptop as well as on dromedary and prairiedog, so I'm thinking we should patch not only HEAD but all the back branches. Any objections? regards, tom lane PS: why aren't any of the buildfarm members using 10.9 building --with-python? We should have known about this months ago, ISTM. 
diff --git a/config/python.m4 b/config/python.m4
index 5cb2854..7012c53 100644
*** a/config/python.m4
--- b/config/python.m4
*** 68,81 ****
  python_ldlibrary=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LDLIBRARY'))))"`
  python_so=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('SO'))))"`
  ldlibrary=`echo ${python_ldlibrary} | sed "s/${python_so}$//"`
- python_framework=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('PYTHONFRAMEWORK'))))"`
  python_enable_shared=`${PYTHON} -c "import distutils.sysconfig; print(distutils.sysconfig.get_config_vars().get('Py_ENABLE_SHARED',0))"`
! if test -n "$python_framework"; then
!     python_frameworkprefix=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('PYTHONFRAMEWORKPREFIX'))))"`
!     python_libspec="-F${python_frameworkprefix} -framework $python_framework"
!     python_enable_shared=1
! elif test x"${python_libdir}" != x"" -a x"${python_ldlibrary}" != x"" -a x"${python_ldlibrary}" != x"${ldlibrary}"
  then
      # New way: use the official shared library
      ldlibrary=`echo ${ldlibrary} | sed "s/^lib//"`
--- 68,76 ----
  python_ldlibrary=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LDLIBRARY'))))"`
  python_so=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('SO'))))"`
  ldlibrary=`echo ${python_ldlibrary} | sed "s/${python_so}$//"`
  python_enable_shared=`${PYTHON} -c "import distutils.sysconfig; print(distutils.sysconfig.get_config_vars().get('Py_ENABLE_SHARED',0))"`
! if test x"${python_libdir}" != x"" -a x"${python_ldlibrary}" != x"" -a x"${python_ldlibrary}" != x"${ldlibrary}"
  then
      # New way: use the official shared library
      ldlibrary=`echo ${ldlibrary} | sed "s/^lib//"`

*** 91,99 ****
      python_libspec="-L${python_libdir} -lpython${python_ldversion}"
  fi
! if test -z "$python_framework"; then
!     python_additional_libs=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LIBS','LIBC','LIBM','BASEMODLIBS'))))"`
! fi
  AC_MSG_RESULT([${python_libspec} ${python_additional_libs}])
--- 86,92 ----
      python_libspec="-L${python_libdir} -lpython${python_ldversion}"
  fi
! python_additional_libs=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LIBS','LIBC','LIBM','BASEMODLIBS'))))"`
  AC_MSG_RESULT([${python_libspec} ${python_additional_libs}])

diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 46d2030..020861a 100644
*** a/src/pl/plpython/Makefile
--- b/src/pl/plpython/Makefile
*** include $(top_builddir)/src/Makefile.glo
*** 9,24 ****
  # asks Python directly. But because this has been broken in Debian
  # for a long time (http://bugs.debian.org/695979), and to support
  # older Python versions, we see if there is a file that is named like
! # a shared library as a fallback. (Note that this is wrong on OS X,
! # where DLSUFFIX is .so, but libpython is a .dylib. Python 2.5 is
! # therefore not supported on OS X.)
  ifeq (1,$(python_enable_shared))
  shared_libpython = yes
  else
  ifneq (,$(wildcard $(python_libdir)/libpython*$(DLSUFFIX)*))
  shared_libpython = yes
  endif
  endif
Re: [HACKERS] Proposing pg_hibernate
On Tue, May 27, 2014 at 10:01 PM, Gurjeet Singh gurj...@singh.im wrote: When the Postgres server is being stopped/shut down, the `Buffer Saver` scans the shared-buffers of Postgres, and stores the unique block identifiers of each cached block to the disk. This information is saved under the `$PGDATA/pg_hibernator/` directory. For each database whose blocks are resident in shared buffers, one file is created; for eg.: `$PGDATA/pg_hibernator/2.postgres.save`. This file-naming convention seems a bit fragile. For example, on my filesystem (HFS) if I create a database named foo / bar, I'll get a complaint like: ERROR: could not open pg_hibernator/5.foo / bar.save: No such file or directory during shutdown. Josh
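One way to harden the save-file naming against database names like "foo / bar" is to escape filesystem-unsafe bytes (or, alternatively, to name the files by database OID alone and keep the human-readable name inside the file). The helper below is a hypothetical sketch, not pg_hibernator's actual code:

```c
#include <stdio.h>
#include <string.h>

/* Escape path separators (and '%' itself, so escaping is reversible)
 * as %XX, leaving everything else untouched.  outlen must include room
 * for the terminating NUL; overlong input is truncated. */
static void escape_dbname(const char *name, char *out, size_t outlen)
{
    size_t o = 0;
    for (const char *p = name; *p && o + 4 < outlen; p++)
    {
        if (*p == '/' || *p == '\\' || *p == '%')
            o += snprintf(out + o, outlen - o, "%%%02X",
                          (unsigned char) *p);
        else
            out[o++] = *p;
    }
    out[o] = '\0';
}
```

With this scheme the save file for "foo / bar" becomes something like pg_hibernator/5.foo %2F bar.save, which any filesystem will accept; an OID-only scheme sidesteps the problem entirely at the cost of readability.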
Re: [HACKERS] recovery testing for beta
On Thu, May 29, 2014 at 8:15 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, May 29, 2014 at 10:09 PM, Jeff Janes jeff.ja...@gmail.com wrote: What features in 9.4 need more beta testing for recovery? Another feature that interacts with recovery is reduced WAL for the Update operation: http://www.postgresql.org/message-id/e1wnqjm-0003cz...@gemulon.postgresql.org Commit: a3115f It looks like this is something which is always on, so it should be getting a good test without me taking any special steps. But is there some way, for example something in xlogdump, to assess how often this is getting activated? Thanks, Jeff
Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python
On 05/30/2014 01:31 PM, Tom Lane wrote: Adrian Klaver adrian.kla...@aklaver.com writes: On 05/30/2014 11:52 AM, Tom Lane wrote: If you do the ld call by hand without the -syslibroot option, it works. AFAICS it could never have worked with such an option, so I'm thinking this is some new misbehavior in the latest version of Xcode. There is and the SO thread that goes into detail on this is here: http://stackoverflow.com/questions/19555395/python-framework-is-missing-from-os-x-10-9-sdk-why-also-workaround The Apple document referenced in above is: https://developer.apple.com/library/ios/technotes/tn2328/_index.html Fun. So after all these years of catering to Apple's preferred weirdness in this regard, they reverse course and tell us to do it like everywhere else. Mavericks, the gift that keeps giving:) -- Adrian Klaver adrian.kla...@aklaver.com
Re: [HACKERS] recovery testing for beta
On Thu, May 29, 2014 at 09:39:56AM -0700, Jeff Janes wrote: I've applied my partial-write testing harness to several scenarios in 9.4. So far its found a recovery bug for gin indexes, a recovery bug for btree, a vacuum bug for btree indexes (with foreign keys, but that is not relevant to the bug), and nothing of interest for gist index, although it only tested where text_array @@ to_tsquery(?) queries. It also implicitly tested the xlog parallel write slots thing, as that is common code to all recovery. I also applied the foreign key test retroactively to 9.3, and it quickly re-found the multixact bugs up until commit 9a57858f1103b89a5674. The test was designed only with the knowledge that the bugs involved foreign keys and the consumption of multixacts. I had no deeper knowledge of the details of those bugs when designing the test, so I have a reasonable amount of confidence that this could have found them in real time had I bothered to try to test the feature during the previous beta cycle. Impressive. This testing is of great value to the project. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] recovery testing for beta
On Sat, May 31, 2014 at 3:51 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Thu, May 29, 2014 at 8:15 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, May 29, 2014 at 10:09 PM, Jeff Janes jeff.ja...@gmail.com wrote: What features in 9.4 need more beta testing for recovery? Another feature which have interaction with recovery is reduced WAL for Update operation: http://www.postgresql.org/message-id/e1wnqjm-0003cz...@gemulon.postgresql.org Commit: a3115f It looks like this is something which is always on, so it should be getting a good test without me taking any special steps. But is there some way, for example something in xlogdump, to assess how often this is getting activated? Currently there is no simple way to get this information, but with the attached simple patch you can get it via pg_xlogdump. It will give you information in the following way:

rmgr: Heap len (rec/tot): 47/79, tx: 690, lsn: 0/0375B4A0, prev 0/0375B470, bkp: , desc: hot_update: rel 1663/12125/24580; tid 0/1 xmax 690; new tid 0/2 xmax 0; compressed tuple: suffix encoded
rmgr: Heap len (rec/tot): 53/85, tx: 691, lsn: 0/0375B520, prev 0/0375B4F0, bkp: , desc: hot_update: rel 1663/12125/24580; tid 0/2 xmax 691; new tid 0/3 xmax 0; compressed tuple: prefix encoded
rmgr: Heap len (rec/tot): 56/88, tx: 692, lsn: 0/0375B5A8, prev 0/0375B578, bkp: , desc: hot_update: rel 1663/12125/24580; tid 0/3 xmax 692; new tid 0/4 xmax 0; uncompressed tuple

I think this is useful information and could even be included in the core code. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

Attachment: pgxlogdump_output_compressed_tuple_info-v1.patch (binary data)