Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-30 Thread Heikki Linnakangas

On 05/30/2014 12:53 AM, John Lumby wrote:

Date: Thu, 29 May 2014 18:00:28 -0300
Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
and patch
From: klaussfre...@gmail.com
To: hlinnakan...@vmware.com
CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org

On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

On 05/29/2014 11:34 PM, Claudio Freire wrote:


On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


On 05/29/2014 04:12 PM, John Lumby wrote:




On 05/28/2014 11:52 PM, John Lumby wrote:

The patch seems to assume that you can put the aiocb struct in shared
memory, initiate an asynchronous I/O request from one process, and wait
for its completion from another process. I'm pretty surprised if that
works on any platform.



It works on linux. Actually this ability allows the asyncio
implementation to reduce complexity in one respect (yes I know it
looks complex enough): it makes waiting for completion of an
in-progress IO simpler than for the existing synchronous IO case,
since librt takes care of the waiting. Specifically, there is no need
for extra wait-for-io control blocks such as in bufmgr's WaitIO().



[checks]. No, it doesn't work. See attached test program.


Thanks for checking, and thanks for coming up with that test program.
However, yes, it really does work -- always (on linux).
Your test program is doing things in the wrong order -
it calls aio_suspend *before* aio_error.
However, the rule is: call aio_suspend *after* aio_error,
and *only* if aio_error returns EINPROGRESS.


I see no such rule in the man pages of any of the functions involved. 
And it wouldn't matter anyway; the behavior is exactly the same if you 
aio_error() first.



See the code changes to fd.c function FileCompleteaio()
to see how we have done it.  And I am attaching a corrected version
of your test program, which runs just fine.


As Claudio mentioned earlier, the way FileCompleteaio() uses aio_suspend 
is just a complicated way of polling. You might as well replace the 
aio_suspend() calls with pg_usleep().



It kinda seems to work sometimes, because of the way it's implemented in
glibc. The aiocb struct has a field for the result value and errno, and
when
the I/O is finished, the worker thread fills them in. aio_error() and
aio_return() just return the values of those fields, so calling
aio_error()
or aio_return() does in fact happen to work from a different process.
aio_suspend(), however, is implemented by sleeping on a process-local
mutex,
which does not work from a different process.

Even if it worked on Linux today, it would be a bad idea to rely on it
from
a portability point of view. No, the only sane way to make this work is
that
the process that initiates an I/O request is responsible for completing
it.
If another process needs to wait for an async I/O to complete, we must
use
some other means to do the waiting. Like the io_in_progress_lock that we
already have, for the same purpose.
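
A minimal sketch of the behavior described above, assuming Linux/glibc
(error checking omitted; the cross-process aio_error() poll relies on
exactly the undefined behavior under discussion):

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* aiocb and read buffer in shared memory, visible to both processes */
    struct aiocb *cb = mmap(NULL, sizeof(*cb) + 4096,
                            PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = open("/etc/hosts", O_RDONLY);
    cb->aio_buf = (char *) (cb + 1);
    cb->aio_nbytes = 4096;

    aio_read(cb);               /* initiated by the parent process */

    if (fork() == 0)
    {
        /*
         * The child sees completion only because glibc's worker thread
         * (in the parent) writes the status into the shared aiocb.  This
         * is polling, and undefined behavior per POSIX; aio_suspend()
         * would sleep on the parent's process-local synchronization
         * object and therefore cannot block correctly here.
         */
        while (aio_error(cb) == EINPROGRESS)
            usleep(10);
        printf("child saw completion: %zd bytes\n", aio_return(cb));
        _exit(0);
    }
    wait(NULL);
    return 0;
}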


But calls to it are timed out after 10us, effectively turning the thing
into polling mode.


We don't want polling... And even if we did, calling aio_suspend() in a way
that's known to be broken, in a loop, is a pretty crappy way of polling.


Well, as mentioned earlier, it is not broken. Whether it is efficient I
am not sure.
I have looked at the mutex in aio_suspend that you mentioned and I am not
quite convinced that, if the caller is not the original aio_read process,
it renders the suspend() into an instant timeout.  I will see if I can
verify that.


I don't see the point of pursuing this design further. Surely we don't 
want to use polling here, and you're relying on undefined behavior 
anyway. I'm pretty sure aio_return/aio_error won't work from a different 
process on all platforms, even if it happens to work on Linux. Even on 
Linux, it might stop working if the underlying implementation changes 
from the glibc pthread emulation to something kernel-based.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Typo fixes in Solution.pm, part of MSVC scripts

2014-05-30 Thread Michael Paquier
On Fri, May 30, 2014 at 4:31 AM, Robert Haas robertmh...@gmail.com wrote:
 I suspect it would be good, then, to separate this into two patches,
 one for back-patch and one not.
Attached are patches for master and 9.3. 9.2 and older versions are
not impacted.
Regards,
-- 
Michael
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 9c4e9f7..473185f 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -756,8 +756,8 @@ sub new
 	$self->{solutionFileVersion}= '12.00';
 	$self->{vcver}  = '12.00';
 	$self->{visualStudioName}   = 'Visual Studio 2013';
-	$self->{VisualStudioVersion}= '12.0.21005.1',
-	$self->{MinimumVisualStudioVersion} = '10.0.40219.1',
+	$self->{VisualStudioVersion}= '12.0.21005.1';
+	$self->{MinimumVisualStudioVersion} = '10.0.40219.1';
 
 	return $self;
 }
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 283d399..c0d7f38 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -747,13 +747,13 @@ sub new
 	my $self  = $classname->SUPER::_new(@_);
 	bless($self, $classname);
 
-	$self->{solutionFileVersion}  = '12.00';
-	$self->{vcver}= '12.00';
-	$self->{visualStudioName} = 'Visual Studio 2013';
-	$self->{VisualStudioVersion}  = '12.0.21005.1',
-	  $self->{MinimumVisualStudioVersion} = '10.0.40219.1',
+	$self->{solutionFileVersion}= '12.00';
+	$self->{vcver}  = '12.00';
+	$self->{visualStudioName}   = 'Visual Studio 2013';
+	$self->{VisualStudioVersion}= '12.0.21005.1';
+	$self->{MinimumVisualStudioVersion} = '10.0.40219.1';
 
-	  return $self;
+	return $self;
 }
 
 sub GetAdditionalHeaders

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Typo fixes in Solution.pm, part of MSVC scripts

2014-05-30 Thread Heikki Linnakangas

On 05/30/2014 10:25 AM, Michael Paquier wrote:

On Fri, May 30, 2014 at 4:31 AM, Robert Haas robertmh...@gmail.com wrote:

I suspect it would be good, then, to separate this into two patches,
one for back-patch and one not.

Attached are patches for master and 9.3. 9.2 and older versions are
not impacted.


Thanks, applied.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix xpath() to return namespace definitions

2014-05-30 Thread Ali Akbar
Hi all,

While developing some XML processing queries, I stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.

The source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that are declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.

This patch uses the function xmlCopyNode, which copies a node, including the
namespace definitions it requires (without unused namespaces in the node or
its children). When the copy is dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode needs additional memory to execute, but
reimplementing its routine to handle namespace definitions would introduce
much complexity to the code.

Note: This is my very first PostgreSQL patch.

-- 
Ali Akbar
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..93e335c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3602,19 +3602,28 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
 	if (cur->type == XML_ELEMENT_NODE)
 	{
 		xmlBufferPtr buf;
+		xmlNodePtr cur_copy;
 
 		buf = xmlBufferCreate();
+
+		/* the result of xmlNodeDump won't contain namespace definitions, 
+		 * but libxml2 has xmlCopyNode that duplicates a node, along
+		 * with its required namespace definitions. 
+		 */
+		cur_copy = xmlCopyNode(cur, 1);
 		PG_TRY();
 		{
-			xmlNodeDump(buf, NULL, cur, 0, 1);
+			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
 			result = xmlBuffer_to_xmltype(buf);
 		}
 		PG_CATCH();
 		{
+			xmlFreeNode(cur_copy);
 			xmlBufferFree(buf);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
+		xmlFreeNode(cur_copy);
 		xmlBufferFree(buf);
 	}
 	else
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..a6d26f7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -584,6 +584,12 @@ SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><loc
  {1,2}
 (1 row)
 
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+                                                                xpath                                                                
+--------------------------------------------------------------------------------------------------------------------------------------
+ {"<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+(1 row)
+
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
           xpath          
 -------------------------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..c7bcf91 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
 LINE 1: SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="ht...
 ^
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xpath('//loc:piece', '<local:data xmlns:local="http:/...
+                                    ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
 ERROR:  unsupported XML feature
 LINE 1: SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>'...
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..241a5d6 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -174,6 +174,7 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
 SELECT xpath('', '<!-- error -->');
 SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
 SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
 SELECT xpath('//text()', '<root>&lt;</root>');
 SELECT xpath('//@value', '<root value="&lt;"/>');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4

2014-05-30 Thread Andres Freund
On 2014-05-11 23:30:37 +0200, Andres Freund wrote:
 On 2014-05-11 12:24:30 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   On 2014-05-10 23:21:34 -0700, Peter Geoghegan wrote:
   On Fri, May 9, 2014 at 1:50 PM, Andres Freund and...@2ndquadrant.com 
   wrote:
   And adding a proper unsigned type doesn't sound like a small amount of 
   work.
  
   Perhaps not, but it's overdue. We ought to have one.
  
   Maybe. But there's so many things to decide around it that I don't think
   it's a good prerequisite for not showing essentially corrupted values in
   a supported scenario.
  
  It's a lot harder than it sounds at first; see past discussions about how
  we could shoehorn one into the numeric type hierarchy.  And think about
  how C expressions that mix signed and unsigned inputs tend to give
  surprising results :-(
 
 Yea. I really don't like to take on such a major project to solve a
 minor problem.
 What I am thinking about right now is to expose a 'pg_blocknumber'
 type. That only does very basic operations and implicitly casts to
 int64. That's probably a much more handleable problem and it also might
 give us some more experience with unsigned types. Comments?

As a note towards that: e.g. pageinspect deals with blocknumbers and
uses int4 for that. That makes accessing the higher blocks really
awkward...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4

2014-05-30 Thread Simon Riggs
On 12 May 2014 08:15, Andres Freund and...@2ndquadrant.com wrote:

 But I concur that in practice, if you're dealing with 16TB tables, it's time
 to partition.

 Well, we need to improve our partitioning for that to be viable for all
 relations. Not having usable foreign and unique keys makes it a pita in
 some cases.

As discussed, declarative partitioning is on the roadmap for this next
release, so I would say let's just document that table sizes above 16TB
don't report correctly and move on.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4

2014-05-30 Thread Andres Freund
On 2014-05-30 12:00:47 +0100, Simon Riggs wrote:
 On 12 May 2014 08:15, Andres Freund and...@2ndquadrant.com wrote:
 
  But I concur that in practice, if you're dealing with 16TB tables, it's 
  time
  to partition.
 
  Well, we need to improve our partitioning for that to be viable for all
  relations. Not having usable foreign and unique keys makes it a pita in
  some cases.
 
 As discussed, declarative partitioning is on the roadmap for this next
 release, so I would say let's just document that table sizes above 16TB
 don't report correctly and move on.

I doubt we'll fix all the snags - like foreign keys, unique keys, etc -
that partitioning has in this release... Introducing a blocknumber type
seems easy and mechanical enough.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-30 Thread Magnus Hagander
On Fri, May 30, 2014 at 12:45 AM, Andrew Dunstan and...@dunslane.net
wrote:


 On 05/29/2014 05:41 PM, Josh Kupershmidt wrote:

 On Thu, May 29, 2014 at 4:06 PM, Andrew Dunstan and...@dunslane.net
 wrote:

 Almost all my critters run in VMs (all but jacana and bowerbird).

 They're not running on OpenVZ, are they? `ifconfig` on shearwater says:

  [...]

  and it seems this all-zeros MAC address is a common
 (mis-?)configuration on OpenVZ:

 https://github.com/robertdavidgraham/masscan/issues/43
 http://stackoverflow.com/questions/5838225/how-do-i-
 get-a-default-gridgain-node-in-openvz-discover-other-nodes-on-the-same
 http://forum.openvz.org/index.php?t=msggoto=8117




 Yeah, that's a bit ugly. Mine are on qemu, one (sitella) is on Xen.


That's likely the difference between real virtualization and
containerization. It'd be interesting to see how it behaves on FreeBSD
jails or LXC-containers.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Teodor Sigaev

Hi!

The jsonb operators ->text, ->>text, ->int, ->>int use inefficient methods to 
access the needed field, proportional to O(N/2). The attached patch makes the 
text operators O(log(N)) and the integer ones O(1). The functions with fast 
access already exist in the current code and are used in the contains 
operation, for example. The attached patch uses those functions instead of a 
sequential loop over the object/array.
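
For illustration, a hypothetical sketch of the kind of lookup the patch
relies on: jsonb keeps object keys sorted (by length, then contents), so a
field can be found by binary search instead of a linear scan. The names and
layout below are invented, not the actual jsonb internals:

#include <string.h>

typedef struct
{
    const char *key;
    int         keylen;
} KeyEntry;

/* returns the index of the matching pair, or -1; O(log N) comparisons */
static int
find_key(const KeyEntry *pairs, int npairs, const char *key, int keylen)
{
    int         lo = 0;
    int         hi = npairs - 1;

    while (lo <= hi)
    {
        int         mid = lo + (hi - lo) / 2;
        int         cmp = keylen - pairs[mid].keylen;   /* length first, */

        if (cmp == 0)
            cmp = memcmp(key, pairs[mid].key, keylen);  /* then contents */
        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}

The integer case is simpler still: array elements are addressed by position,
so the element offset can be computed directly, which is the O(1) access
mentioned above.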

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


jsonb_access.patch.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout

2014-05-30 Thread Steve Singer

On 05/28/2014 06:41 PM, Andres Freund wrote:

Hi,


Pushed a fix for it. I am pretty sure it will, but could you still test
that it fixes your problem?

Thanks!


The fix seems to work (I am no longer getting the timeout on slot creation)

Thanks




Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-30 Thread Claudio Freire
On Fri, May 30, 2014 at 4:15 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 We don't want polling... And even if we did, calling aio_suspend() in a
 way
 that's known to be broken, in a loop, is a pretty crappy way of polling.


 Well, as mentioned earlier, it is not broken. Whether it is
 efficient I am not sure.
 I have looked at the mutex in aio_suspend that you mentioned and I am not
 quite convinced that, if the caller is not the original aio_read process,
 it renders the suspend() into an instant timeout.  I will see if I can
 verify that.


 I don't see the point of pursuing this design further. Surely we don't want
 to use polling here, and you're relying on undefined behavior anyway. I'm
 pretty sure aio_return/aio_error won't work from a different process on all
 platforms, even if it happens to work on Linux. Even on Linux, it might stop
 working if the underlying implementation changes from the glibc pthread
 emulation to something kernel-based.

I'll try to do some measuring of performance with:
a) git head
b) git head + patch as-is
c) git head + patch without aio_suspend in foreign processes (just re-read)
d) git head + patch with a lwlock (or whatever works) instead of aio_suspend

a-c will be the fastest to produce; d might take a while.

I'll let you know of the results as I get them.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpages/allvisible probably shouldn't be a int4

2014-05-30 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-05-11 23:30:37 +0200, Andres Freund wrote:

  Yea. I really don't like to take on such a major project to solve a
  minor problem.
  What I am thinking about right now is to expose a 'pg_blocknumber'
  type. That only does very basic operations and implicitly casts to
  int64. That's probably a much more handleable problem and it also might
  give us some more experience with unsigned types. Comments?
 
 As a note towards that: e.g. pageinspect deals with blocknumbers and
 uses int4 for that. That makes accessing the higher blocks really
 awkward...

Sounds like pg_blocknumber could be a good idea, but I wouldn't add the
implicit casts to int64.  We might enlarge BlockNumber to
unsigned 64 bits in the future -- we will regret the implicit casts to
signed int64 then.  Anyway it's not like implicit casts will get us any
extra functionality.  We can just add + and - operators, and that should
cover 99% of what people would initially want ...

Now, adding casts FROM int32 and int64 to pg_blocknumber doesn't sound
bad.  (Maybe have them error out if the value is negative?)
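
For illustration, a hypothetical sketch of what an overflow-checked '+'
support function for such a type could look like, assuming it is stored as
an unsigned 32-bit BlockNumber (the function, operator and type do not
exist; all names are invented):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(pg_blocknumber_pl_int4);

Datum
pg_blocknumber_pl_int4(PG_FUNCTION_ARGS)
{
    uint32      blkno = PG_GETARG_UINT32(0);
    int32       delta = PG_GETARG_INT32(1);
    uint32      result;

    if (delta >= 0)
    {
        result = blkno + (uint32) delta;
        if (result < blkno)             /* unsigned wraparound */
            goto out_of_range;
    }
    else
    {
        uint32      dec = (uint32) (-((int64) delta));

        if (dec > blkno)                /* would go below block 0 */
            goto out_of_range;
        result = blkno - dec;
    }
    PG_RETURN_UINT32(result);

out_of_range:
    ereport(ERROR,
            (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
             errmsg("block number out of range")));
    PG_RETURN_UINT32(0);                /* keep compiler quiet */
}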

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST bug.

2014-05-30 Thread Teodor Sigaev

The point I'm making is that the scenario your test case exposes is not
an infinite loop of picksplits, but an infinite loop of choose calls.


Thank you, now I see what I missed before. After some brainstorming, it's
possible to use '\0' only as an end-of-string marker.  The idea is: when we
split an allTheSame tuple into upper/lower, we copy the node label to the
upper tuple instead of setting it to '\0'. Node labels of the lower tuple
will be unchanged but should be ignored for the reconstructed value - and
they are ignored because of the allTheSame flag.  See attached patch.


Unfortunately, it will break on-disk compatibility. Although it will not
cause a panic/error/coredump, the allTheSame approach will not find tuples.
Users should recreate SP-GiST indexes over text columns.




It's certainly possible that we could/should change what checkAllTheSame
is doing --- on re-reading the comment, I'm not really sure that the
scenario it's concerned about can happen.  However, that will not fix


I rewrote the patch to fix the missed case - an allTheSame result of
picksplit while tooBig is set. I believe this patch is still needed because
it can make the tree more efficient, as was demonstrated for the quad tree.


This patch doesn't break on-disk compatibility, although index recreation
is recommended.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


spgist_textproc.patch.gz
Description: Unix tar archive


spgist_allthesame.patch.2.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Andrew Dunstan


On 05/30/2014 09:35 AM, Teodor Sigaev wrote:

Hi!

jsonb operators ->text, ->>text, ->int, ->>int use inefficient methods 
to access the needed field, proportional to O(N/2). The attached patch 
makes the text operators O(log(N)) and the integer ones O(1). The 
functions with fast access already exist in the current code and are 
used in the contains operation, for example. The attached patch uses 
those functions instead of a sequential loop over the object/array.


Teodor, thanks for this.

My only worry is this sort of code, which in a quick search I didn't 
find used elsewhere


   -   (void) JsonbToCString(jtext, &tjb->root, -1);
   -   result = cstring_to_text_with_len(jtext->data,
   jtext->len);
   +   appendStringInfoSpaces(jtext, VARHDRSZ);
   +   (void) JsonbToCString(jtext, v->val.binary.data,
   -1);
   +
   +   result = (text *) jtext->data;
   +   SET_VARSIZE(result, jtext->len);


If we're going to construct varlena objects inside a StringInfo, maybe 
we need a proper API for it. Isn't there a danger that the data member of 
the StringInfo won't be properly aligned to allow us to do this? In any 
case, we should get most of the benefit of your patch without this 
optimization.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery testing for beta

2014-05-30 Thread Heikki Linnakangas

On 05/29/2014 07:39 PM, Jeff Janes wrote:

It also implicitly tested the xlog parallel write slots thing, as that is
common code to all recovery.


During development, I hit a lot of bugs in that patch by setting 
wal_buffers to 32kB (the minimum). That causes more backends to wait for each 
other, exposing deadlocks.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Greg Stark
On Fri, May 30, 2014 at 3:59 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 If transaction A commits synchronously with commit LSN 1, and transaction B
 commits asynchronously with commit LSN 2, B cannot become visible before A.
 And we cannot acknowledge B as committed to the client until it's visible to
 other transactions. That means that B will have to wait for A's commit
 record to be flushed to disk, before it can return, even though it was an
 asynchronous commit.


I thought that's what happens now.

What's more of a concern is synchronous replication. We currently
have a hack that makes transactions committed locally invisible to
other transactions even though they've committed and synced to disk
until the slave responds that it's received the transaction. (I think
this is bogus personally, it just shifts the failure modes around. If
we wanted to do it properly we would have to do two-phase commit.)

I guess it still works because we don't support having synchronous
replication for just some transactions and not others. It would be
nice to support that but I think it would mean making it work like
local synchronous commit. It would only affect how long the commit
blocks, not when other transactions see the committed data.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST bug.

2014-05-30 Thread Bruce Momjian
On Fri, May 30, 2014 at 06:55:18PM +0400, Teodor Sigaev wrote:
 The point I'm making is that the scenario your test case exposes is not
 an infinite loop of picksplits, but an infinite loop of choose calls.
 
 Thank you, now I see what I missed before. After some brainstorming,
 it's possible to use '\0' only as end of string marker.  The idea
 is: when we split allthesame tuple to upper/lower we copy node label
 to upper tuple instead of set it to '\0'. Node labels of lower tuple
 will be unchanged but should be ignored for reconstructionValue -
 and they are ignored because of allTheSame flag.  See attached
 patch.
 
 Unfortunately, it will break on-disk compatibility. Although it will
 not cause a panic/error/coredump allTheSame approach will not find
 tuples. Users should recreate spgist indexes over text column.

If we bump the system catalog version, pg_upgrade can mark those indexes
as invalid and output a script to reindex them.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Heikki Linnakangas

On 05/30/2014 06:09 PM, Greg Stark wrote:

On Fri, May 30, 2014 at 3:59 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

If transaction A commits synchronously with commit LSN 1, and transaction B
commits asynchronously with commit LSN 2, B cannot become visible before A.
And we cannot acknowledge B as committed to the client until it's visible to
other transactions. That means that B will have to wait for A's commit
record to be flushed to disk, before it can return, even though it was an
asynchronous commit.



I thought that's what happens now.

What's more of a concern is synchronousl replication. We currently
have a hack that makes transactions committed locally invisible to
other transactions even though they've committed and synced to disk
until the slave responds that it's received the transaction. (I think
this is bogus personally, it just shifts the failure modes around. If
we wanted to do it properly we would have to do two-phase commit.)


Yeah. To recap, the failure mode is that if the master crashes and 
restarts, the transaction becomes visible in the master even though it 
was never replicated.



I guess it still works because we don't support having synchronous
replication for just some transactions and not others. It would be
nice to support that but I think it would mean making it work like
local synchronous commit. It would only affect how long the commit
blocks, not when other transactions see the committed data.


Actually, we do support that, see synchronous_commit=local.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Andres Freund
Hi,

On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:
 So, here's a first version of the patch. Still very much WIP.

Cool.

 One thorny issue came up in discussions with other hackers on this in PGCon:
 
 When a transaction is committed asynchronously, it becomes visible to other
 backends before the commit WAL record is flushed. With CSN-based snapshots,
 the order that transactions become visible is always based on the LSNs of
 the WAL records. This is a problem when there is a mix of synchronous and
 asynchronous commits:
 
 If transaction A commits synchronously with commit LSN 1, and transaction B
 commits asynchronously with commit LSN 2, B cannot become visible before A.
 And we cannot acknowledge B as committed to the client until it's visible to
 other transactions. That means that B will have to wait for A's commit
 record to be flushed to disk, before it can return, even though it was an
 asynchronous commit.

 I personally think that's annoying, but we can live with it. The most common
 usage of synchronous_commit=off is to run a lot of transactions in that
 mode, setting it in postgresql.conf. And it wouldn't completely defeat the
 purpose of mixing synchronous and asynchronous commits either: an
 asynchronous commit still only needs to wait for any already-logged
 synchronous commits to be flushed to disk, not the commit record of the
 asynchronous transaction itself.

I have a hard time believing that users won't hate us for such a
regression. It's pretty common to mix both sorts of transactions and
this will - by my guesstimate - dramatically reduce throughput for the
async backends.

 * Logical decoding is broken. I hacked on it enough that it looks roughly
 sane and it compiles, but didn't spend more time to debug.

I think we can live with it not working for the first few
iterations. I'll look into it once the patch has stabilized a bit.

 * I expanded pg_clog to 64-bits per XID, but people suggested keeping
 pg_clog as is, with two bits per commit, and adding a new SLRU for the
 commit LSNs beside it. Probably will need to do something like that to avoid
 bloating the clog.

It also influences how on-disk compatibility is dealt with. So: How are
you planning to deal with on-disk compatibility?

 * Add some kind of backend-private caching of clog, to make it faster to
 access. The visibility checks are now hitting the clog a lot more heavily
 than before, as you need to check the clog even if the hint bits are set, if
 the XID falls between xmin and xmax of the snapshot.

That'll hurt a lot in concurrent scenarios :/. Have you measured how
'wide' xmax-xmin usually is? I wonder if we could just copy a range of
values from the clog when we start scanning.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pivot aggregation with a patched intarray

2014-05-30 Thread Marc Mamin
Hello,

(sorry for this long post)

I have patched intarray with 3 additional functions in order to count [distinct]
event IDs
into arrays, where the array position corresponds to the integer value
(mimicking column-oriented storage).

I need to use an array for the event IDs as I don't know how many of them
exist, and the list may increase slowly over time.

The first results are quite promising.

Although this approach is only usable for a narrow set of use cases (*),
I wonder if I should look at other places in the source code to achieve a
better implementation,
and if there are discussions or plans to enhance Postgres with some support
for this kind of storage (vectors of int).


(*) The array sizes should ideally not exceed a few tens,
and the number of events should be unknown (otherwise using one column
per event would be faster).


use case


c: category
s: session
e: event int4 (small range, mostly less than 20)

c   s   e
-   -   -
1   1   1
1   1   1
1   1   3
1   2   1
2   2   3
3   1   5

WITH SES AS (
  SELECT s, c, 
 icount_to_array(e) ecount,
 iarray_flag(e) edistinct
  FROM foo
  GROUP BY s, c)
SELECT  c, 
iarray_sum(ecount) ecount, 
iarray_sum(edistinct)edistinct
FROM SES GROUP BY c


cecount  edistinct
---- -
1[3,0,1] [2,0,1]
2[0,0,1] [0,0,1]
3[0,0,0,0,1] [0,0,0,0,1]

e.g.: the event 1 was met 3 times in the category 1, in 2 distinct sessions.


source code
===
The 3 aggregates use following functions:

isumv:  isumv([1,1,1],[0,0,1,3]) = [1,1,2,3]

iarray_addat :  iarray_addat([3,3,3],2) = [3,4,3]

iarray_setat :  iarray_setat([0,0],2) = [0,1] 
iarray_setat([0,1],2) = [0,1] 
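
A plain-C sketch of the semantics of these three functions, for illustration
only (the actual patched code works on PostgreSQL's ArrayType; this models
the arrays as malloc'ed int buffers, with error handling omitted):

#include <stdlib.h>
#include <string.h>

/* grow a to newlen elements, zero-filling the new tail */
static int *
grow(int *a, int *len, int newlen)
{
    if (newlen > *len)
    {
        a = realloc(a, newlen * sizeof(int));
        memset(a + *len, 0, (newlen - *len) * sizeof(int));
        *len = newlen;
    }
    return a;
}

/* isumv: element-wise sum; the result is as long as the longer input */
static int *
isumv(int *a, int *alen, const int *b, int blen)
{
    a = grow(a, alen, blen);
    for (int i = 0; i < blen; i++)
        a[i] += b[i];
    return a;
}

/* iarray_addat: increment the counter at 1-based position pos */
static int *
iarray_addat(int *a, int *alen, int pos)
{
    a = grow(a, alen, pos);
    a[pos - 1]++;
    return a;
}

/* iarray_setat: set the 1-based position pos to 1 (idempotent) */
static int *
iarray_setat(int *a, int *alen, int pos)
{
    a = grow(a, alen, pos);
    a[pos - 1] = 1;
    return a;
}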


I've added them at the end of _int_op.c from intarray (attached)

Missing in the code: checks for integer overflow and that the array lower
bounds are all 1.

These are my first C lines ever and I've never learned the language.
Hence I'm open to criticism ;-)
I've started with this great posting by Craig Ringer:
 
http://stackoverflow.com/questions/16992339/why-is-postgresql-array-access-so-much-faster-in-c-than-in-pl-pgsql?answertab=votes#tab-top
The rest is mostly copy and paste from other parts of intarray.
 

iarray_setat just sets a bitmap position to 1, possibly enlarging the array
when required.
It is a trivial modification of iarray_addat.
It should be possible to implement this more efficiently with bit[] or
varbit, but that lies beyond my C capabilities.

performance advantage of icount_to_array
===
As this aggregate allows reducing the cardinality of GROUP BYs,
it often performs better than a vertical approach.

With the horizontal storage, the result can be stored in smaller tables with
far fewer rows,
hence bringing more advantages when it comes to indexes.



e.g.:

select icount_to_array(user_id) from foo

{256655,0,0,1075,0,1,91154,36,0 (...)

Aggregate  (cost=36930.44..36930.45 rows=1 width=4) (actual 
time=333.378..333.378 rows=1 loops=1)
  ->  Seq Scan on foo  (cost=0.00..32279.95 rows=1860195 width=4) (actual 
time=0.010..131.117 rows=1860179 loops=1)
Total runtime: 333.420 ms


select user_id, count(*) from foo group by user_id order by 1

Sort  (cost=41582.75..41582.87 rows=48 width=4) (actual time=492.638..492.638 
rows=79 loops=1)
  Sort Key: user_id
  Sort Method: quicksort  Memory: 28kB
  ->  HashAggregate  (cost=41580.93..41581.41 rows=48 width=4) (actual 
time=492.606..492.615 rows=79 loops=1)
->  Seq Scan on foo  (cost=0.00..32279.95 rows=1860195 width=4) (actual 
time=0.010..128.800 rows=1860179 loops=1)
Total runtime: 492.699 ms

1;256656
4;1075
7;91157
8;36
17;1455583
(...)

It will swap later on
-
... which results in a significant difference in some cases.

create temp table ev AS SELECT * FROM (
generate_series(1,1000)s cross join
generate_series(1,500)c cross join
generate_series(1,15)e cross join
generate_series(1,5) repeat
)

explain analyze select s, c,  icount_to_array(e)  from ev group by s,c

HashAggregate  (cost=830575.54..830975.54 rows=4 width=12) (actual 
time=19188.384..19379.154 rows=50 loops=1)
  ->  Seq Scan on ev  (cost=0.00..561487.31 rows=35878431 width=12) (actual 
time=0.046..4849.977 rows=3750 loops=1)
Total runtime: 19399.151 ms

explain analyze select s, c, e, count(*) from ev group by s,c,e

GroupAggregate  (cost=5589186.88..6073545.71 rows=3587844 width=12) (actual 
time=63290.168..89336.265 rows=750 loops=1)
  ->  Sort  (cost=5589186.88..5678882.96 rows=35878431 width=12) (actual 
time=63290.156..83981.772 rows=3750 loops=1)
Sort Key: s, c, e
Sort Method: external merge  Disk: 806424kB
->  Seq Scan on ev  (cost=0.00..561487.31 rows=35878431 width=12) 
(actual time=0.039..4957.481 rows=3750 loops=1)
Total runtime: 89680.844 ms


Aggregates definition:
==


   

Re: [HACKERS] SP-GiST bug.

2014-05-30 Thread Teodor Sigaev

If we bump the system catalog version, pg_upgrade can mark those indexes
as invalid and output a script to reindex them.



Between 9.3 and 9.4 the catalog version is changed anyway.
--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pivot aggregation with a patched intarray

2014-05-30 Thread Marc Mamin
(reposted, this time with attachment. sorry)




Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Andrew Dunstan


On 05/30/2014 11:08 AM, Andrew Dunstan wrote:


On 05/30/2014 09:35 AM, Teodor Sigaev wrote:

Hi!

jsonb operators ->text, ->>text, ->int, ->>int use inefficient methods 
to access the needed field, proportional to O(N/2). The attached patch 
makes the text operators O(log(N)) and the integer ones O(1). The 
functions with fast access already exist in the current code and are 
used in the contains operation, for example. The attached patch uses 
those functions instead of a sequential loop over the object/array.


Teodor, thanks for this.

My only worry is this sort of code, which in a quick search I didn't 
find used elsewhere


   -   (void) JsonbToCString(jtext, &tjb->root, -1);
   -   result = cstring_to_text_with_len(jtext->data,
   jtext->len);
   +   appendStringInfoSpaces(jtext, VARHDRSZ);
   +   (void) JsonbToCString(jtext, v->val.binary.data,
   -1);
   +
   +   result = (text *) jtext->data;
   +   SET_VARSIZE(result, jtext->len);


If we're going to construct varlena objects inside a StringInfo, maybe 
we need a proper API for it. Isn't there a danger that the data member of 
the StringInfo won't be properly aligned to allow us to do this? In 
any case, we should get most of the benefit of your patch without this 
optimization.



I see that palloc.h says:

   The result of palloc() is always word-aligned


so maybe my alignment fear is misplaced. So my remaining question is 
whether this is OK stylistically.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Teodor Sigaev


If we're going to construct varlena objects inside a StringInfo, maybe
we need a proper API for it. Isn't there a danger that data member of
the StringInfo won't be properly aligned to allow us to do this? In any
case, we should get most of the benefit of your patch without this
optimization.


I believe that StringInfo->data is palloc'ed, which means it's MAXALIGNed.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Teodor Sigaev

I see that palloc.h says:

The result of palloc() is always word-aligned


void *
repalloc(void *pointer, Size size)
{
...
/*
 * Try to detect bogus pointers handed to us, poorly though we can.
 * Presumably, a pointer that isn't MAXALIGNED isn't pointing at an
 * allocated chunk.
 */
Assert(pointer != NULL);
Assert(pointer == (void *) MAXALIGN(pointer));
...





so maybe my alignment fear is misplaced. So my remaining question is
whether this is OK stylistically.


Something like this?
initStringInfoVarlena()/makeStringInfoVarlena()
{
initStringInfo()
appendStringInfoSpaces(jtext, VARHDRSZ);
}

char*
formStringInfoVarlena()
{
SET_VARSIZE(jtext->data, jtext->len);
return jtext->data;
}

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Oleg Bartunov
The patch really improves access performance for jsonb.  On the
delicious bookmarks data I got 5 times better performance. Now jsonb
outperforms json on simple access (slide 12 of the pgcon presentation) by
103 times!

Oleg

On Fri, May 30, 2014 at 9:35 AM, Teodor Sigaev teo...@sigaev.ru wrote:
 Hi!

 jsonb operators ->text, ->>text, ->int, ->>int use inefficient methods to
 access the needed field, proportional to O(N/2). The attached patch makes
 the text operators O(log(N)) and the integer ones O(1). The functions with
 fast access already exist in the current code and are used in the contains
 operation, for example. The attached patch uses those functions instead of
 a sequential loop over the object/array.
 --
 Teodor Sigaev   E-mail: teo...@sigaev.ru
WWW:
 http://www.sigaev.ru/


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Andrew Dunstan


On 05/30/2014 01:30 PM, Oleg Bartunov wrote:

The patch really improves access performance for jsonb.  On the
delicious bookmarks data I got 5 times better performance. Now jsonb
outperforms json on simple access (slide 12 of the pgcon presentation) by
103 times!




(Oleg, please try not to top-post)

The question is whether the speedup comes from the reduction in lookup 
times or the reduction in string copying. I have a strong suspicion that 
it's mostly the first, not the second.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 If we're going to construct varlena objects inside a StringInfo, maybe 
 we need a proper API for it. Isn't there a danger that data member of 
 the StringInfo won't be properly aligned to allow us to do this? In any 
 case, we should get most of the benefit of your patch without this 
 optimization.

As noted, the data buffer is maxaligned; but nonetheless I agree that
this is a serious stylistic abuse, and it's not buying much compared
to the rest of the patch.  I'd stick with the cstring_to_text_with_len
coding.

At the other end of the process, why are we using PG_GETARG_TEXT_P
and not PG_GETARG_TEXT_PP to avoid a detoast cycle on short-header
inputs?  The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR,
so it's already prepared for unaligned input.
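
For illustration, a minimal sketch of the pattern being suggested (the
function itself is invented): PG_GETARG_TEXT_PP() does not unpack a
short-header datum, and the _ANY macros then work on either header format.

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(echo_key);

Datum
echo_key(PG_FUNCTION_ARGS)
{
    text       *key = PG_GETARG_TEXT_PP(0);     /* no detoast copy for short headers */
    char       *str = VARDATA_ANY(key);         /* data pointer, either header form */
    int         len = VARSIZE_ANY_EXHDR(key);   /* length excluding header */

    PG_RETURN_TEXT_P(cstring_to_text_with_len(str, len));
}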

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb access operators inefficiency

2014-05-30 Thread Teodor Sigaev

The question is whether the speedup comes from the reduction in lookup
times or the reduction in string copying. I have a strong suspicion that
it's mostly the first, not the second.


Agreed about Oleg's test, but some users put huge values into
json... Actually, I don't insist; ITSM that's an easy way to prevent extra
copying.


--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Heikki Linnakangas

On 05/30/2014 06:27 PM, Andres Freund wrote:

On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:

One thorny issue came up in discussions with other hackers on this in PGCon:

When a transaction is committed asynchronously, it becomes visible to other
backends before the commit WAL record is flushed. With CSN-based snapshots,
the order that transactions become visible is always based on the LSNs of
the WAL records. This is a problem when there is a mix of synchronous and
asynchronous commits:

If transaction A commits synchronously with commit LSN 1, and transaction B
commits asynchronously with commit LSN 2, B cannot become visible before A.
And we cannot acknowledge B as committed to the client until it's visible to
other transactions. That means that B will have to wait for A's commit
record to be flushed to disk, before it can return, even though it was an
asynchronous commit.



I personally think that's annoying, but we can live with it. The most common
usage of synchronous_commit=off is to run a lot of transactions in that
mode, setting it in postgresql.conf. And it wouldn't completely defeat the
purpose of mixing synchronous and asynchronous commits either: an
asynchronous commit still only needs to wait for any already-logged
synchronous commits to be flushed to disk, not the commit record of the
asynchronous transaction itself.


I have a hard time believing that users won't hate us for such a
regression. It's pretty common to mix both sorts of transactions and
this will - by my guesstimate - dramatically reduce throughput for the
async backends.


Yeah, it probably would. Not sure how many people would care.

For an asynchronous commit, we could store the current WAL flush 
location as the commit LSN, instead of the location of the commit 
record. That would break the property that LSN == commit order, but that 
property is fundamentally incompatible with having async commits become 
visible without flushing previous transactions. Or we could even make it 
configurable, it would be fairly easy to support both behaviors.



* Logical decoding is broken. I hacked on it enough that it looks roughly
sane and it compiles, but didn't spend more time to debug.


I think we can live with it not working for the first few
iterations. I'll look into it once the patch has stabilized a bit.


Thanks!


* I expanded pg_clog to 64-bits per XID, but people suggested keeping
pg_clog as is, with two bits per commit, and adding a new SLRU for the
commit LSNs beside it. Probably will need to do something like that to avoid
bloating the clog.


It also influences how on-disk compatibility is dealt with. So: How are
you planning to deal with on-disk compatibility?


* Add some kind of backend-private caching of clog, to make it faster to
access. The visibility checks are now hitting the clog a lot more heavily
than before, as you need to check the clog even if the hint bits are set, if
the XID falls between xmin and xmax of the snapshot.


That'll hurt a lot in concurrent scenarios :/. Have you measured how
'wide' xmax-xmin usually is?


That depends entirely on the workload. The worst case is a mix of a 
long-running transaction and a lot of short transaction. It could grow 
to millions of transactions or more in that case.



I wonder if we could just copy a range of
values from the clog when we start scanning


I don't think that's practical, if the xmin-xmax gap is wide.

Perhaps we should take the bull by the horns and make the clog faster to 
look up. If we e.g. mmapped the clog file into backend-private address 
space, we could avoid all the locking overhead of an SLRU. On platforms with 
atomic 64-bit instructions, you could read the clog with just a memory 
barrier. Even on other architectures, you'd only need a spinlock.
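
A hypothetical sketch of that idea, with invented layout and names (assuming
64 bits per xid in the clog, and ignoring segment management entirely):

#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define ENTRIES_PER_SEGMENT (32 * 1024)     /* assumed segment geometry */

static _Atomic uint64_t *clog_map;

/* map one clog segment read-only into this backend's address space */
static int
clog_map_segment(const char *path)
{
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    clog_map = mmap(NULL, ENTRIES_PER_SEGMENT * sizeof(uint64_t),
                    PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    return (clog_map == MAP_FAILED) ? -1 : 0;
}

/*
 * On platforms with atomic 64-bit loads this is a single read plus a
 * barrier; elsewhere the compiler may have to emit a lock, which is why
 * a spinlock fallback would still be needed.
 */
static uint64_t
clog_get_commit_lsn(uint32_t xid)
{
    return atomic_load_explicit(&clog_map[xid % ENTRIES_PER_SEGMENT],
                                memory_order_acquire);
}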


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Heikki Linnakangas

(forgot to answer this question)

On 05/30/2014 06:27 PM, Andres Freund wrote:

On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:

* I expanded pg_clog to 64-bits per XID, but people suggested keeping
pg_clog as is, with two bits per commit, and adding a new SLRU for the
commit LSNs beside it. Probably will need to do something like that to avoid
bloating the clog.


It also influences how on-disk compatibility is dealt with. So: How are
you planning to deal with on-disk compatibility?


Have pg_upgrade read the old clog and write it out in the new format.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python

2014-05-30 Thread Tom Lane
Asif Naeem anaeem...@gmail.com writes:
 On Fri, May 30, 2014 at 9:49 PM, reiner peterke zedaa...@drizzle.com
 wrote:
 Since upgrading my mac from OS X 10.8 to 10.9, I can no longer build
 postgres with '--with-python'.

 Latest PG 9.4 source code seems building fine with --with-python option on
 my OS X 10.9.3 box  i.e.

[ please don't top-post in PG mailing lists ]

I can reproduce the failure as described, not only in 9.4 but the back
branches too; I would've noticed sooner except I don't build plpython
routinely.

It looks to me like Apple broke something in the build toolchain.
If you add -v as suggested, what you see is

$ make PROFILE=-v 
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -g -v  -bundle -multiply_defined 
suppress -o plpython2.so plpy_cursorobject.o plpy_elog.o plpy_exec.o 
plpy_main.o plpy_planobject.o plpy_plpymodule.o plpy_procedure.o 
plpy_resultobject.o plpy_spi.o plpy_subxactobject.o plpy_typeio.o plpy_util.o 
-L../../../src/port -L../../../src/common -Wl,-dead_strip_dylibs  -v  
-F/System/Library/Frameworks -framework Python   -bundle_loader 
../../../src/backend/postgres
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.2.0
Thread model: posix
 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
 -demangle -dynamic -arch x86_64 -bundle -bundle_loader 
../../../src/backend/postgres -macosx_version_min 10.9.0 -multiply_defined 
suppress -syslibroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk
 -o plpython2.so -L../../../src/port -L../../../src/common plpy_cursorobject.o 
plpy_elog.o plpy_exec.o plpy_main.o plpy_planobject.o plpy_plpymodule.o 
plpy_procedure.o plpy_resultobject.o plpy_spi.o plpy_subxactobject.o 
plpy_typeio.o plpy_util.o -dead_strip_dylibs -framework Python -lSystem 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/5.1/lib/darwin/libclang_rt.osx.a
 -F/System/Library/Frameworks
ld: framework not found Python
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [plpython2.so] Error 1

The problem is that that -syslibroot option modifies where to look for
frameworks; man ld quoth

 -syslibroot rootdir
 Prepend rootdir to all search paths when searching for
 libraries or frameworks.

If you do the ld call by hand without the -syslibroot option, it works.
AFAICS it could never have worked with such an option, so I'm thinking
this is some new misbehavior in the latest version of Xcode.

regards, tom lane




Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python

2014-05-30 Thread Adrian Klaver

On 05/30/2014 11:52 AM, Tom Lane wrote:

Asif Naeem anaeem...@gmail.com writes:

On Fri, May 30, 2014 at 9:49 PM, reiner peterke zedaa...@drizzle.com
wrote:

Since upgrading my mac from OS X 10.8 to 10.9, I can no longer build
postgres with '--with-python'.



Latest PG 9.4 source code seems to build fine with the --with-python option
on my OS X 10.9.3 box, i.e.


[ please don't top-post in PG mailing lists ]

I can reproduce the failure as described, not only in 9.4 but the back
branches too; I would've noticed sooner except I don't build plpython
routinely.

It looks to me like Apple broke something in the build toolchain.
If you add -v as suggested, what you see is



If you do the ld call by hand without the -syslibroot option, it works.
AFAICS it could never have worked with such an option, so I'm thinking
this is some new misbehavior in the latest version of Xcode.


There is and the SO thread that goes into detail on this is here:

http://stackoverflow.com/questions/19555395/python-framework-is-missing-from-os-x-10-9-sdk-why-also-workaround

The Apple document referenced in above is:

https://developer.apple.com/library/ios/technotes/tn2328/_index.html



regards, tom lane





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST bug.

2014-05-30 Thread Bruce Momjian
On Fri, May 30, 2014 at 08:19:07PM +0400, Teodor Sigaev wrote:
 If we bump the system catalog version, pg_upgrade can mark those indexes
 as invalid and output a script to reindex them.
 
 
 Between 9.3 -> 9.4 the catalog version is changed anyway.

Right.  I was thinking of beta users between beta1 and beta2 as well. 
We would need to trigger the index invalidation on the catalog version
used for the commit that changes the index format, not the major version
like 9.4.
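
pg_upgrade already gates similar fixups on the old cluster's catversion, so
the check could look roughly like this (a sketch; the structures, constant,
and output here are stand-ins, not pg_upgrade's actual code):

    #include <stdint.h>
    #include <stdio.h>

    /* Minimal stand-ins for pg_upgrade's structures (illustrative only). */
    typedef struct { uint32_t cat_ver; } ControlData;
    typedef struct { ControlData controldata; } ClusterInfo;

    static ClusterInfo old_cluster;

    /* Hypothetical catversion of the commit that changes the index format. */
    #define SPGIST_FORMAT_CHANGE_CAT_VER 201405301

    static void
    maybe_invalidate_spgist_indexes(void)
    {
        if (old_cluster.controldata.cat_ver < SPGIST_FORMAT_CHANGE_CAT_VER)
            printf("-- old cluster predates the format change:\n"
                   "-- mark SP-GiST indexes invalid, emit a REINDEX script\n");
    }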

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python

2014-05-30 Thread Tom Lane
Adrian Klaver adrian.kla...@aklaver.com writes:
 On 05/30/2014 11:52 AM, Tom Lane wrote:
 If you do the ld call by hand without the -syslibroot option, it works.
 AFAICS it could never have worked with such an option, so I'm thinking
 this is some new misbehavior in the latest version of Xcode.

 There is and the SO thread that goes into detail on this is here:
 http://stackoverflow.com/questions/19555395/python-framework-is-missing-from-os-x-10-9-sdk-why-also-workaround
 The Apple document referenced in above is:
 https://developer.apple.com/library/ios/technotes/tn2328/_index.html

Fun.  So after all these years of catering to Apple's preferred weirdness
in this regard, they reverse course and tell us to do it like everywhere
else.

I experimented with just diking out the python_framework case in
configure, and that *almost* works; but for some reason distutils doesn't
admit to having a shared library, so you also have to override that test.

The attached patch fixes this and also removes a long-obsolete comment
claiming that we don't work with Python < 2.5 on OSX (see prairiedog
for evidence to the contrary).

I've tested this successfully on my 10.9.3 laptop as well as on dromedary
and prairiedog, so I'm thinking we should patch not only HEAD but all the
back branches.  Any objections?

regards, tom lane

PS: why aren't any of the buildfarm members using 10.9 building
--with-python?  We should have known about this months ago, ISTM.

diff --git a/config/python.m4 b/config/python.m4
index 5cb2854..7012c53 100644
*** a/config/python.m4
--- b/config/python.m4
*************** python_libdir=`${PYTHON} -c "import dist
*** 68,81 ****
  python_ldlibrary=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LDLIBRARY'))))"`
  python_so=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('SO'))))"`
  ldlibrary=`echo "${python_ldlibrary}" | sed "s/${python_so}$//"`
- python_framework=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('PYTHONFRAMEWORK'))))"`
  python_enable_shared=`${PYTHON} -c "import distutils.sysconfig; print(distutils.sysconfig.get_config_vars().get('Py_ENABLE_SHARED',0))"`
  
! if test -n "$python_framework"; then
! 	python_frameworkprefix=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('PYTHONFRAMEWORKPREFIX'))))"`
! 	python_libspec="-F${python_frameworkprefix} -framework $python_framework"
! 	python_enable_shared=1
! elif test x"${python_libdir}" != x"" -a x"${python_ldlibrary}" != x"" -a x"${python_ldlibrary}" != x"${ldlibrary}"
  then
  	# New way: use the official shared library
  	ldlibrary=`echo "${ldlibrary}" | sed "s/^lib//"`
--- 68,76 ----
  python_ldlibrary=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LDLIBRARY'))))"`
  python_so=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('SO'))))"`
  ldlibrary=`echo "${python_ldlibrary}" | sed "s/${python_so}$//"`
  python_enable_shared=`${PYTHON} -c "import distutils.sysconfig; print(distutils.sysconfig.get_config_vars().get('Py_ENABLE_SHARED',0))"`
  
! if test x"${python_libdir}" != x"" -a x"${python_ldlibrary}" != x"" -a x"${python_ldlibrary}" != x"${ldlibrary}"
  then
  	# New way: use the official shared library
  	ldlibrary=`echo "${ldlibrary}" | sed "s/^lib//"`
*************** else
*** 91,99 ****
  	python_libspec="-L${python_libdir} -lpython${python_ldversion}"
  fi
  
! if test -z "$python_framework"; then
! 	python_additional_libs=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LIBS','LIBC','LIBM','BASEMODLIBS'))))"`
! fi
  
  AC_MSG_RESULT([${python_libspec} ${python_additional_libs}])
  
--- 86,92 ----
  	python_libspec="-L${python_libdir} -lpython${python_ldversion}"
  fi
  
! python_additional_libs=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LIBS','LIBC','LIBM','BASEMODLIBS'))))"`
  
  AC_MSG_RESULT([${python_libspec} ${python_additional_libs}])
  
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 46d2030..020861a 100644
*** a/src/pl/plpython/Makefile
--- b/src/pl/plpython/Makefile
*************** include $(top_builddir)/src/Makefile.glo
*** 9,24 ****
  # asks Python directly.  But because this has been broken in Debian
  # for a long time (http://bugs.debian.org/695979), and to support
  # older Python versions, we see if there is a file that is named like
! # a shared library as a fallback.  (Note that this is wrong on OS X,
! # where DLSUFFIX is ".so", but libpython is a ".dylib".  Python 2.5 is
! # therefore not supported on OS X.)
  ifeq (1,$(python_enable_shared))
  shared_libpython = yes
  else
  ifneq (,$(wildcard $(python_libdir)/libpython*$(DLSUFFIX)*))
  shared_libpython = yes
  endif
  endif
 

Re: [HACKERS] Proposing pg_hibernate

2014-05-30 Thread Josh Kupershmidt
On Tue, May 27, 2014 at 10:01 PM, Gurjeet Singh gurj...@singh.im wrote:

 When the Postgres server is being stopped/shut down, the `Buffer Saver`
 scans the shared buffers of Postgres, and stores the unique block
 identifiers of each cached block to disk. This information is saved
 under the `$PGDATA/pg_hibernator/` directory. For each database whose
 blocks are resident in shared buffers, one file is created; e.g.:
 `$PGDATA/pg_hibernator/2.postgres.save`.

This file-naming convention seems a bit fragile. For example, on my
filesystem (HFS), if I create a database named "foo / bar", I'll get a
complaint like:

ERROR:  could not open "pg_hibernator/5.foo / bar.save": No such file
or directory

during shutdown.
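
One way to make the naming robust would be to key the save files on the
database OID alone and record the human-readable name inside the file; a
sketch of that approach (not what pg_hibernator currently does):

    #include <stdint.h>
    #include <stdio.h>

    /* Build a save-file path from the database OID only, so characters in
     * the database name (slashes, spaces, ...) never reach the filesystem. */
    static void
    hibernator_save_path(char *buf, size_t buflen, uint32_t dboid)
    {
        snprintf(buf, buflen, "pg_hibernator/%u.save", (unsigned) dboid);
    }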

Josh




Re: [HACKERS] recovery testing for beta

2014-05-30 Thread Jeff Janes
On Thu, May 29, 2014 at 8:15 PM, Amit Kapila amit.kapil...@gmail.com
wrote:

 On Thu, May 29, 2014 at 10:09 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  What features in 9.4 need more beta testing for recovery?

 Another feature which has an interaction with recovery is reduced WAL
 for Update operations:

 http://www.postgresql.org/message-id/e1wnqjm-0003cz...@gemulon.postgresql.org

 Commit: a3115f


It looks like this is something which is always on, so it should be
getting a good test without me taking any special steps.  But is there some
way, for example something in xlogdump, to assess how often this is getting
activated?

Thanks,

Jeff


Re: [HACKERS] [GENERAL] unable to build postgres-9.4 in os x 10.9 with python

2014-05-30 Thread Adrian Klaver

On 05/30/2014 01:31 PM, Tom Lane wrote:

Adrian Klaver adrian.kla...@aklaver.com writes:

On 05/30/2014 11:52 AM, Tom Lane wrote:

If you do the ld call by hand without the -syslibroot option, it works.
AFAICS it could never have worked with such an option, so I'm thinking
this is some new misbehavior in the latest version of Xcode.



There is and the SO thread that goes into detail on this is here:
http://stackoverflow.com/questions/19555395/python-framework-is-missing-from-os-x-10-9-sdk-why-also-workaround
The Apple document referenced in above is:
https://developer.apple.com/library/ios/technotes/tn2328/_index.html


Fun.  So after all these years of catering to Apple's preferred weirdness
in this regard, they reverse course and tell us to do it like everywhere
else.


Mavericks, the gift that keeps on giving :)



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [HACKERS] recovery testing for beta

2014-05-30 Thread Noah Misch
On Thu, May 29, 2014 at 09:39:56AM -0700, Jeff Janes wrote:
 I've applied my partial-write testing harness to several scenarios in 9.4.
  So far it's found a recovery bug for gin indexes, a recovery bug for btree,
 a vacuum bug for btree indexes (with foreign keys, but that is not relevant
 to the bug), and nothing of interest for gist indexes, although it only
 tested "where text_array @@ to_tsquery(?)" queries.
 
 It also implicitly tested the xlog parallel write slots thing, as that is
 common code to all recovery.
 
 I also applied the foreign key test retroactively to 9.3, and it quickly
 re-found the multixact bugs up until commit 9a57858f1103b89a5674.  The test
 was designed only with the knowledge that the bugs involved foreign keys
 and the consumption of multixacts.   I had no deeper knowledge of the
 details of those bugs when designing the test, so I have a reasonable
 amount of confidence that this could have found them in real time had I
 bothered to try to test the feature during the previous beta cycle.

Impressive.  This testing is of great value to the project.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] recovery testing for beta

2014-05-30 Thread Amit Kapila
On Sat, May 31, 2014 at 3:51 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, May 29, 2014 at 8:15 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
 On Thu, May 29, 2014 at 10:09 PM, Jeff Janes jeff.ja...@gmail.com
wrote:
 
  What features in 9.4 need more beta testing for recovery?

 Another feature which has an interaction with recovery is reduced WAL
 for Update operations:

http://www.postgresql.org/message-id/e1wnqjm-0003cz...@gemulon.postgresql.org

 Commit: a3115f


 It looks like this is something which is always on, so it should be
getting a good test without me taking any special steps.  But is there some
way, for example something in xlogdump, to assess how often this is getting
activated?

Currently there is no simple way to get this information, but with the
attached simple patch you can get it from pg_xlogdump.

It will give you the information in the following way:

rmgr: Heap        len (rec/tot):     47/    79, tx:        690, lsn: 0/0375B4A0,
prev 0/0375B470, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/1
xmax 690 ; new tid 0/2 xmax 0; compressed tuple: suffix encoded

rmgr: Heap        len (rec/tot):     53/    85, tx:        691, lsn: 0/0375B520,
prev 0/0375B4F0, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/2
xmax 691 ; new tid 0/3 xmax 0; compressed tuple: prefix encoded

rmgr: Heap        len (rec/tot):     56/    88, tx:        692, lsn: 0/0375B5A8,
prev 0/0375B578, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/3
xmax 692 ; new tid 0/4 xmax 0; uncompressed tuple

I think this is useful information and could even be included in the
core code.
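
Since the patch is attached as binary data here, the change presumably
boils down to a few extra lines in the hot_update case of the heap desc
routine; a self-contained sketch (flag names and values are assumptions,
not the actual 9.4 identifiers):

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed flag bits -- not the actual 9.4 identifiers or values. */
    #define XL_HEAP_PREFIX_FROM_OLD 0x01
    #define XL_HEAP_SUFFIX_FROM_OLD 0x02

    /* Render the compression info for one hot_update record into 'buf'. */
    static void
    describe_update_compression(char *buf, size_t buflen, uint8_t flags)
    {
        if (flags & (XL_HEAP_PREFIX_FROM_OLD | XL_HEAP_SUFFIX_FROM_OLD))
            snprintf(buf, buflen, "; compressed tuple:%s%s",
                     (flags & XL_HEAP_PREFIX_FROM_OLD) ? " prefix encoded" : "",
                     (flags & XL_HEAP_SUFFIX_FROM_OLD) ? " suffix encoded" : "");
        else
            snprintf(buf, buflen, "; uncompressed tuple");
    }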

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pgxlogdump_output_compressed_tuple_info-v1.patch
Description: Binary data
