Re: [HACKERS] pg_stat directory and pg_stat_statements

2014-05-29 Thread Peter Geoghegan
On Wed, May 28, 2014 at 10:49 PM, Fujii Masao masao.fu...@gmail.com wrote:
 You're concerned about the scenario using pg_upgrade? I don't know the
 details of pg_upgrade, but if it didn't work properly, we would have
 heard about the trouble.

I'm not worried about pg_upgrade, because by design pg_stat_statements
will discard stats files that originated in earlier versions. However,
I don't see a need to change pg_stat_statements to serialize its
statistics to disk in the pg_stat directory before we branch off 9.4.
As you mentioned, it's harmless.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat directory and pg_stat_statements

2014-05-29 Thread Ashesh Vashi
On Thu, May 29, 2014 at 11:32 AM, Peter Geoghegan p...@heroku.com wrote:

 On Wed, May 28, 2014 at 10:49 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
  You're concerned about the scenario using pg_upgrade?

Yeah - I was.

  I don't know the details
  of pg_upgrade, but if it didn't work properly, we would have
  heard about the trouble.

 I'm not worried about pg_upgrade, because by design pg_stat_statements
 will discard stats files that originated in earlier versions. However,
 I don't see a need to change pg_stat_statements to serialize its
 statistics to disk in the pg_stat directory before we branch off 9.4.
 As you mentioned, it's harmless.

OK.
I was just curious about the scenario.
If it discards the stats files that originated in an earlier version, it
should be fine.


--

Thanks & Regards,
Ashesh Vashi



 --
 Peter Geoghegan



Re: [HACKERS] Ancient bug in formatting.c/to_char()

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 04:59 AM, Peter Geoghegan wrote:

Consider this example:

[local]/postgres=# SELECT to_char(1e9::float8,'9D9');
  to_char
-----------
  10.0
(1 row)

[local]/postgres=# SELECT to_char(1e20::float8,'9D9');
 to_char
---------
   1
(1 row)

[local]/postgres=# SELECT to_char(1e20,'9D9');
  to_char
-----------
    1.0
(1 row)


There is access to uninitialized memory here. #define
DEBUG_TO_FROM_CHAR shows this to be the case (garbage is printed to
stdout).

Valgrind brought this to my attention. I propose the attached patch as
a fix for this issue.

The direct cause appears to be that float8_to_char() feels entitled to
truncate the Number description's post-digits, while that doesn't happen
within numeric_to_char(), say. This is very old code, from the
original to_char() patch back in 2000, and the interactions here are
not obvious. I think it should be okay not to do this truncation,
since the value of MAXDOUBLEWIDTH was greatly increased in 2007 as part
of a round of fixes to bugs of similar vintage. There is still a
defensive measure against over-sized input (we bail), but that ought
to be unnecessary, strictly speaking.


postgres=# select to_char('0.1'::float8, '9D' || repeat('9', 1000));
ERROR:  double precision for character conversion is too wide

Yeah, the code is pretty crufty; I'm not sure what the proper fix would
be. I wonder if psprintf would be useful.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2014-05-29 Thread Simon Riggs
On 29 May 2014 01:07, Bruce Momjian br...@momjian.us wrote:
 On Wed, May 28, 2014 at 04:04:13PM +0100, Simon Riggs wrote:
 On 28 May 2014 15:34, Fujii Masao masao.fu...@gmail.com wrote:

  Also, compress_backup_block GUC needs to be merged with full_page_writes.
 
  Basically I agree with you, because I don't want to add a new GUC very
  similar to the existing one.
 
  But consider the case where full_page_writes = off. Even in this case,
  FPWs are forcibly written during base backups. Should such FPWs also be
  compressed? Which compression algorithm should be used? If we want to
  choose the algorithm for such FPWs, we would not be able to merge those two
  GUCs. IMO it's OK to always use the best compression algorithm for such FPWs
  and merge them, though.

 I'd prefer a new name altogether

 torn_page_protection = 'full_page_writes'
 torn_page_protection = 'compressed_full_page_writes'
 torn_page_protection = 'none'

 this allows us to add new techniques later like

 torn_page_protection = 'background_FPWs'

 or

 torn_page_protection = 'double_buffering'

 when/if we add those new techniques

 Uh, how would that work if you want to compress the background_FPWs?
 Use compressed_background_FPWs?

We currently have one technique for torn-page protection, soon to have
two, and with a third on the horizon that is likely to receive effort in
the next release.

It seems sensible to have just one parameter to describe the various
techniques, as suggested. I'm suggesting that we plan for how things
will look when we have the 3rd one as well.

Alternate suggestions welcome.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Jsonb: jbvBinary usage in the convertJsonbValue?

2014-05-29 Thread Dmitry Dolgov
Hi all,

I'm a little confused by the *convertJsonbValue* function in *jsonb_util.c*.
Maybe I misunderstood something, so I need help =)

 if (IsAJsonbScalar(val) || val->type == jbvBinary)
     convertJsonbScalar(buffer, header, val);

As far as I can see, the *convertJsonbScalar* function is used for scalar and
binary jsonb values. But this function doesn't handle the jbvBinary type.

 switch (scalarVal->type)
 {
     case jbvNull:
         ...
     case jbvString:
         ...
     case jbvNumeric:
         ...
     case jbvBool:
         ...
     default:
         elog(ERROR, "invalid jsonb scalar type");
 }

Does this mean that the binary type is incorrect input for *convertJsonbValue*?
Or am I missing something?
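
For reference, here is a stripped-down, self-contained mock (not the real
jsonb_util.c code; all names besides the jbv* tags are invented for the
illustration) of the dispatch being asked about: a scalar converter whose
switch only knows the true scalar tags, so a jbvBinary value reaches the
error branch.

    /* mock_jsonb_dispatch.c -- illustration only */
    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { jbvNull, jbvString, jbvNumeric, jbvBool,
                   jbvArray, jbvObject, jbvBinary } JbvType;

    static void convertScalar(JbvType t)
    {
        switch (t)
        {
            case jbvNull:    puts("null");    break;
            case jbvString:  puts("string");  break;
            case jbvNumeric: puts("numeric"); break;
            case jbvBool:    puts("bool");    break;
            default:
                /* this is the branch a jbvBinary value would hit */
                fprintf(stderr, "invalid jsonb scalar type\n");
                exit(1);
        }
    }

    int main(void)
    {
        convertScalar(jbvString);   /* fine */
        convertScalar(jbvBinary);   /* reaches the default branch */
        return 0;
    }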


Re: [HACKERS] Proposing pg_hibernate

2014-05-29 Thread Gurjeet Singh
On May 29, 2014 12:12 AM, Amit Kapila amit.kapil...@gmail.com wrote:

 I agree with you that there are only a few corner cases where evicting
 shared buffers by this utility would harm, but I was wondering if we could
 even save those, say if it would only use available free buffers.  I think
 currently there is no such interface, and inventing a new interface for
 this case doesn't seem reasonable unless we see some other use case for
 such an interface.

+1


[HACKERS] json/jsonb inconsistence

2014-05-29 Thread Teodor Sigaev

# select '"\uaBcD"'::json;
   json
----------
 "\uaBcD"

but

# select '"\uaBcD"'::jsonb;
ERROR:  invalid input syntax for type json
LINE 1: select '"\uaBcD"'::jsonb;
                ^
DETAIL:  Unicode escape values cannot be used for code point values above 007F
when the server encoding is not UTF8.
CONTEXT:  JSON data, line 1: ...

and

# select '"\uaBcD"'::json -> 0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values above 007F
when the server encoding is not UTF8.
CONTEXT:  JSON data, line 1: ...
Time: 0,696 ms

Moreover, json looks strange:

# select '["\uaBcD"]'::json;
    json
------------
 ["\uaBcD"]

but

# select '["\uaBcD"]'::json -> 0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values above 007F
when the server encoding is not UTF8.
CONTEXT:  JSON data, line 1: [...

Looks like random parse rules.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Teodor Sigaev

postgres=# select '["\u0000"]'::json -> 0;
 ?column?
----------
 "\u0000"
(1 row)

Time: 1,294 ms
postgres=# select '["\u0000"]'::jsonb -> 0;
  ?column?
------------
 "\\u0000"
(1 row)

It seems to me that escape_json() is wrongly used in jsonb_put_escaped_value();
the right name for escape_json() would be escape_to_json().
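
As a minimal illustration of the mechanism (a sketch, not the actual
escape_json() / jsonb_put_escaped_value() code): a generic to-JSON escaper
applied to a value that already contains an escape sequence doubles the
backslash, which is exactly the "\u0000" -> "\\u0000" effect above.

    /* escape_demo.c -- illustration only */
    #include <stdio.h>

    static void escape_to_json(const char *s)
    {
        putchar('"');
        for (; *s; s++)
        {
            /* backslash-escape quotes and backslashes */
            if (*s == '"' || *s == '\\')
                putchar('\\');
            putchar(*s);
        }
        putchar('"');
        putchar('\n');
    }

    int main(void)
    {
        escape_to_json("\\u0000");  /* prints "\\u0000", matching the jsonb output */
        return 0;
    }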


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2014-05-29 Thread Rahila Syed
 Thanks for extending and revising the FPW-compress patch! Could you add
 your patch into the next CF?

Sure. I will make improvements and add it to the next CF.

 Isn't it worth measuring the recovery performance for each compression
 algorithm?

Yes, I will post this soon.

On Wed, May 28, 2014 at 8:04 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, May 27, 2014 at 12:57 PM, Rahila Syed rahilasyed...@gmail.com
 wrote:
  Hello All,
 
  0001-CompressBackupBlock_snappy_lz4_pglz extends the patch on compression of
  full-page writes to include LZ4 and Snappy. Changes include turning the
  compress_backup_block GUC from a boolean into an enum. The value of the GUC
  can be OFF, pglz, snappy or lz4, which can be used to turn off compression
  or set the desired compression algorithm.
 
  0002-Support_snappy_lz4 adds support for LZ4 and Snappy in PostgreSQL. It
  uses Andres's patch for getting the Makefiles working and has a few wrappers
  to make the function calls to the LZ4 and Snappy compression functions and
  handle varlena datatypes.
  Patch courtesy: Pavan Deolasee

 Thanks for extending and revising the FPW-compress patch! Could you add
 your patch into the next CF?

  Also, compress_backup_block GUC needs to be merged with full_page_writes.

 Basically I agree with you, because I don't want to add a new GUC very
 similar to the existing one.

 But consider the case where full_page_writes = off. Even in this case,
 FPWs are forcibly written during base backups. Should such FPWs also be
 compressed? Which compression algorithm should be used? If we want to
 choose the algorithm for such FPWs, we would not be able to merge those two
 GUCs. IMO it's OK to always use the best compression algorithm for such FPWs
 and merge them, though.

  Tests use the JDBC runner TPC-C benchmark to measure the amount of WAL
  compression, tps, and response time in each of the scenarios, viz.
  compression = OFF, pglz, LZ4, snappy, and FPW = off.

 Isn't it worth measuring the recovery performance for each compression
 algorithm?

 Regards,

 --
 Fujii Masao



Re: [HACKERS] json/jsonb inconsistence

2014-05-29 Thread Andrew Dunstan

On 05/29/2014 07:55 AM, Teodor Sigaev wrote:

# select '"\uaBcD"'::json;
   json
----------
 "\uaBcD"

but

# select '"\uaBcD"'::jsonb;
ERROR:  invalid input syntax for type json
LINE 1: select '"\uaBcD"'::jsonb;
                ^
DETAIL:  Unicode escape values cannot be used for code point values
above 007F when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: ...

and

# select '"\uaBcD"'::json -> 0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values
above 007F when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: ...
Time: 0,696 ms

Moreover, json looks strange:

# select '["\uaBcD"]'::json;
    json
------------
 ["\uaBcD"]

but

# select '["\uaBcD"]'::json -> 0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values
above 007F when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: [...

Looks like random parse rules.




It is documented that for json we don't check the validity of unicode 
escapes until we try to use them. That's because the original json 
parser didn't check them, and if we started doing so now users who 
pg_upgraded would find themselves with invalid data in the database. The 
rules for jsonb are more strict, because we actually resolve the unicode 
escapes when constructing the jsonb object. There is nothing at all 
random about it, although I agree it's slightly inconsistent. It's been 
the subject of some discussion on -hackers previously, IIRC. I actually 
referred to this difference in my talk at pgCon last Friday.


Frankly, if you want to use json/jsonb, you are going to be best served 
by using a UTF8-encoded database, or not using non-ascii unicode escapes 
in json at all.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] backup_label revisited

2014-05-29 Thread Greg Stark
So I ran into the case again where a system crashed while a hot backup
was being taken. Postgres couldn't start up automatically because the
backup_label was present. This has come up before e.g.
http://www.postgresql.org/message-id/caazkufap1gxcojtyzch13rvevjevwro1vvfbysqtwgud9is...@mail.gmail.com
but I believe no progress was made.

I was trying to think of how we could identify whether the backup_label
belongs to a backup in progress or a restore in progress. Choices like
putting the server IP address in it are obviously not going to work, for
several reasons.

However, at least on Linux, wouldn't it be sufficient to put the inode
number of the backup_label file in the backup_label itself? If it's still
the same inode then it's just a restart, not a restore, since AFAIK
there's no way for tar or the like to recreate the file with the same
inode on any filesystem. That would even protect against another
restore on the same host.
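
A minimal sketch of the idea, assuming only POSIX stat(2); the path and the
recorded inode value are placeholders for what would be stored inside
backup_label itself:

    /* inode_check.c -- illustration only */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        const char *path = "backup_label";       /* hypothetical location */
        unsigned long long recorded_ino = 12345; /* value read back from the file */

        if (stat(path, &st) != 0)
        {
            perror("stat");
            return 1;
        }
        if ((unsigned long long) st.st_ino == recorded_ino)
            puts("same inode: likely a restart after a crash, not a restore");
        else
            puts("different inode: likely a restored backup");
        return 0;
    }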


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Peter Eisentraut
On 5/27/14, 10:37 PM, Tom Lane wrote:
 I'm not terribly happy about pushing such a change post-beta1 either,
 but it's not like this isn't something we've known was needed.  Anyway,
 what's the worst case if we find a bug here?  Tell people not to use
 uuid-ossp?

Mainly some more discussion time would have been nice.  Also, while the
old ossp-based uuid was broken in some cases, it had a well-defined
workaround.  The new code introduces a whole new dimension of potential
portability issues.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Peter Eisentraut
On 5/27/14, 10:37 PM, Tom Lane wrote:
 If you don't like this change, we can revert it and also revert the upgrade 
 to 2.69.

Nobody else appears to be concerned, but I would have preferred this option.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 08:00 AM, Teodor Sigaev wrote:

postgres=# select '["\u0000"]'::json -> 0;
 ?column?
----------
 "\u0000"
(1 row)

Time: 1,294 ms
postgres=# select '["\u0000"]'::jsonb -> 0;
  ?column?
------------
 "\\u0000"
(1 row)

It seems to me that escape_json() is wrongly used in
jsonb_put_escaped_value(); the right name for escape_json() would be
escape_to_json().



That's a bug. I will look into it. I think we might need to special-case
\u0000 on output, just as we do on input.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json/jsonb inconsistence

2014-05-29 Thread Teodor Sigaev

inconsistent. It's been the subject of some discussion on -hackers previously,
IIRC. I actually referred to this difference in my talk at pgCon last Friday.


I see, and I wasn't at your talk :( I'm playing around with jsquery and am
now trying to implement unicode escapes there.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Andres Freund
On 2014-05-29 08:14:48 -0400, Peter Eisentraut wrote:
 On 5/27/14, 10:37 PM, Tom Lane wrote:
  If you don't like this change, we can revert it and also revert the upgrade 
  to 2.69.
 
 Nobody else appears to be concerned, but I would have preferred this option.

I am pretty concerned actually. But I don't see downgrading to an
earlier autoconf as something really helpful. There already was a huge
portability problem with the current code. Avoiding the autoconf update
for a while wouldn't have solved it. And in 5 years time the amount of
portability problems will be much larger.
Yes, it'd have been nice if this were done a month+ ago. But nobody
stepped up :(. Seems like the least bad choice :/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Peter Eisentraut
On 5/28/14, 7:02 PM, Andres Freund wrote:
 That's a good idea. What i've been thinking about is to add
 -Wno-deprecated to the bison rule in the interim. Maybe after a
 configure test for the option. All deprecation warnings so far seem to
 be pretty unhelpful.

Here is a patch.

diff --git a/config/programs.m4 b/config/programs.m4
index fd3a9a4..5570e55 100644
--- a/config/programs.m4
+++ b/config/programs.m4
@@ -23,6 +23,10 @@ if test "$BISON"; then
 *** Bison version 1.875 or later is required, but this is $pgac_bison_version.])
 BISON=
   fi
+  if echo "$pgac_bison_version" | $AWK '{ if ([$]4 >= 3) exit 0; else exit 1;}'
+  then
+    BISONFLAGS="$BISONFLAGS -Wno-deprecated"
+  fi
 fi
 
 if test -z "$BISON"; then
diff --git a/configure b/configure
index 3663e50..5499d56 100755
--- a/configure
+++ b/configure
@@ -7075,6 +7075,10 @@ $as_echo "$as_me: WARNING:
 *** Bison version 1.875 or later is required, but this is $pgac_bison_version." >&2;}
 BISON=
   fi
+  if echo "$pgac_bison_version" | $AWK '{ if ($4 >= 3) exit 0; else exit 1;}'
+  then
+    BISONFLAGS="$BISONFLAGS -Wno-deprecated"
+  fi
 fi
 
 if test -z "$BISON"; then

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 08:21 AM, Andres Freund wrote:

On 2014-05-29 08:14:48 -0400, Peter Eisentraut wrote:

On 5/27/14, 10:37 PM, Tom Lane wrote:

If you don't like this change, we can revert it and also revert the upgrade to 
2.69.

Nobody else appears to be concerned, but I would have preferred this option.

I am pretty concerned actually. But I don't see downgrading to an
earlier autoconf as something really helpful. There already was a huge
portability problem with the current code. Avoiding the autoconf update
for a while wouldn't have solved it. And in 5 years time the amount of
portability problems will be much larger.
Yes, it'd have been nice if this were done a month+ ago. But nobody
stepped up :(. Seems like the least bad choice :/




The most worrying thing is that we didn't find the occasioning problem 
when we switched to autoconf 2.69 back in December, so we end up dealing 
with bad choices now.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Peter Eisentraut
On 5/29/14, 8:21 AM, Andres Freund wrote:
 But I don't see downgrading to an
 earlier autoconf as something really helpful.

Well, we could have just hacked up that particular header check to do
what we want.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Andres Freund
On 2014-05-29 08:49:38 -0400, Peter Eisentraut wrote:
 On 5/29/14, 8:21 AM, Andres Freund wrote:
  But I don't see downgrading to an
  earlier autoconf as something really helpful.
 
 Well, we could have just hacked up that particular header check to do
 what we want.

Still wouldn't have solved that ossp already didn't work on several
platforms at all anymore and is likely to work on even fewer in 5 years.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby
Thanks for looking at it!

 Date: Thu, 29 May 2014 00:19:33 +0300
 From: hlinnakan...@vmware.com
 To: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
 CC: klaussfre...@gmail.com
 Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
 and patch
 
 On 05/28/2014 11:52 PM, John Lumby wrote:
 
 
 The patch seems to assume that you can put the aiocb struct in shared 
 memory, initiate an asynchronous I/O request from one process, and wait 
 for its completion from another process. I'm pretty surprised if that 
 works on any platform.

It works on Linux. Actually, this ability allows the asyncio implementation
to reduce complexity in one respect (yes, I know it looks complex enough):
it makes waiting for completion of an in-progress IO simpler than in the
existing synchronous IO case, since librt takes care of the waiting.
Specifically, there is no need for extra wait-for-io control blocks
such as in bufmgr's WaitIO().

This also plays very nicely with syncscan, where the cohorts run close
together.

If anyone could advise whether this also works on MacOS/BSD or Windows,
that would be good, as I can't verify it myself.
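
For anyone who wants to try it, here is a minimal single-process sketch of
the POSIX AIO calls involved (aio_read, aio_suspend, aio_error, aio_return);
this only shows the basic API, not the patch's cross-process case where the
aiocb lives in shared memory and another backend waits on it. Compile with
-lrt on Linux; the file name is just an example.

    /* aio_demo.c -- illustration only */
    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[8192];
        struct aiocb cb;
        const struct aiocb *list[1];
        int fd = open("/etc/hostname", O_RDONLY);  /* any readable file */

        if (fd < 0) { perror("open"); return 1; }

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

        list[0] = &cb;
        aio_suspend(list, 1, NULL);                /* block until completion */

        if (aio_error(&cb) == 0)
            printf("read %zd bytes asynchronously\n", aio_return(&cb));
        else
            perror("aio_error");
        close(fd);
        return 0;
    }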

 
 How portable is POSIX aio nowadays? Googling around, it still seems that 
 on Linux, it's implemented using threads. Does the thread-emulation 
 implementation cause problems with the rest of the backend, which 
 assumes that there is only a single thread? In any case, I think we'll 

Good question. I am not aware of any dependency on a backend having only
a single thread. Can you please point me to such dependencies?


 want to encapsulate the AIO implementation behind some kind of an API, 
 to allow other implementations to co-exist.

It already is - it follows the smgr (storage manager) / md (magnetic disk) /
fd (filesystem) layering used for the existing filesystem ops, including
posix_fadvise.

 
 Benchmarks?

I will see if I can package mine up somehow.
It would be great if anyone else would like to benchmark it on theirs...


 - Heikki
  

Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby
I have pasted  below the EXPLAIN of one of my benchmark queries
(the one I reference in the README).
Plenty of nested loop joins.
However I think I understand your question as to how effective it would be
if the outer is not sorted, and I will see if I can dig into that
if I get time  (and it sounds as though Claudio is on it too).

The area of exactly what the best prefetch strategy should be for
each particular type of scan and context is a good one to work on.

John

 Date: Wed, 28 May 2014 18:12:23 -0700
 Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
 and patch
 From: p...@heroku.com
 To: klaussfre...@gmail.com
 CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
 
 On Wed, May 28, 2014 at 5:59 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
  For nestloop, correct me if I'm wrong, but index scan nodes don't have
  visibility of the next tuple to be searched for.
 
 Nested loop joins are considered a particularly compelling case for
 prefetching, actually.
 
 -- 
 Peter Geoghegan

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=801294.81..801294.81 rows=2 width=532)
   CTE deploy_zone_down
     ->  Recursive Union  (cost=1061.25..2687.40 rows=11 width=573)
           ->  Nested Loop  (cost=1061.25..1423.74 rows=1 width=41)
                 ->  Nested Loop  (cost=1061.11..1421.22 rows=14 width=49)
                       ->  Bitmap Heap Scan on entity zone_tree  (cost=1060.67..1175.80 rows=29 width=40)
                             Recheck Cond: ((name >= 'testZone-4375'::text) AND (name <= 'testZone-5499'::text) AND ((discriminator)::text = 'ZONE'::text))
                             ->  BitmapAnd  (cost=1060.67..1060.67 rows=29 width=0)
                                   ->  Bitmap Index Scan on entity_name  (cost=0.00..139.71 rows=5927 width=0)
                                         Index Cond: ((name >= 'testZone-4375'::text) AND (name <= 'testZone-5499'::text))
                                   ->  Bitmap Index Scan on entity_discriminatorx  (cost=0.00..920.70 rows=49636 width=0)
                                         Index Cond: ((discriminator)::text = 'ZONE'::text)
                       ->  Index Scan using metadata_value_owner_id on metadata_value mv  (cost=0.43..8.45 rows=1 width=17)
                             Index Cond: (owner_id = zone_tree.id)
                 ->  Index Scan using metadata_field_pkey on metadata_field mf  (cost=0.15..0.17 rows=1 width=8)
                       Index Cond: (id = mv.field_id)
                       Filter: ((name)::text = 'deployable'::text)
           ->  Nested Loop  (cost=0.87..126.34 rows=1 width=573)
                 ->  Nested Loop  (cost=0.72..125.44 rows=5 width=581)
                       ->  Nested Loop  (cost=0.29..83.42 rows=10 width=572)
                             ->  WorkTable Scan on deploy_zone_down dzd  (cost=0.00..0.20 rows=10 width=540)
                             ->  Index Scan using entity_discriminator_parent_zone on entity ch  (cost=0.29..8.31 rows=1 width=40)
                                   Index Cond: ((parent_id = dzd.zone_tree_id) AND ((discriminator)::text = 'ZONE'::text))
                       ->  Index Scan using metadata_value_owner_id on metadata_value mv_1  (cost=0.43..4.19 rows=1 width=17)
                             Index Cond: (owner_id = ch.id)
                 ->  Index Scan using metadata_field_pkey on metadata_field mf_1  (cost=0.15..0.17 rows=1 width=8)
                       Index Cond: (id = mv_1.field_id)
                       Filter: ((name)::text = 'deployable'::text)
   CTE deploy_zone_tree
     ->  Recursive Union  (cost=0.00..9336.89 rows=21 width=1105)
           ->  CTE Scan on deploy_zone_down dzd_1  (cost=0.00..0.22 rows=11 width=1105)
           ->  Nested Loop  (cost=0.43..933.63 rows=1 width=594)
                 ->  WorkTable Scan on deploy_zone_tree dzt  (cost=0.00..2.20 rows=110 width=581)
                 ->  Index Scan using entity_pkey on entity pt  (cost=0.43..8.46 rows=1 width=21)
                       Index Cond: (id = dzt.zone_tree_ancestor_parent_id)
                       Filter: ((discriminator)::text = ANY ('{VIEW,ZONE}'::text[]))
   CTE forward_host_ip
     ->  Nested Loop  (cost=1.30..149.65 rows=24 width=88)
           ->  Nested Loop  (cost=0.87..121.69 rows=48 width=56)
                 ->  Nested Loop  (cost=0.43..71.61

Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Andres Freund
On 2014-05-29 08:33:05 -0400, Peter Eisentraut wrote:
 On 5/28/14, 7:02 PM, Andres Freund wrote:
  That's a good idea. What i've been thinking about is to add
  -Wno-deprecated to the bison rule in the interim. Maybe after a
  configure test for the option. All deprecation warnings so far seem to
  be pretty unhelpful.
 
 Here is a patch.

FWIW, I vote for applying something like it. It seems better to collect
the -Wno-deprecated in one place (i.e. configure) instead of having it
in every developer's Makefile.custom. The latter will be harder to get
rid of.
I'd add a comment about why it's been added, though. I at least won't
remember why...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On 28/05/2014 22:12, Peter Geoghegan p...@heroku.com wrote:

 On Wed, May 28, 2014 at 5:59 PM, Claudio Freire klaussfre...@gmail.com
wrote:
  For nestloop, correct me if I'm wrong, but index scan nodes don't have
  visibility of the next tuple to be searched for.

 Nested loop joins are considered a particularly compelling case for
 prefetching, actually.

Of course. I only doubt it can be implemented without fairly large changes
to all execution nodes.

I'll look into it


 --
 Peter Geoghegan


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-29 08:49:38 -0400, Peter Eisentraut wrote:
 Well, we could have just hacked up that particular header check to do
 what we want.

 Still wouldn't have solved that ossp already didn't work on several
 platforms at all anymore and is likely to work on even fewer in 5 years.

Yeah.  The problem is not with the header check, it's with the fact that
OSSP UUID is basically broken on several modern platforms.[1]  We were
going to have to do something about that pretty soon anyway.  I agree that
this isn't the most ideal time in the dev cycle to do something about it,
but fixing portability issues is one of the expected beta-time activities,
no?  That's really what this is.

regards, tom lane

[1] Quite aside from compilation problems, yesterday's testing results
suggest that it can't read the system MAC address at all on Debian:
http://www.postgresql.org/message-id/29063.1401333...@sss.pgh.pa.us


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 FWIW, I vote for applying something like it. It seems better to collect
 the -Wno-deprecated in one place (i.e. configure) instead of having it
 in every developer's Makefile.custom. The latter will be harder to get
 rid of.

Yeah, that's a good point.

 I'd add a comment about why it's been added though. I won't remember why
 at least...

+1

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 05/29/2014 08:21 AM, Andres Freund wrote:
 Yes, it'd have been nice if this were done a month+ ago. But nobody
 stepped up :(. Seems like the least bad choice :/

 The most worrying thing is that we didn't find the occasioning problem 
 when we switched to autoconf 2.69 back in December, so we end up dealing 
 with bad choices now.

I think the main reason for that is that none of the buildfarm animals
were trying to build contrib/uuid-ossp on machines where OSSP UUID is
shaky.  Some of our packagers do so, though, and so once the beta got
into their hands we found out about the problem.

Short of adding "try to package according to Debian, Red Hat, etc.
packaging recipes" to the buildfarm requirements, there doesn't seem to
be much that we could do to find out about such issues earlier.  (And
no, I'm not proposing that as a good idea.  As an ex-packager, I know
that those recipes are moving targets because of constantly changing
external requirements.  We don't want to take over that maintenance.)

So IMO we just have to expect that beta is going to turn up portability
issues, and we're going to have to do what it takes to resolve them,
even if it's nontrivial.

The good news on this front is that we have substantially more buildfarm
coverage of contrib/uuid-ossp than we had two days ago.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST bug.

2014-05-29 Thread Teodor Sigaev

I don't think that checkAllTheSame is broken, but it probably would be
after this patch --- ISTM you're breaking the corner case described in the
header comment for checkAllTheSame, where we need to force splitting of a
set of existing tuples that the opclass can't distinguish.
IMHO, this patch will not break it, because it forces (in the corner case)
creating a node for the new tuple but excludes it from the list to insert. On
the next iteration we will find the needed node with an empty list.


Comment:
 * If we know that the leaf tuples wouldn't all fit on one page, then we
 * exclude the last tuple (which is the incoming new tuple that forced a split)
 * from the check to see if more than one node is used.  The reason for this
 * is that if the existing tuples are put into only one chain, then even if
 * we move them all to an empty page, there would still not be room for the
 * new tuple, so we'd get into an infinite loop of picksplit attempts.

If we reserve a node for the new tuple, then on the next iteration we will not
fall into picksplit again - we will have an empty list. So the new tuple will
fall into another page.


BTW, look at the following example:
create table xxx (p point);

insert into xxx select '(1,1)'::point from generate_series(1, 226) as v;
insert into xxx values ('(3,3)'::point);

create index xxxidx on xxx using spgist (p quad_point_ops);

It is easy to see that the single node is allTheSame although the underlying
leaves are different. Due to the allTheSame search logic, scans are correct
but inefficient. And the patch fixes that too.




What actually is broken, I think, is the spgist text opclass, because it's
using a zero node label for two different situations.  The scenario we're
looking at here is:


I agree that these are different situations, but disagree that it's broken :)


 It seems like we have to
 distinguish the case of zero-length string from that of a dummy label
 for a pushed-down allTheSame tuple.

Actually, we do this: if allTheSame is true then the nodes have dummy labels.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 08:15 AM, Andrew Dunstan wrote:


On 05/29/2014 08:00 AM, Teodor Sigaev wrote:

postgres=# select '["\u0000"]'::json -> 0;
 ?column?
----------
 "\u0000"
(1 row)

Time: 1,294 ms
postgres=# select '["\u0000"]'::jsonb -> 0;
  ?column?
------------
 "\\u0000"
(1 row)

It seems to me that escape_json() is wrongly used in
jsonb_put_escaped_value(); the right name for escape_json() would be
escape_to_json().



That's a bug. I will look into it. I think we might need to
special-case \u0000 on output, just as we do on input.



Actually, this is just the tip of the iceberg.

Here's what 9.3 does:

   andrew=# select array_to_json(array['a','\u0000','b']::text[]);
        array_to_json
   ----------------------
    ["a","\\u0000","b"]


I'm now wondering if we should pass through any unicode escape
(presumably validated to some extent). I guess we can't change this in
9.2/9.3 because it would be a behaviour change.


These unicode escapes have given us more trouble than any other part of 
the JSON spec :-(


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ancient bug in formatting.c/to_char()

2014-05-29 Thread Peter Geoghegan
On Thu, May 29, 2014 at 3:09 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Yeah, the code is pretty crufty, I'm not sure what the proper fix would be.
 I wonder if psprintf would be useful.


I don't know that it's worth worrying about the case you highlight. It
is certainly worth worrying about the lack of a similar fix in
float4_to_char(), though.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Martijn van Oosterhout
On Thu, May 29, 2014 at 08:33:05AM -0400, Peter Eisentraut wrote:
 On 5/28/14, 7:02 PM, Andres Freund wrote:
  That's a good idea. What i've been thinking about is to add
  -Wno-deprecated to the bison rule in the interim. Maybe after a
  configure test for the option. All deprecation warnings so far seem to
  be pretty unhelpful.
 
 Here is a patch.
 

Does this need a comment indicating why it's needed and when it can be
removed?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




[HACKERS] recovery testing for beta

2014-05-29 Thread Jeff Janes
What features in 9.4 need more beta testing for recovery?

I've applied my partial-write testing harness to several scenarios in 9.4.
 So far it's found a recovery bug for gin indexes, a recovery bug for btree,
a vacuum bug for btree indexes (with foreign keys, but that is not relevant
to the bug), and nothing of interest for gist indexes, although it only
tested queries of the form WHERE text_array @@ to_tsquery(?).

It also implicitly tested the xlog parallel write slots thing, as that is
common code to all recovery.

I also applied the foreign key test retroactively to 9.3, and it quickly
re-found the multixact bugs up until commit 9a57858f1103b89a5674.  The test
was designed only with the knowledge that the bugs involved foreign keys
and the consumption of multixacts.   I had no deeper knowledge of the
details of those bugs when designing the test, so I have a reasonable
amount of confidence that this could have found them in real time had I
bothered to try to test the feature during the previous beta cycle.

So, what else in 9.4 needs testing for recovery from crashes in general or
partial-writes in particular?

One thing is that I want to find a way to drive multixact in fast forward,
so that the freezing cycle gets a good workout. Currently I can't consume
enough of them to make them wrap around within the time frame of a test.

It looks like jsonb stuff only makes new operators for use by existing
indexes types, so probably is not a high risk for recovery bugs.  I will
probably try to test it anyway as a way to become more familiar with the
feature.  I don't really know about the logical streaming stuff.

These are the recent threads on hackers.  The first one has a link to the
harness variant which is set up for the foreign key testing.

9.4 btree index corruption
9.4 checksum error in recovery with btree index
9.4 checksum errors in recovery with gin index

Cheers,

Jeff


Re: [HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Teodor Sigaev

Actually, this is just the tip of the iceberg.



Funny, but
postgres=# select '\u0000'::text;
  text
--------
 \u0000
(1 row)

postgres=# select array['\u0000']::text[];
    array
-------------
 {"\\u0000"}

postgres=# select '{"\u0000"}'::text[];
  text
---------
 {u0000}

postgres=# select '{"\\u0000"}'::text[];
    text
-------------
 {"\\u0000"}

%) I'm going crazy over this. I suppose we shouldn't worry about the array
type; just make the output the same for json/jsonb.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Josh Kupershmidt
On Wed, May 28, 2014 at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Buildfarm critters smew and shearwater are reporting regression test
 failures that suggest that the UUID library can't get a system MAC
 address:
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=smewdt=2014-05-28%2023%3A38%3A28
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shearwaterdt=2014-05-29%2000%3A24%3A32

 I've just committed regression test adjustments to prevent that from
 being a failure case, but I am confused about why it's happening.
 I wouldn't be surprised at not getting a MAC address on a machine that
 lacks any internet connection, but that surely can't describe the
 buildfarm environment.  Are you curious enough to poke into it and
 see what's going on?  It might be useful to strace a backend that's
 trying to execute uuid_generate_v1() and see what the kernel interaction
 looks like exactly.

Here's the result of attaching strace to an idle backend, then running
SELECT uuid_generate_v1(). AFAIR shearwater is a cheaply-hosted OpenVZ
VPS under the hood.

Josh
josh@ease1:~$ strace -p 6818
Process 6818 attached - interrupt to quit
recv(10, "Q\0\0\0\37SELECT uuid_generate_v1();\0", 8192, 0) = 32
gettimeofday({1401383296, 920282}, NULL) = 0
gettimeofday({1401383296, 920313}, NULL) = 0
mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xae800000
stat64("/home/josh/runtime/lib/postgresql/uuid-ossp", 0xbfdf87d0) = -1 ENOENT (No such file or directory)
stat64("/home/josh/runtime/lib/postgresql/uuid-ossp.so", {st_mode=S_IFREG|0755, st_size=46685, ...}) = 0
stat64("/home/josh/runtime/lib/postgresql/uuid-ossp.so", {st_mode=S_IFREG|0755, st_size=46685, ...}) = 0
open("/home/josh/runtime/lib/postgresql/uuid-ossp.so", O_RDONLY) = 7
read(7, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\f\0\0004\0\0\0"..., 512) = 512
fstat64(7, {st_mode=S_IFREG|0755, st_size=46685, ...}) = 0
mmap2(NULL, 16796, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0xae7fb000
mmap2(0xae7ff000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x3) = 0xae7ff000
close(7)                                = 0
open("/home/josh/runtime/lib/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/i686/sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/i686/sse2/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/i686/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/i686/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/sse2/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libuuid.so.1", O_RDONLY)      = -1 ENOENT (No such file or directory)
open("i686/sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("i686/sse2/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("i686/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("i686/libuuid.so.1", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("sse2/libuuid.so.1", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("cmov/libuuid.so.1", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("libuuid.so.1", O_RDONLY)          = -1 ENOENT (No such file or directory)
open("/home/josh/runtime/lib/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 7
fstat64(7, {st_mode=S_IFREG|0644, st_size=30628, ...}) = 0
mmap2(NULL, 30628, PROT_READ, MAP_PRIVATE, 7, 0) = 0xae7f3000
close(7)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libuuid.so.1", O_RDONLY) = 7
read(7, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260\22\0\0004\0\0\0"..., 512) = 512
fstat64(7, {st_mode=S_IFREG|0644, st_size=18000, ...}) = 0
mmap2(NULL, 20712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0xae7ed000
mmap2(0xae7f1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x3) = 0xae7f1000
close(7)                                = 0
mprotect(0xae7f1000, 4096, PROT_READ)   = 0
munmap(0xae7f3000, 30628)               = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 7
connect(7, {sa_family=AF_FILE, path="/var/run/uuidd/request"}, 110) = -1 ENOENT (No such file or directory)
access("/usr/sbin/uuidd", X_OK)         = -1 ENOENT (No such file or directory)
close(7)                                = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 7
ioctl(7, SIOCGIFCONF, {96, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"venet0", {AF_INET, inet_addr("127.0.0.2")}}, {"venet0:0", {AF_INET, inet_addr("198.204.250.34")}}}}) = 0
ioctl(7, SIOCGIFHWADDR, {ifr_name="lo", ifr_hwaddr=00:00:00:00:00:00}) =

Re: [HACKERS] Compression of full-page-writes

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 11:21:44AM +0100, Simon Riggs wrote:
  Uh, how would that work if you want to compress the background_FPWs?
  Use compressed_background_FPWs?
 
 We've currently got 1 technique for torn page protection, soon to have
 2 and with a 3rd on the horizon and likely to receive effort in next
 release.
 
 It seems sensible to have just one parameter to describe the various
 techniques, as suggested. I'm suggesting that we plan for how things
 will look when we have the 3rd one as well.
 
 Alternate suggestions welcome.

I was just pointing out that we might need compression to be a separate
boolean variable from the type of page tear protection.  I know I am
usually anti-adding-variables, but in this case it seems trying to have
one variable control several things will lead to confusion.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 02:21:57PM +0200, Andres Freund wrote:
 On 2014-05-29 08:14:48 -0400, Peter Eisentraut wrote:
  On 5/27/14, 10:37 PM, Tom Lane wrote:
   If you don't like this change, we can revert it and also revert the 
   upgrade to 2.69.
  
  Nobody else appears to be concerned, but I would have preferred this option.
 
 I am pretty concerned actually. But I don't see downgrading to an
 earlier autoconf as something really helpful. There already was a huge
 portability problem with the current code. Avoiding the autoconf update
 for a while wouldn't have solved it. And in 5 years time the amount of
 portability problems will be much larger.
 Yes, it'd have been nice if this were done a month+ ago. But nobody
 stepped up :(. Seems like the least bad choice :/

One thing that concerns me is that we already had the problem that users
creating the uuid-ossp extension had to double-quote the name because of
the dash, and we have regularly questioned the viability of the
uuid-ossp codebase.

Now that we know we have to support alternatives, we are changing the
build API to support those alternatives, but doing nothing to decouple
the extension name from uuid-ossp and the dash issue.  

Seems this would be the logical time to just break compatibility and get
a sane API for UUID generation.  We could hack pg_dump --binary-upgrade
to map a dump of uuid-ossp to some other name, or hack the backend to
accept uuid-ossp and map that to something more sane.

I basically feel this change is making our bad API even more confusing. 
I think the only good thing about this change is that the _user_ API, as
odd as it is, remains the same.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 09:39:56AM -0700, Jeff Janes wrote:
 What features in 9.4 need more beta testing for recovery?
 
 I've applied my partial-write testing harness to several scenarios in 9.4.  So
 far it's found a recovery bug for gin indexes, a recovery bug for btree, a
 vacuum bug for btree indexes (with foreign keys, but that is not relevant to
 the bug), and nothing of interest for gist indexes, although it only tested
 queries of the form WHERE text_array @@ to_tsquery(?).
 
 It also implicitly tested the xlog parallel write slots thing, as that is
 common code to all recovery.
 
 I also applied the foreign key test retroactively to 9.3, and it quickly
 re-found the multixact bugs up until commit 9a57858f1103b89a5674.  The test 
 was
 designed only with the knowledge that the bugs involved foreign keys and the
 consumption of multixacts.   I had no deeper knowledge of the details of those
 bugs when designing the test, so I have a reasonable amount of confidence that
 this could have found them in real time had I bothered to try to test the
 feature during the previous beta cycle.

Wow, that is impressive!  We are looking for ways to find bugs like
the ones that appeared in 9.3.X, and it seems you might have found one.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 One thing that concerns me is that we already had the problem that users
 creating the uuid-ossp extension had to double-quote the name because of
 the dash, and we have regularly questioned the viability of the
 uuid-ossp codebase.

 Now that we know we have to support alternatives, we are changing the
 build API to support those alternatives, but doing nothing to decouple
 the extension name from uuid-ossp and the dash issue.  

Well, if you've got a proposal for how to rename the extension without
creating major compatibility problems, let's hear it.

 Seems this would be the logical time to just break compatibility and get
 a sane API for UUID generation.

Most people think the sane API would be to put the functionality in
core, and forget about any extension name at all.  The compatibility
problems with that approach aren't exactly trivial either, but I suspect
that's where we'll end up in the long run.  So I'm not that excited about
kluge solutions for renaming the extension.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Tom Lane
Josh Kupershmidt schmi...@gmail.com writes:
 On Wed, May 28, 2014 at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I've just committed regression test adjustments to prevent that from
 being a failure case, but I am confused about why it's happening.
 I wouldn't be surprised at not getting a MAC address on a machine that
 lacks any internet connection, but that surely can't describe the
 buildfarm environment.  Are you curious enough to poke into it and
 see what's going on?  It might be useful to strace a backend that's
 trying to execute uuid_generate_v1() and see what the kernel interaction
 looks like exactly.

 Here's the result of attaching strace to an idle backend, then running
 SELECT uuid_generate_v1(). AFAIR shearwater is a cheaply-hosted OpenVZ
 VPS under the hood.

Interesting.  Looks like you have access only to virtual network
interfaces, and they report all-zero MAC addresses, which the UUID library
is smart enough to ignore.  If smew is also in a virtual environment
then that's probably the explanation.  (There are some other buildfarm
critters that are reporting MAC addresses with the local-admin bit set,
which I suspect also means they've got virtual network interfaces, but
with a different treatment of the what-to-report problem.)
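
For the curious, here is a small sketch of the kind of check the strace above
shows, assuming Linux's SIOCGIFHWADDR ioctl; the interface name is an example:

    /* mac_check.c -- illustration only */
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <unistd.h>

    int main(void)
    {
        struct ifreq ifr;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0) { perror("socket"); return 1; }
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);   /* example interface */

        if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
        { perror("ioctl"); close(fd); return 1; }

        const unsigned char *mac = (const unsigned char *) ifr.ifr_hwaddr.sa_data;
        int all_zero = 1;
        for (int i = 0; i < 6; i++)
            if (mac[i] != 0) all_zero = 0;

        printf("%02x:%02x:%02x:%02x:%02x:%02x%s\n",
               mac[0], mac[1], mac[2], mac[3], mac[4], mac[5],
               all_zero ? "  (all-zero: ignored for UUID generation)" : "");
        close(fd);
        return 0;
    }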

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 01:56:22PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  One thing that concerns me is that we already had the problem that users
  creating the uuid-ossp extension had to double-quote the name because of
  the dash, and we have regularly questioned the viability of the
  uuid-ossp codebase.
 
  Now that we know we have to support alternatives, we are changing the
  build API to support those alternatives, but doing nothing to decouple
  the extension name from uuid-ossp and the dash issue.  
 
 Well, if you've got a proposal for how to rename the extension without
 creating major compatibility problems, let's hear it.

Well, the only two I could think of were the pg_dump and backend mapping
changes;  pretty hacky.

  Seems this would be the logical time to just break compatibility and get
  a sane API for UUID generation.
 
 Most people think the sane API would be to put the functionality in
 core, and forget about any extension name at all.  The compatibility
 problems with that approach aren't exactly trivial either, but I suspect
 that's where we'll end up in the long run.  So I'm not that excited about
 kluge solutions for renaming the extension.

OK.  I was just worried that users are now using a badly-named uuid-ossp
extension that isn't even using uuid-ossp in many cases.  If we have a
long-term plan to fix this, then you are right that it isn't worth
worrying about it now.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Alvaro Herrera
Jeff Janes wrote:

 One thing is that I want to find a way to drive multixact in fast forward,
 so that the freezing cycle gets a good workout. Currently I can't consume
 enough of them to make them wrap around within the time frame of a test.

IIRC I lobotomized it by removing the XLogInsert() call.  That allows
you to generate large amounts of multixacts quickly.  On my laptop this
was able to do four or five wraparound cycles in two to three hours or so,
using the "burn multixact" utility here:
http://www.postgresql.org/message-id/20131231035913.gu22...@eldon.alvh.no-ip.org

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Typo fixes in Solution.pm, part of MSVC scripts

2014-05-29 Thread Robert Haas
On Sun, May 25, 2014 at 11:34 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 I noticed a couple of typos in Solution.pm, fixed by the attached patch.
 Those things were introduced by commits cec8394 (Visual Studio 2013
 support) and 0a78320 (latest pgindent run; not actually a problem of
 pgindent, but a misuse of commas).
 The tab problems should not be backpatched to 9.3, but the use of
 commas instead of semicolons should be.

I suspect it would be good, then, to separate this into two patches,
one for back-patch and one not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Josh Berkus
On 05/29/2014 09:39 AM, Jeff Janes wrote:
 
 I've applied my partial-write testing harness to several scenarios in
 9.4.  So far its found a recovery bug for gin indexes, a recovery bug
 for btree, a vacuum bug for btree indexes (with foreign keys, but that
 is not relevant to the bug), and nothing of interest for gist index,
 although it only tested where text_array @@ to_tsquery(?) queries.

This is awesome.  Thanks for doing it!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST bug.

2014-05-29 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 I don't think that checkAllTheSame is broken, but it probably would be
 after this patch --- ISTM you're breaking the corner case described in the
 header comment for checkAllTheSame, where we need to force splitting of a
 set of existing tuples that the opclass can't distinguish.

 IMHO, this patch will not break it, because in the corner case it forces
 creation of a node for the new tuple but excludes it from the list to
 insert. On the next iteration we will find the needed node with an empty
 list.
 
 Comment:
   * If we know that the leaf tuples wouldn't all fit on one page, then we
   * exclude the last tuple (which is the incoming new tuple that forced a split)
   * from the check to see if more than one node is used.  The reason for this
   * is that if the existing tuples are put into only one chain, then even if
   * we move them all to an empty page, there would still not be room for the
   * new tuple, so we'd get into an infinite loop of picksplit attempts.
 
 If we reserve a node for the new tuple, then on the next iteration we will
 not fall into picksplit again - we will have an empty list. So the new
 tuple will fall into another page.

The point I'm making is that the scenario your test case exposes is not
an infinite loop of picksplits, but an infinite loop of choose calls.
There are certainly ways to get into that loop that don't involve this
specific checkAllTheSame logic.  Here's an example:

regression=# create table xxx ( t text);
CREATE TABLE
regression=# create index xxxidx on xxx using spgist (t);
CREATE INDEX
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',3996);

In this example, we get a picksplit at the third insert, and here we
*must* take the allTheSame path because the values are, in fact, all the
same (the SPGIST_MAX_PREFIX_LENGTH constraint means we can only put 3996
characters into the prefix, and then the 3997'th character of each value
is 'x').  Then the fourth insert triggers this same infinite loop, because
spg_text_choose is asked how to insert the slightly-shorter value into the
allTheSame tuple, and it does the wrong thing.

It's certainly possible that we could/should change what checkAllTheSame
is doing --- on re-reading the comment, I'm not really sure that the
scenario it's concerned about can happen.  However, that will not fix
the bug in spgtextproc.c; it will only block one avenue for reaching
the bug.

regards, tom lane




Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Jeff Janes wrote:
 One thing is that I want to find a way to drive multixact in fast forward,
 so that the freezing cycle gets a good workout. Currently I can't consume
 enough of them to make them wrap around within the time frame of a test.

 IIRC I lobotomized it up by removing the XLogInsert() call.  That allows
 you to generate large amounts of multixacts quickly.  In my laptop this
 was able to do four or five wraparound cycles in two-three hours or so,
 using the burn multixact utility here:
 http://www.postgresql.org/message-id/20131231035913.gu22...@eldon.alvh.no-ip.org

Another possibility is to use pg_resetxlog to manually advance the
multixact counter to a point near wraparound.  I think you have to
manually create appropriate slru segment files as well when doing that
(someday we should hack pg_resetxlog to do that for you).  Still, it
might beat waiting hours to burn multixacts one by one.

regards, tom lane




Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 02:38 PM, Tom Lane wrote:

Josh Kupershmidt schmi...@gmail.com writes:

On Wed, May 28, 2014 at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:

I've just committed regression test adjustments to prevent that from
being a failure case, but I am confused about why it's happening.
I wouldn't be surprised at not getting a MAC address on a machine that
lacks any internet connection, but that surely can't describe the
buildfarm environment.  Are you curious enough to poke into it and
see what's going on?  It might be useful to strace a backend that's
trying to execute uuid_generate_v1() and see what the kernel interaction
looks like exactly.

Here's the result of attaching strace to an idle backend, then running
SELECT uuid_generate_v1(). AFAIR shearwater is a cheaply-hosted OpenVZ
VPS under the hood.

Interesting.  Looks like you have access only to virtual network
interfaces, and they report all-zero MAC addresses, which the UUID library
is smart enough to ignore.  If smew is also in a virtual environment
then that's probably the explanation.  (There are some other buildfarm
critters that are reporting MAC addresses with the local-admin bit set,
which I suspect also means they've got virtual network interfaces, but
with a different treatment of the what-to-report problem.)


Almost all my critters run in VMs (all but jacana and bowerbird).

cheers

andrew




Re: [HACKERS] Index-only scans for GIST

2014-05-29 Thread Robert Haas
On Sun, May 25, 2014 at 6:12 AM, Anastasia Lubennikova
lubennikov...@gmail.com wrote:
 Hi, hackers!
 There are first results of my work on GSoC project Index-only scans for
 GIST.

Cool.

 1. Version of my project code is in forked repository
 https://github.com/lubennikovaav/postgres/tree/indexonlygist2
 Patch is in attachments
 - This version is only for one-column indexes

That's probably a limitation that needs to be fixed before this can be
committed.

 - fetch() method is realized only for box opclass (because it's trivial)

That might not need to be fixed before this can be committed.

 2. I test Index-only scans with SQL script box_test.sql
 and it works really faster. (results in box_test_out)

 I'll be glad to get your feedback about this feature.

Since this is a GSoC project, it would be nice if one of the people
who is knowledgeable about GIST (Heikki, Alexander, etc.) could weigh
in on this before too much time goes by, so that Anastasia can press
forward with this work.

I don't know enough to offer too many substantive comments, but I
think you should remove all of the //-style comments (most of which
are debugging leftovers) and add some more comments describing what
you're actually doing and, more importantly, why.

This comment doesn't appear to make sense:

+   /*
+* The offset number on tuples on internal pages is unused.
For historical
+* reasons, it is set 0x.
+*/

The reason this doesn't make sense is because the tuple in question is
not on an internal page, or indeed any page at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 04:12 PM, John Lumby wrote:

Thanks for looking at it!


Date: Thu, 29 May 2014 00:19:33 +0300
From: hlinnakan...@vmware.com
To: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
CC: klaussfre...@gmail.com
Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
and patch

On 05/28/2014 11:52 PM, John Lumby wrote:




The patch seems to assume that you can put the aiocb struct in shared
memory, initiate an asynchronous I/O request from one process, and wait
for its completion from another process. I'm pretty surprised if that
works on any platform.


It works on linux. Actually this ability allows the asyncio implementation to
reduce complexity in one respect (yes I know it looks complex enough):
it makes waiting for completion of an in-progress IO simpler than for
the existing synchronous IO case, since librt takes care of the waiting.
Specifically, no need for extra wait-for-io control blocks
such as in bufmgr's WaitIO().


[checks]. No, it doesn't work. See attached test program.

It kinda seems to work sometimes, because of the way it's implemented in 
glibc. The aiocb struct has a field for the result value and errno, and 
when the I/O is finished, the worker thread fills them in. aio_error() 
and aio_return() just return the values of those fields, so calling 
aio_error() or aio_return() do in fact happen to work from a different 
process. aio_suspend(), however, is implemented by sleeping on a 
process-local mutex, which does not work from a different process.
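
Conceptually (this is a model of the behavior just described, not actual
glibc source; the field names below are the ones glibc's <aio.h> declares
inside struct aiocb):

int
conceptual_aio_error(const struct aiocb *cb)
{
	/* a plain field read, so it happens to work on a shared-memory aiocb */
	return cb->__error_code;	/* EINPROGRESS until the worker thread is done */
}

ssize_t
conceptual_aio_return(struct aiocb *cb)
{
	return cb->__return_value;	/* filled in by the initiator's worker thread */
}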


Even if it worked on Linux today, it would be a bad idea to rely on it 
from a portability point of view. No, the only sane way to make this 
work is that the process that initiates an I/O request is responsible 
for completing it. If another process needs to wait for an async I/O to 
complete, we must use some other means to do the waiting. Like the 
io_in_progress_lock that we already have, for the same purpose.


- Heikki

/*
 * Test program to test if POSIX aio functions work across processes
 */

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <aio.h>


char *shmem;

void
processA(void)
{
	int fd;
	struct aiocb *aiocbp = (struct aiocb *) shmem;
	char *buf = shmem + sizeof(struct aiocb);

	fd = open("aio-shmem-test-file", O_CREAT | O_WRONLY | O_SYNC, S_IRWXU);
	if (fd == -1)
	{
		fprintf(stderr, "open() failed\n");
		exit(1);
	}
	printf("processA starting AIO\n");

	strcpy(buf, "foobar");

	memset(aiocbp, 0, sizeof(struct aiocb));
	aiocbp->aio_fildes = fd;
	aiocbp->aio_offset = 0;
	aiocbp->aio_buf = buf;
	aiocbp->aio_nbytes = strlen(buf);
	aiocbp->aio_reqprio = 0;
	aiocbp->aio_sigevent.sigev_notify = SIGEV_NONE;

	if (aio_write(aiocbp) != 0)
	{
		fprintf(stderr, "aio_write() failed\n");
		exit(1);
	}
}

void
processB(void)
{
	struct aiocb *aiocbp = (struct aiocb *) shmem;
	const struct aiocb * const pl[1] = { aiocbp };
	int rv;


	printf("waiting for the write to finish in process B\n");

	if (aio_suspend(pl, 1, NULL) != 0)
	{
		fprintf(stderr, "aio_suspend() failed\n");
		exit(1);
	}
	printf("aio_suspend() returned 0\n");

	rv = aio_error(aiocbp);
	if (rv != 0)
	{
		fprintf(stderr, "aio_error returned %d: %s\n", rv, strerror(rv));
		exit(1);
	}
	rv = aio_return(aiocbp);
	printf("aio_return returned %d\n", rv);
}



int main(int argc, char **argv)
{
	int pidB;

	shmem = mmap(NULL, sizeof(struct aiocb) + 1000,
				 PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
				 -1, 0);
	if (shmem == MAP_FAILED)
	{
		fprintf(stderr, "mmap() failed\n");
		exit(1);
	}

#ifdef SINGLE_PROCESS
	/* this works */
	processA();
	processB();
#else
	/*
	 * Start the I/O request in parent process, then fork and try to wait
	 * for it to finish from the child process. (doesn't work, it will hang
	 * forever)
	 */
	processA();

	pidB = fork();
	if (pidB == -1)
	{
		fprintf(stderr, "fork() failed\n");
		exit(1);
	}
	if (pidB != 0)
	{
		/* parent */
		waitpid(pidB, NULL, 0);
	}
	else
	{
		/* child */
		processB();
	}
#endif
}



[HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
One of the annoying things about the SPGiST bug Teodor pointed out is
that once you're in the infinite loop, query cancel doesn't work to get
you out of it.  There are a couple of CHECK_FOR_INTERRUPTS() calls in
spgdoinsert() that are meant to get you out of exactly this sort of
buggy-opclass situation --- but they don't help.  The reason is that
when we reach that point we're holding one or more buffer locks, which
means the HOLD_INTERRUPTS count is positive, so ProcessInterrupts is
afraid to do anything.

In point of fact, we'd be happy to give up the locks and then throw
the error.  So I was thinking about inventing some macro or out-of-line
function that went about like this:

if (InterruptPending && (QueryCancelPending || ProcDiePending))
{
    LWLockReleaseAll();
    ProcessInterrupts();
    elog(ERROR, "ProcessInterrupts failed to throw an error");
}

where we don't expect to reach the final elog; it's just there to make
sure we don't return control after yanking locks out from under the
caller.

You might object that this is risky since it might release locks other
than the buffer locks spgdoinsert() is specifically concerned with.
However, if spgdoinsert() were to throw an error for a reason other than
query cancel, any such locks would get released anyway as the first step
during error cleanup, so I think this is not a real concern.

There may be other places in the index AMs or other low-level code where
this'd be appropriate; I've not really looked.

Thoughts?

regards, tom lane




Re: [HACKERS] Re: Logical slots not mentioned in CREATE_REPLICATION_SLOT for replication protocol docs

2014-05-29 Thread Robert Haas
On Thu, May 22, 2014 at 12:18 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, May 22, 2014 at 12:44 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Hi all,

 As written in subject, replication protocol documentation lacks
 details about logical slots in CREATE_REPLICATION_SLOT command:
 http://www.postgresql.org/docs/devel/static/protocol-replication.html
 Attached is a patch correcting that.
 An additional thing I noticed: START_REPLICATION does not mention that
 it is possible to specify options for the output plugin. All the fixes
 are included in the patch attached.

Thanks, this looks good.  But shouldn't the bit about output plugin
options mention say something like:

( option_name option_argument [, ...] )

...instead of just:

( option [, ...] )

?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 11:25 PM, Tom Lane wrote:

One of the annoying things about the SPGiST bug Teodor pointed out is
that once you're in the infinite loop, query cancel doesn't work to get
you out of it.  There are a couple of CHECK_FOR_INTERRUPTS() calls in
spgdoinsert() that are meant to get you out of exactly this sort of
buggy-opclass situation --- but they don't help.  The reason is that
when we reach that point we're holding one or more buffer locks, which
means the HOLD_INTERRUPTS count is positive, so ProcessInterrupts is
afraid to do anything.

In point of fact, we'd be happy to give up the locks and then throw
the error.  So I was thinking about inventing some macro or out-of-line
function that went about like this:

if (InterruptPending && (QueryCancelPending || ProcDiePending))
{
    LWLockReleaseAll();
    ProcessInterrupts();
    elog(ERROR, "ProcessInterrupts failed to throw an error");
}

where we don't expect to reach the final elog; it's just there to make
sure we don't return control after yanking locks out from under the
caller.


Also checking that CritSectionCount == 0 seems like a good idea...

- Heikki
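
For concreteness, combining the sketch upthread with that check might look
like this (the wrapper's name is hypothetical):

static inline void
CheckForInterruptsReleaseLocks(void)
{
	if (InterruptPending && (QueryCancelPending || ProcDiePending) &&
		CritSectionCount == 0)
	{
		LWLockReleaseAll();
		ProcessInterrupts();
		elog(ERROR, "ProcessInterrupts failed to throw an error");
	}
}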




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 05/29/2014 04:12 PM, John Lumby wrote:

 Thanks for looking at it!

 Date: Thu, 29 May 2014 00:19:33 +0300
 From: hlinnakan...@vmware.com
 To: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
 CC: klaussfre...@gmail.com
 Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO -
 proposal and patch

 On 05/28/2014 11:52 PM, John Lumby wrote:



 The patch seems to assume that you can put the aiocb struct in shared
 memory, initiate an asynchronous I/O request from one process, and wait
 for its completion from another process. I'm pretty surprised if that
 works on any platform.


 It works on linux.Actually this ability allows the asyncio
 implementation to
 reduce complexity in one respect (yes I know it looks complex enough) :
 it makes waiting for completion of an in-progress IO simpler than for
 the existing synchronous IO case,.   since librt takes care of the
 waiting.
 specifically,   no need for extra wait-for-io control blocks
 such as in bufmgr's  WaitIO()


 [checks]. No, it doesn't work. See attached test program.

 It kinda seems to work sometimes, because of the way it's implemented in
 glibc. The aiocb struct has a field for the result value and errno, and when
 the I/O is finished, the worker thread fills them in. aio_error() and
 aio_return() just return the values of those fields, so calling aio_error()
 or aio_return() do in fact happen to work from a different process.
 aio_suspend(), however, is implemented by sleeping on a process-local mutex,
 which does not work from a different process.

 Even if it worked on Linux today, it would be a bad idea to rely on it from
 a portability point of view. No, the only sane way to make this work is that
 the process that initiates an I/O request is responsible for completing it.
 If another process needs to wait for an async I/O to complete, we must use
 some other means to do the waiting. Like the io_in_progress_lock that we
 already have, for the same purpose.

But calls to it are timed out after 10us, effectively turning the thing
into polling mode.

Which is odd, now that I read carefully: how come 256 waits of 10us
amount to anything? That's just 2.5ms, not enough IIUC for any normal
I/O to complete.




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Peter Geoghegan
On Thu, May 29, 2014 at 1:34 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Also checking that CritSectionCount == 0 seems like a good idea...


What do you do if it isn't?

-- 
Peter Geoghegan




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 11:34 PM, Claudio Freire wrote:

On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

On 05/29/2014 04:12 PM, John Lumby wrote:



On 05/28/2014 11:52 PM, John Lumby wrote:

The patch seems to assume that you can put the aiocb struct in shared
memory, initiate an asynchronous I/O request from one process, and wait
for its completion from another process. I'm pretty surprised if that
works on any platform.


It works on linux.Actually this ability allows the asyncio
implementation to
reduce complexity in one respect (yes I know it looks complex enough) :
it makes waiting for completion of an in-progress IO simpler than for
the existing synchronous IO case,.   since librt takes care of the
waiting.
specifically,   no need for extra wait-for-io control blocks
such as in bufmgr's  WaitIO()


[checks]. No, it doesn't work. See attached test program.

It kinda seems to work sometimes, because of the way it's implemented in
glibc. The aiocb struct has a field for the result value and errno, and when
the I/O is finished, the worker thread fills them in. aio_error() and
aio_return() just return the values of those fields, so calling aio_error()
or aio_return() do in fact happen to work from a different process.
aio_suspend(), however, is implemented by sleeping on a process-local mutex,
which does not work from a different process.

Even if it worked on Linux today, it would be a bad idea to rely on it from
a portability point of view. No, the only sane way to make this work is that
the process that initiates an I/O request is responsible for completing it.
If another process needs to wait for an async I/O to complete, we must use
some other means to do the waiting. Like the io_in_progress_lock that we
already have, for the same purpose.


But calls to it are timeouted by 10us, effectively turning the thing
into polling mode.


We don't want polling... And even if we did, calling aio_suspend() in a 
way that's known to be broken, in a loop, is a pretty crappy way of polling.



Which is odd now that I read carefully, is how come 256 waits of 10us
amounts to anything? That's just 2.5ms, not enough IIUC for any normal
I/O to complete


Wild guess: the buffer you're reading is already in OS cache.

- Heikki




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 05/29/2014 11:25 PM, Tom Lane wrote:
 In point of fact, we'd be happy to give up the locks and then throw
 the error.  So I was thinking about inventing some macro or out-of-line
 function that went about like this:
 
 if (InterruptPending && (QueryCancelPending || ProcDiePending))
 {
     LWLockReleaseAll();
     ProcessInterrupts();
     elog(ERROR, "ProcessInterrupts failed to throw an error");
 }

 Also checking that CritSectionCount == 0 seems like a good idea...

Yeah, and there may be a couple other details like that.  Right now
I'm just thinking about not allowing LWLocks to block the cancel.
I guess something else to consider is whether InterruptHoldoffCount
could be larger than the number of held LWLocks; if so, that would
prevent this from working as desired.

regards, tom lane




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Thu, May 29, 2014 at 1:34 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Also checking that CritSectionCount == 0 seems like a good idea...

 What do you do if it isn't?

Not throw the error.  (If we did, it'd turn into a PANIC, which doesn't
seem like what we want.)

regards, tom lane




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 05/29/2014 11:34 PM, Claudio Freire wrote:

 On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 On 05/29/2014 04:12 PM, John Lumby wrote:


 On 05/28/2014 11:52 PM, John Lumby wrote:

 The patch seems to assume that you can put the aiocb struct in shared
 memory, initiate an asynchronous I/O request from one process, and wait
 for its completion from another process. I'm pretty surprised if that
 works on any platform.


 It works on linux.Actually this ability allows the asyncio
 implementation to
 reduce complexity in one respect (yes I know it looks complex enough) :
 it makes waiting for completion of an in-progress IO simpler than for
 the existing synchronous IO case,.   since librt takes care of the
 waiting.
 specifically,   no need for extra wait-for-io control blocks
 such as in bufmgr's  WaitIO()


 [checks]. No, it doesn't work. See attached test program.

 It kinda seems to work sometimes, because of the way it's implemented in
 glibc. The aiocb struct has a field for the result value and errno, and
 when
 the I/O is finished, the worker thread fills them in. aio_error() and
 aio_return() just return the values of those fields, so calling
 aio_error()
 or aio_return() do in fact happen to work from a different process.
 aio_suspend(), however, is implemented by sleeping on a process-local
 mutex,
 which does not work from a different process.

 Even if it worked on Linux today, it would be a bad idea to rely on it
 from
 a portability point of view. No, the only sane way to make this work is
 that
 the process that initiates an I/O request is responsible for completing
 it.
 If another process needs to wait for an async I/O to complete, we must
 use
 some other means to do the waiting. Like the io_in_progress_lock that we
 already have, for the same purpose.


 But calls to it are timeouted by 10us, effectively turning the thing
 into polling mode.


 We don't want polling... And even if we did, calling aio_suspend() in a way
 that's known to be broken, in a loop, is a pretty crappy way of polling.

Agreed. Just saying that that's why it works.

The PID of the process responsible for the aio should be there
somewhere, and other processes should explicitly synchronize (or
initiate their own I/O and let the OS merge them - which also works).
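
Something like the following shared control block would carry the needed
ownership information (a sketch; the struct and field names are hypothetical):

typedef struct BufferAiocb
{
	struct aiocb	aiocb;		/* POSIX AIO control block in shared memory */
	pid_t			initiator;	/* PID of the backend that issued aio_read() */
} BufferAiocb;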



 Which is odd now that I read carefully, is how come 256 waits of 10us
 amounts to anything? That's just 2.5ms, not enough IIUC for any normal
 I/O to complete


 Wild guess: the buffer you're reading is already in OS cache.

My wild guess as well.




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Peter Geoghegan
On Thu, May 29, 2014 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Not throw the error.  (If we did, it'd turn into a PANIC, which doesn't
 seem like what we want.)

Of course, but it isn't obvious to me that that isn't what we'd want,
if the alternative is definitely that we'd be stuck in an infinite
uninterruptible loop (I guess we couldn't just avoid throwing the
error, thereby risking proceeding without locks -- and so I guess we
can basically do nothing like this in critical sections). Now, that
probably isn't the actual choice that we must face, but it also isn't
obvious to me how you can do better than not throwing the error (and
doing nothing extra) when in a critical section. That was the intent
of my original question.


-- 
Peter Geoghegan




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Thu, May 29, 2014 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Not throw the error.  (If we did, it'd turn into a PANIC, which doesn't
 seem like what we want.)

 Of course, but it isn't obvious to me that that isn't what we'd want,
 if the alternative is definitely that we'd be stuck in an infinite
 uninterruptible loop (I guess we couldn't just avoid throwing the
 error, thereby risking proceeding without locks -- and so I guess we
 can basically do nothing like this in critical sections). Now, that
 probably isn't the actual choice that we must face, but it also isn't
 obvious to me how you can do better than not throwing the error (and
 doing nothing extra) when in a critical section. That was the intent
 of my original question.

I don't feel a need for special behavior for this inside critical
sections; a critical section that lasts long enough for query cancel
response to be problematic is probably already broken by design.

The other case of InterruptHoldoffCount being too high might be more
relevant.  I have not gone around and looked at retail uses of
HOLD_INTERRUPTS to see whether there are any that might be holding counts
for long periods.  But I have checked that for the case in spgdoinsert,
the only thing staying ProcessInterrupts' hand is the holdoff count
associated with the buffer lock on the current index page.

regards, tom lane




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 Didn't fix that, but the attached patch does fix regression tests when
 scanning over index types other than btree (was invoking elog when the
 index am didn't have ampeeknexttuple)

ampeeknexttuple?  That's a bit scary.  It would certainly be unsafe
for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
nbtree/README).

regards, tom lane




Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Josh Kupershmidt
On Thu, May 29, 2014 at 4:06 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 05/29/2014 02:38 PM, Tom Lane wrote:
 Josh Kupershmidt schmi...@gmail.com writes:
 Interesting.  Looks like you have access only to virtual network
 interfaces, and they report all-zero MAC addresses, which the UUID library
 is smart enough to ignore.  If smew is also in a virtual environment
 then that's probably the explanation.  (There are some other buildfarm
 critters that are reporting MAC addresses with the local-admin bit set,
 which I suspect also means they've got virtual network interfaces, but
 with a different treatment of the what-to-report problem.)

 Almost all my critters run in VMs (all but jacana and bowerbird).

They're not running on OpenVZ, are they? `ifconfig` on shearwater says:

venet0Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
  inet addr:127.0.0.2  P-t-P:127.0.0.2  Bcast:0.0.0.0
Mask:255.255.255.255
  UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
  RX packets:1409294 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1488401 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:751149524 (716.3 MiB)  TX bytes:740573200 (706.2 MiB)

venet0:0  Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
  inet addr:198.204.250.34  P-t-P:198.204.250.34
Bcast:0.0.0.0  Mask:255.255.255.255
  UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1

and it seems this all-zeros MAC address is a common
(mis-?)configuration on OpenVZ:

https://github.com/robertdavidgraham/masscan/issues/43
http://stackoverflow.com/questions/5838225/how-do-i-get-a-default-gridgain-node-in-openvz-discover-other-nodes-on-the-same
http://forum.openvz.org/index.php?t=msggoto=8117




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 Didn't fix that, but the attached patch does fix regression tests when
 scanning over index types other than btree (was invoking elog when the
 index am didn't have ampeeknexttuple)

 ampeeknexttuple?  That's a bit scary.  It would certainly be unsafe
 for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
 nbtree/README).


It's not really the tuple, just the tid




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:43 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Thu, May 29, 2014 at 6:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 Didn't fix that, but the attached patch does fix regression tests when
 scanning over index types other than btree (was invoking elog when the
 index am didn't have ampeeknexttuple)

 ampeeknexttuple?  That's a bit scary.  It would certainly be unsafe
 for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
 nbtree/README).


 It's not really the tuple, just the tid

And, furthermore, it's used only to do prefetching, so even if the tid
was invalid when the tuple needs to be accessed, it wouldn't matter,
because the indexam wouldn't use the result of ampeeknexttuple to do
anything at that time.

Though, your comment does illustrate the need to document that on
ampeeknexttuple, for future users.




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby


 Date: Thu, 29 May 2014 18:00:28 -0300
 Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
 and patch
 From: klaussfre...@gmail.com
 To: hlinnakan...@vmware.com
 CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
 
 On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  On 05/29/2014 11:34 PM, Claudio Freire wrote:
 
  On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
  hlinnakan...@vmware.com wrote:
 
  On 05/29/2014 04:12 PM, John Lumby wrote:
 
 
  On 05/28/2014 11:52 PM, John Lumby wrote:
 
  The patch seems to assume that you can put the aiocb struct in shared
  memory, initiate an asynchronous I/O request from one process, and wait
  for its completion from another process. I'm pretty surprised if that
  works on any platform.
 
 
  It works on linux.Actually this ability allows the asyncio
  implementation to
  reduce complexity in one respect (yes I know it looks complex enough) :
  it makes waiting for completion of an in-progress IO simpler than for
  the existing synchronous IO case,.   since librt takes care of the
  waiting.
  specifically,   no need for extra wait-for-io control blocks
  such as in bufmgr's  WaitIO()
 
 
  [checks]. No, it doesn't work. See attached test program.

Thanks for checking, and thanks for coming up with that test program.
However, yes, it really does work -- always (on linux).
Your test program is doing things in the wrong order -
it calls aio_suspend *before* aio_error.
However, the rule is: call aio_suspend *after* aio_error,
and *only* if aio_error returns EINPROGRESS.

See the code changes to the fd.c function FileCompleteaio()
to see how we have done it.  And I am attaching a corrected version
of your test program, which runs just fine.
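
A minimal sketch of that rule (an illustration only, not the patch's actual
code; it assumes an aiocb pointer aiocbp and the 10us timeout used by
FileCompleteaio):

	const struct aiocb * const pl[1] = { aiocbp };
	struct timespec my_timeout = { 0, 10000 };	/* 10us */
	int rv;
	int polls = 0;

	/* ask for the status first; suspend only while it is EINPROGRESS */
	while ((rv = aio_error(aiocbp)) == EINPROGRESS && polls++ < 256)
		(void) aio_suspend(pl, 1, &my_timeout);

	if (rv == 0)
		rv = aio_return(aiocbp);	/* completed: fetch the result */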


 
  It kinda seems to work sometimes, because of the way it's implemented in
  glibc. The aiocb struct has a field for the result value and errno, and
  when
  the I/O is finished, the worker thread fills them in. aio_error() and
  aio_return() just return the values of those fields, so calling
  aio_error()
  or aio_return() do in fact happen to work from a different process.
  aio_suspend(), however, is implemented by sleeping on a process-local
  mutex,
  which does not work from a different process.
 
  Even if it worked on Linux today, it would be a bad idea to rely on it
  from
  a portability point of view. No, the only sane way to make this work is
  that
  the process that initiates an I/O request is responsible for completing
  it.
  If another process needs to wait for an async I/O to complete, we must
  use
  some other means to do the waiting. Like the io_in_progress_lock that we
  already have, for the same purpose.
 
 
  But calls to it are timeouted by 10us, effectively turning the thing
  into polling mode.
 
 
  We don't want polling... And even if we did, calling aio_suspend() in a way
  that's known to be broken, in a loop, is a pretty crappy way of polling.

Well, as mentioned earlier, it is not broken. Whether it is efficient I
am not sure.
I have looked at the mutex in aio_suspend that you mentioned and I am not
quite convinced that, if the caller is not the original aio_read process,
it renders the suspend() into an instant timeout.  I will see if I can
verify that.
Where are you (Claudio) seeing 10us?

 
 
 Didn't fix that, but the attached patch does fix regression tests when
 scanning over index types other than btree (was invoking elog when the
 index am didn't have ampeeknexttuple)
/*
 * Test program to test if POSIX aio functions work across processes
 */

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <aio.h>
#include <errno.h>

char *shmem;

void
processA(void)
{
    int fd;
    struct aiocb *aiocbp = (struct aiocb *) shmem;
    char *buf = shmem + sizeof(struct aiocb);

    fd = open("aio-shmem-test-file", O_CREAT | O_WRONLY | O_SYNC, S_IRWXU);
    if (fd == -1)
    {
        fprintf(stderr, "open() failed\n");
        exit(1);
    }
    printf("processA starting AIO\n");

    strcpy(buf, "foobar");

    memset(aiocbp, 0, sizeof(struct aiocb));
    aiocbp->aio_fildes = fd;
    aiocbp->aio_offset = 0;
    aiocbp->aio_buf = buf;
    aiocbp->aio_nbytes = strlen(buf);
    aiocbp->aio_reqprio = 0;
    aiocbp->aio_sigevent.sigev_notify = SIGEV_NONE;

    if (aio_write(aiocbp) != 0)
    {
        fprintf(stderr, "aio_write() failed\n");
        exit(1);
    }
}

void
processB(void)
{
    struct aiocb *aiocbp = (struct aiocb *) shmem;
    const struct aiocb * const pl[1] = { aiocbp };
    int rv;
    int returnCode;
    struct timespec my_timeout = { 0, 10000 };
    int max_polls;

    printf("waiting for the write to finish in process B\n");


Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 On Thu, May 29, 2014 at 6:43 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 On Thu, May 29, 2014 at 6:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ampeeknexttuple?  That's a bit scary.  It would certainly be unsafe
 for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
 nbtree/README).

 It's not really the tuple, just the tid

 And, furthermore, it's used only to do prefetching, so even if the tid
 was invalid when the tuple needs to be accessed, it wouldn't matter,
 because the indexam wouldn't use the result of ampeeknexttuple to do
 anything at that time.

Nonetheless, getting the next tid out of the index may involve stepping
to the next index page, at which point you've lost your interlock
guaranteeing that the *previous* tid will still mean something by the time
you arrive at its heap page.  I presume that the ampeeknexttuple call is
issued before trying to visit the heap (otherwise you're not actually
getting much I/O overlap), so I think there's a real risk here.

Having said that, it's probably OK as long as this mode is only invoked
for user queries (with MVCC snapshots) and not for system indexscans.

regards, tom lane




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby


 From: t...@sss.pgh.pa.us
 To: klaussfre...@gmail.com
 CC: hlinnakan...@vmware.com; johnlu...@hotmail.com; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
 and patch
 Date: Thu, 29 May 2014 17:56:57 -0400
 
 Claudio Freire klaussfre...@gmail.com writes:
  On Thu, May 29, 2014 at 6:43 PM, Claudio Freire klaussfre...@gmail.com 
  wrote:
  On Thu, May 29, 2014 at 6:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  ampeeknexttuple?  That's a bit scary.  It would certainly be unsafe
  for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
  nbtree/README).
 
  It's not really the tuple, just the tid
 
  And, furthermore, it's used only to do prefetching, so even if the tid
  was invalid when the tuple needs to be accessed, it wouldn't matter,
  because the indexam wouldn't use the result of ampeeknexttuple to do
  anything at that time.
 
 Nonetheless, getting the next tid out of the index may involve stepping
 to the next index page, at which point you've lost your interlock

I think we are ok, as peeknexttuple (yes, bad name, sorry, can change it ...)
never advances to another page:

 *	btpeeknexttuple() -- peek at the next tuple different from any blocknum in pfch_list
 *		without reading a new index page
 *		and without causing any side-effects such as altering values in control blocks
 *		if found, store blocknum in next element of pfch_list
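
A simplified sketch of that behavior against nbtree's scan state (the
pfch_list helper functions are hypothetical):

static bool
btpeeknexttuple_sketch(IndexScanDesc scan)
{
	BTScanOpaque	so = (BTScanOpaque) scan->opaque;
	int				i;

	/* look only at tids already buffered for the current index page */
	for (i = so->currPos.itemIndex + 1; i <= so->currPos.lastItem; i++)
	{
		BlockNumber blkno =
			ItemPointerGetBlockNumber(&so->currPos.items[i].heapTid);

		if (!blocknum_in_pfch_list(blkno))	/* hypothetical helper */
		{
			record_in_pfch_list(blkno);		/* hypothetical helper */
			return true;
		}
	}
	return false;	/* array exhausted; never step to another index page */
}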




 guaranteeing that the *previous* tid will still mean something by the time
 you arrive at its heap page.  I presume that the ampeeknexttuple call is
 issued before trying to visit the heap (otherwise you're not actually
 getting much I/O overlap), so I think there's a real risk here.
 
 Having said that, it's probably OK as long as this mode is only invoked
 for user queries (with MVCC snapshots) and not for system indexscans.
 
   regards, tom lane
  

Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 On Thu, May 29, 2014 at 6:43 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 On Thu, May 29, 2014 at 6:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ampeeknexttuple?  That's a bit scary.  It would certainly be unsafe
 for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
 nbtree/README).

 It's not really the tuple, just the tid

 And, furthermore, it's used only to do prefetching, so even if the tid
 was invalid when the tuple needs to be accessed, it wouldn't matter,
 because the indexam wouldn't use the result of ampeeknexttuple to do
 anything at that time.

 Nonetheless, getting the next tid out of the index may involve stepping
 to the next index page, at which point you've lost your interlock
 guaranteeing that the *previous* tid will still mean something by the time

No, no... that's exactly why a new regproc is needed, because for
prefetching, we need to get the next tid that satisfies some
conditions *without* walking the index.

This, in nbtree, only looks through the tid array to find the suitable
tid, or just returns false if the array is exhausted.

 Having said that, it's probably OK as long as this mode is only invoked
 for user queries (with MVCC snapshots) and not for system indexscans.

I think system index scans will also invoke this. There's no rule
excluding that possibility.




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 7:11 PM, John Lumby johnlu...@hotmail.com wrote:
 Nonetheless, getting the next tid out of the index may involve stepping
 to the next index page, at which point you've lost your interlock

 I think we are ok as peeknexttuple (yes  bad name,  sorry,  can change it
 ...
 never advances to another page  :

  *btpeeknexttuple() -- peek at the next tuple different from any
 blocknum in pfch_list
  *   without reading a new index page
  *   and without causing any side-effects such as
 altering values in control blocks
  *   if found, store blocknum in next element of pfch_list


Yeah, I was getting to that conclusion myself too.

We could call it amprefetchnextheap, since it does just prefetch, and
is good for nothing *but* prefetch.




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Andres Freund
Hi,

On 2014-05-29 17:53:51 -0400, John Lumby wrote:
 to see how we have done it.   And I am attaching corrected version
 of your test program which runs just fine.

It's perfectly fine not to be up to the coding style at this point, but
trying to adhere to it to some degree will make code review later less
painful (see the sketch after this list):
* comments with **
* line length
* tabs vs spaces
* ...
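
For reference, the conventional layout looks roughly like this (per the
usual backend style; tabs for indentation, lines under 80 columns):

/*
 * Multi-line comments use an opening slash-star on its own line, a column
 * of stars, and a closing star-slash on its own line, like this block.
 */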

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby


 Date: Thu, 29 May 2014 18:00:28 -0300
 Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
 and patch
 From: klaussfre...@gmail.com
 To: hlinnakan...@vmware.com
 CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
 

 
  Even if it worked on Linux today, it would be a bad idea to rely on it
  from
  a portability point of view. No, the only sane way to make this work is
  that
  the process that initiates an I/O request is responsible for completing
  it.


I meant to add - it is really a significant benefit that a bkend
can wait on the aio of a different bkend's original prefetching aio_read.
Remember that we check completion only when the bkend decides it really
wants the block in a buffer, i.e. ReadBuffer and friends,
which might be a very long time after it had issued the prefetch request,
or even never (see below). We don't want other bkends which want that
block to have to wait for the originator to get around to reading it.
*Especially* since the originator may *never* read it if it quits its scan early,
leaving prefetched but unread blocks behind.  (Which is also taken
care of in the patch.)


  If another process needs to wait for an async I/O to complete, we must
  use
  some other means to do the waiting. Like the io_in_progress_lock that we
  already have, for the same purpose.


  

Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:53 PM, John Lumby johnlu...@hotmail.com wrote:
 Well,  as mentioned earlier,  it is not broken. Whether it is efficient
 I am not sure.
 I have looked at the mutex in aio_suspend that you mentioned and I am not
 quite convinced that,  if caller is not the original aio_read process,
 it renders the suspend() into an instant timeout.  I will see if I can
 verify that.
 Where are you (Claudio) seeing 10us?


fd.c, in FileCompleteaio, sets the timeout to:

my_timeout.tv_sec = 0; my_timeout.tv_nsec = 10000;

Which is 10k ns, which is 10 us.

It loops 256 times at most, so it's polling 256 times with a 10 us
timeout. Sounds wasteful.

I'd:

1) If it's the same process, wait for the full timeout (no looping).
If you have to loop (EAGAIN or EINTR - which I just noticed you don't
check for), that's ok.

2) If it's not, just fall through, don't wait, issue the I/O. The
kernel will merge the requests.
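
Roughly (a sketch under those assumptions; the ownership field, the fd/buf
variables, and the error handling are hypothetical):

	if (aiocb_owner_pid == MyProcPid)
	{
		/* case 1: the initiator waits for the full I/O, looping only on EINTR */
		while (aio_suspend(pl, 1, NULL) != 0)
		{
			if (errno != EINTR)
				elog(ERROR, "aio_suspend failed: %m");
		}
	}
	else
	{
		/*
		 * case 2: not the initiating process -- don't poll the foreign
		 * aiocb; issue our own synchronous read and let the kernel merge
		 * the duplicate requests.
		 */
		returnCode = pread(fd, buf, amount, offset);
	}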




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 7:26 PM, Claudio Freire klaussfre...@gmail.com wrote:
 1) If it's the same process, wait for the full timeout (no looping).
 If you have to loop (EAGAIN or EINTR - which I just noticed you don't
 check for), that's ok.

Sorry, meant to say just looping on EINTR.

About the style guidelines, no, I just copy the style of surrounding
code usually.




Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 05:41 PM, Josh Kupershmidt wrote:

On Thu, May 29, 2014 at 4:06 PM, Andrew Dunstan and...@dunslane.net wrote:

Almost all my critters run in VMs (all but jacana and bowerbird).

They're not running on OpenVZ, are they? `ifconfig` on shearwater says:


[...]

and it seems this all-zeros MAC address is a common
(mis-?)configuration on OpenVZ:

https://github.com/robertdavidgraham/masscan/issues/43
http://stackoverflow.com/questions/5838225/how-do-i-get-a-default-gridgain-node-in-openvz-discover-other-nodes-on-the-same
http://forum.openvz.org/index.php?t=msggoto=8117


Yeah, that's a bit ugly. Mine are on qemu, one (sitella) is on Xen.

cheers

andrew





Re: [HACKERS] Re: Logical slots not mentioned in CREATE_REPLICATION_SLOT for replication protocol docs

2014-05-29 Thread Michael Paquier
On Fri, May 30, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote:
 Thanks, this looks good.  But shouldn't the bit about output plugin
 options mention say something like:

 ( option_name option_argument [, ...] )

 ...instead of just:

 ( option [, ...] )
 ?
Yes, true. Here is an updated patch.
-- 
Michael
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3a2421b..43861d0 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1428,10 +1428,10 @@ The commands accepted in walsender mode are:
   </varlistentry>
 
   <varlistentry>
-    <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slotname</> <literal>PHYSICAL</literal><indexterm><primary>CREATE_REPLICATION_SLOT</primary></indexterm></term>
+    <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slotname</> { <literal>PHYSICAL</> | <literal>LOGICAL</> <replaceable class="parameter">output_plugin</> } <indexterm><primary>CREATE_REPLICATION_SLOT</primary></indexterm></term>
     <listitem>
      <para>
-      Create a physical replication
+      Create a physical or logical replication
       slot. See <xref linkend="streaming-replication-slots"> for more about
       replication slots.
      </para>
@@ -1445,6 +1445,16 @@ The commands accepted in walsender mode are:
      </para>
    </listitem>
   </varlistentry>
+
+  <varlistentry>
+   <term><replaceable class="parameter">output_plugin</></term>
+   <listitem>
+     <para>
+      The name of the output plugin used for logical decoding
+      (see <xref linkend="logicaldecoding-output-plugin">).
+     </para>
+   </listitem>
+  </varlistentry>
  </variablelist>
 </listitem>
   </varlistentry>
@@ -1778,7 +1788,7 @@ The commands accepted in walsender mode are:
     </listitem>
   </varlistentry>
   <varlistentry>
-    <term><literal>START_REPLICATION</literal> <literal>SLOT</literal> <replaceable class="parameter">slotname</> <literal>LOGICAL</literal> <replaceable class="parameter">XXX/XXX</></term>
+    <term><literal>START_REPLICATION</literal> <literal>SLOT</literal> <replaceable class="parameter">slotname</> <literal>LOGICAL</literal> <replaceable class="parameter">XXX/XXX</> [ ( <replaceable>option_name</replaceable> <replaceable>value</replaceable> [, ... ] ) ]</term>
     <listitem>
      <para>
       Instructs server to start streaming WAL for logical replication, starting
@@ -1811,6 +1821,22 @@ The commands accepted in walsender mode are:
     </para>
    </listitem>
   </varlistentry>
+  <varlistentry>
+   <term><replaceable class="parameter">option_name</></term>
+   <listitem>
+    <para>
+     Custom option name for logical decoding plugin.
+    </para>
+   </listitem>
+  </varlistentry>
+  <varlistentry>
+   <term><replaceable class="parameter">value</></term>
+   <listitem>
+    <para>
+     Value associated with given option for logical decoding plugin.
+    </para>
+   </listitem>
+  </varlistentry>
  </variablelist>
 </listitem>
   </varlistentry>



[HACKERS] json/jsonb and unicode escapes

2014-05-29 Thread Andrew Dunstan


Here is a draft patch for some of the issues to do with unicode escapes 
that Teodor raised the other day.


I think it does the right thing, although I want to add a few more 
regression cases before committing it.


Comments welcome.

cheers

andrew
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7364f3..47ab9be 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -2274,7 +2274,27 @@ escape_json(StringInfo buf, const char *str)
 				appendStringInfoString(buf, "\\\"");
 				break;
 			case '\\':
-				appendStringInfoString(buf, "\\\\");
+				/*
+				 * Unicode escapes are passed through as is. There is no
+				 * requirement that they denote a valid character in the
+				 * server encoding - indeed that is a big part of their
+				 * usefulness.
+				 *
+				 * All we require is that they consist of \uXXXX where
+				 * the Xs are hexadecimal digits. It is the responsibility
+				 * of the caller of, say, to_json() to make sure that the
+				 * unicode escape is valid.
+				 *
+				 * In the case of a jsonb string value being escaped, the
+				 * only unicode escape that should be present is \u0000,
+				 * all the other unicode escapes will have been resolved.
+				 *
+				 */
+				if (p[1] == 'u' && isxdigit(p[2]) && isxdigit(p[3])
+					&& isxdigit(p[4]) && isxdigit(p[5]))
+					appendStringInfoCharMacro(buf, *p);
+				else
+					appendStringInfoString(buf, "\\\\");
 				break;
 			default:
 				if ((unsigned char) *p < ' ')
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index ae7c506..1e46939 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb   
------------
- "\\u0000"
+  jsonb   
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 38a95b4..955dc42 100644
--- a/src/test/regress/expected/jsonb_1.out
+++ b/src/test/regress/expected/jsonb_1.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb   
------------
- "\\u0000"
+  jsonb   
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the



Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Amit Kapila
On Thu, May 29, 2014 at 10:09 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 What features in 9.4 need more beta testing for recovery?

Another feature which has an interaction with recovery is reduced WAL
for the Update operation:
http://www.postgresql.org/message-id/e1wnqjm-0003cz...@gemulon.postgresql.org

Commit: a3115f

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] fix worker_spi to run as non-dynamic background worker

2014-05-29 Thread Robert Haas
On Tue, May 27, 2014 at 12:11 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 The feature itself is already mentioned in the release notes in a very
 general manner, by telling that bgworkers can be dynamically
 registered and started. Users can still refer to the documentation
 to see more precisely what kind of infrastructure is in place in 9.4.

 So... As I'm on it... What about something like the attached patch?

Hearing no comment from Bruce, committed.  Bruce, feel free to modify
or revert if you don't think this is appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Spreading full-page writes

2014-05-29 Thread Robert Haas
On Tue, May 27, 2014 at 8:15 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Since you will be flushing the buffers one redo partition at a time, you
 would want to allow the OS to do merge the writes within a partition as much
 as possible. So my even-odd split would in fact be pretty bad. Some sort of
 striping, e.g. mapping each contiguous 1 MB chunk to the same partition,
 would be better.

I suspect you'd actually want to stripe by segment (1GB partition).
If you striped by 1MB partitions, there might still be writes to the
parts of the file you weren't checkpointing that would be flushed by
the fsync().  That would lead to more physical I/O overall, if those
pages were written again before you did the next half-checkpoint.
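
A sketch of that mapping, assuming two flush groups per checkpoint (names
hypothetical):

#define PARTITIONS_PER_CHECKPOINT	2
#define BLOCKS_PER_SEGMENT	((BlockNumber) (0x40000000 / BLCKSZ))	/* 1GB */

static inline int
redo_partition_for(BlockNumber blkno)
{
	/* stripe by 1GB segment so each fsync() covers only one flush group */
	return (blkno / BLOCKS_PER_SEGMENT) % PARTITIONS_PER_CHECKPOINT;
}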

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw and connection management

2014-05-29 Thread Robert Haas
On Mon, May 26, 2014 at 10:47 PM, Shigeru Hanada
shigeru.han...@gmail.com wrote:
 2014-05-24 0:09 GMT+09:00 Sandro Santilli s...@keybit.net:
 Indeed I tried DISCARD ALL in the hope it would have helped, so I find
 your idea of allowing extensions to register a hook there a good one.

 Still, I'd like the FDW handler itself to possibly be configured
 to disable the pool completely as a server-specific configuration.

 Connection management seems an FDW-specific feature to me.  How about
 adding an FDW option, say pool_connection=true|false, to postgres_fdw,
 which allows per-server configuration?

Right... or you could have an option to close the connection at
end-of-statement, end-of-transaction, or end-of-session.  But quite
apart from that, it seems like there ought to be a way to tell an FDW
to flush its state.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_sleep() doesn't work well with recovery conflict interrupts.

2014-05-29 Thread Amit Kapila
On Wed, May 28, 2014 at 8:53 PM, Andres Freund and...@2ndquadrant.com
wrote:
 Hi,

 Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
 WaitLatch() to wait. That's fine in itself. But
 procsignal_sigusr1_handler, which is used e.g. when resolving recovery
 conflicts, doesn't unconditionally do a SetLatch().
 That means that we'll currently not be able to cancel conflicting
 backends during recovery for 10min. Now, I don't think that'll happen
 too often in practice, but it's still annoying.

How will such a situation occur? Aren't we using pg_usleep during the
RecoveryConflict functions
(e.g. in ResolveRecoveryConflictWithVirtualXIDs)?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
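

The kind of change being alluded to would make the handler set the process
latch unconditionally, roughly like this (a sketch; in 9.4 the process latch
is MyProc->procLatch, and the existing dispatch code is elided):

static void
procsignal_sigusr1_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	/* ... existing CheckProcSignal() dispatch ... */

	/* unconditionally wake any WaitLatch()-based sleep, e.g. pg_sleep() */
	SetLatch(&MyProc->procLatch);

	errno = save_errno;
}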