Re: [HACKERS] BRIN index and aborted transaction

2015-07-19 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Tatsuo Ishii wrote:
 
  When a transaction aborts, it seems a BRIN index leaves summary data
  which is not valid any more. Is this an expected behavior?  I guess
  the answer is yes, because it does not affect correctness of a query
  result, but I would like to make sure.
 
 You're right, that is not rolled back (just like any other index type,
 actually).

Let me clarify this a bit.  Summarization normally takes place during
vacuum (or upon the brin_summarize_new_ranges() function being called on
the table).  If the INSERT adds tuples to a page in a range that has
already been summarized, then the summary tuple for that page range will
be updated to cover the to-be-aborted tuples.  On the other hand, if the
INSERT adds tuples to a page that is not summarized, there is no summary
tuple to update; and the subsequent vacuum will remove those tuples
before summarizing the range, so they will not appear in the summary
tuple.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Andres Freund
Hi,

Every now and then over the last few weeks, I've spent a bit of time
making this more efficient.

I had a bit of a problem reproducing the problems I'd seen in
production on physical hardware (I found EC2 too variable to
benchmark this), but luckily 2ndQuadrant today allowed me access to
their four-socket machine[1] from the AXLE project.  Thanks Simon and
Tomas!

First, some mostly juicy numbers:

My benchmark was a parallel COPY into a single WAL-logged target
table:
CREATE TABLE data(data text);
The source data has been generated with
narrow:
COPY (select g.i::text FROM generate_series(1, 1) g(i)) TO 
'/tmp/copybinary' WITH BINARY;
wide:
COPY (select repeat(random()::text, 10) FROM generate_series(1, 1) g(i)) TO 
'/tmp/copybinarywide' WITH BINARY;

Between every test I ran a TRUNCATE data; CHECKPOINT;

For each number of clients I ran pgbench for 70 seconds. I'd previously
determined using -P 1 that the numbers are fairly stable. Longer runs
would have been nice, but then I'd not have finished in time.

shared_buffers = 48GB, narrow table contents:
client tps after:  tps before:
1  180.255577  210.125143
2  338.231058  391.875088
4  638.814300  405.243901
8  1126.852233 370.922271
16 1242.363623 498.487008
32 1229.648854 484.477042
48 1223.288397 468.127943
64 1198.007422 438.238119
96 1201.501278 370.556354
128 1198.554929 288.213032
196 1189.603398 193.841993
256 1144.082291 191.293781
512 643.323675  200.782105

shared_buffers = 1GB, narrow table contents:
client tps after:  tps before:
1  191.137410  210.787214
2  351.293017  384.086634
4  649.800991  420.703149
8  1103.770749 355.947915
16 1287.192256 489.050768
32 1226.329585 464.936427
48 1187.266489 443.386440
64 1182.698974 402.251258
96 1208.315983 331.290851
128 1183.469635 269.250601
196 1202.847382 202.788617
256 1177.924515 190.876852
512 572.457773  192.413191

shared_buffers = 48GB, wide table contents:
client tps after:  tps before:
1  59.685215   68.445331
2  102.034688  103.210277
4  179.434065  78.982315
8  222.613727  76.195353
16 232.162484  77.520265
32 231.979136  71.654421
48 231.981216  64.730114
64 230.955979  57.444215
96 228.016910  56.324725
128 227.693947  45.701038
196 227.410386  37.138537
256 224.626948  35.265530
512 105.356439  34.397636

shared_buffers = 1GB, wide table contents:
(ran out of patience)

Note that the peak performance with the patch is significantly better,
but there's currently a noticeable regression in single threaded
performance. That undoubtedly needs to be addressed.


So, to get to the actual meat: My goal was to essentially get rid of an
exclusive lock over relation extension altogether. I think I found a
way to do that which addresses the concerns raised in this thread.

The new algorithm basically is:
1) Acquire victim buffer, clean it, and mark it as pinned
2) Get the current size of the relation, save it into blockno
3) Try to insert an entry into the buffer table for blockno
4) If the page is already in the buffer table, increment blockno by 1,
   goto 3)
5) Try to read the page. In most cases it'll not yet exist. But the page
   might concurrently have been written by another backend and removed
   from shared buffers already. If already existing, goto 1)
6) Zero out the page on disk.

I think this does handle the concurrency issues.
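
For illustration, here is a hedged C sketch of that loop. The helper names
(AcquireVictimBuffer, InsertBufferTableEntry, PageExistsOnDisk,
ZeroOutPageOnDisk, ReleaseVictimBuffer) are hypothetical and do not
correspond to the actual bufmgr.c API - only RelationGetNumberOfBlocks() is
real - so treat this purely as a restatement of steps 1-6, not as the patch:

/*
 * Hedged sketch of the proposed lock-free extension loop (steps 1-6).
 * All Acquire.../Insert.../PageExists.../ZeroOut.../Release... helpers
 * are hypothetical placeholders, not real PostgreSQL functions.
 */
static BlockNumber
ExtendRelationSketch(Relation rel)
{
	for (;;)
	{
		/* 1) acquire a clean, pinned victim buffer */
		Buffer		victim = AcquireVictimBuffer();

		/* 2) the current relation size becomes the candidate block number */
		BlockNumber blockno = RelationGetNumberOfBlocks(rel);

		/* 3) + 4) claim the first block number not yet in the buffer table */
		while (!InsertBufferTableEntry(rel, blockno, victim))
			blockno++;

		/*
		 * 5) The block might already exist on disk if another backend
		 * extended the relation and its buffer was evicted again.  In that
		 * case give the victim buffer back (a real implementation would
		 * also drop the buffer-table entry) and start over.
		 */
		if (PageExistsOnDisk(rel, blockno))
		{
			ReleaseVictimBuffer(victim);
			continue;
		}

		/* 6) make the new block visible to everyone as an all-zero page */
		ZeroOutPageOnDisk(rel, blockno);
		return blockno;
	}
}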

This patch very clearly is in the POC stage. But I do think the approach
is generally sound.  I'd like to see some comments before deciding
whether to carry on.


Greetings,

Andres Freund

PS: Yes, I know that precision in the benchmark isn't warranted, but I'm
too lazy to truncate them.

[1]
[10:28:11 PM] Tomas Vondra: 4x Intel Xeon E5-4620 Eight Core 2.2GHz
Processor generation: Sandy Bridge EP
each core handles 2 threads, so 16 threads total
256GB (16x16GB) ECC REG System Validated Memory (1333 MHz)
2x 250GB SATA 2.5” Enterprise Level HDs (RAID 1, ~250GB)
17x 600GB SATA 2.5” Solid State HDs (RAID 0, ~10TB)
LSI MegaRAID 9271-8iCC controller and Cache Vault Kit (1GB cache)
2 x Nvidia Tesla K20 Active GPU Cards (GK110GL)

From fc095897a6f4207d384559a095f80a36cf49648c Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 29 Mar 2015 20:55:32 +0200
Subject: [PATCH] WIP: Saner heap extension.

---
 src/backend/access/heap/hio.c   |  86 
 src/backend/commands/vacuumlazy.c   |  39 ++--
 src/backend/storage/buffer/bufmgr.c | 377 ++--
 src/backend/storage/smgr/md.c   |  62 ++
 

Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Andres Freund
Hi,

Eeek, the attached patch included a trivial last minute screwup
(dereferencing bistate unconditionally...). Fixed version attached.

Andres
From fc095897a6f4207d384559a095f80a36cf49648c Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 29 Mar 2015 20:55:32 +0200
Subject: [PATCH] WIP: Saner heap extension.

---
 src/backend/access/heap/hio.c   |  86 
 src/backend/commands/vacuumlazy.c   |  39 ++--
 src/backend/storage/buffer/bufmgr.c | 377 ++--
 src/backend/storage/smgr/md.c   |  62 ++
 src/backend/storage/smgr/smgr.c |  20 +-
 src/include/storage/buf_internals.h |   1 +
 src/include/storage/bufmgr.h|   1 +
 src/include/storage/smgr.h  |   7 +-
 8 files changed, 417 insertions(+), 176 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 6db73bf..b47f9fe 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -15,6 +15,8 @@
 
 #include "postgres.h"
 
+#include "miscadmin.h"
+
 #include "access/heapam.h"
 #include "access/hio.h"
 #include "access/htup_details.h"
@@ -237,7 +239,6 @@ RelationGetBufferForTuple(Relation relation, Size len,
 saveFreeSpace;
 	BlockNumber targetBlock,
 otherBlock;
-	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -433,63 +434,50 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	/*
 	 * Have to extend the relation.
 	 *
-	 * We have to use a lock to ensure no one else is extending the rel at the
-	 * same time, else we will both try to initialize the same new page.  We
-	 * can skip locking for new or temp relations, however, since no one else
-	 * could be accessing them.
+	 * To avoid, as it used to be the case, holding the extension lock during
+	 * victim buffer search for the new buffer, we extend the relation here
+	 * instead of relying on bufmgr.c. We still have to hold the extension
+	 * lock to prevent a race between two backends initializing the same page.
 	 */
-	needLock = !RELATION_IS_LOCAL(relation);
-
-	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
+	while(true)
+	{
+		buffer = ExtendRelation(relation, MAIN_FORKNUM, bistate->strategy);
 
-	/*
-	 * XXX This does an lseek - rather expensive - but at the moment it is the
-	 * only way to accurately determine how many blocks are in a relation.  Is
-	 * it worth keeping an accurate file length in shared memory someplace,
-	 * rather than relying on the kernel to do it for us?
-	 */
-	buffer = ReadBufferBI(relation, P_NEW, bistate);
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 
-	/*
-	 * We can be certain that locking the otherBuffer first is OK, since it
-	 * must have a lower page number.
-	 */
-	if (otherBuffer != InvalidBuffer)
-		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+		/*
+		 * Now acquire lock on the new page.
+		 */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
-	/*
-	 * Now acquire lock on the new page.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		page = BufferGetPage(buffer);
 
-	/*
-	 * Release the file-extension lock; it's now OK for someone else to extend
-	 * the relation some more.  Note that we cannot release this lock before
-	 * we have buffer lock on the new page, or we risk a race condition
-	 * against vacuumlazy.c --- see comments therein.
-	 */
-	if (needLock)
-		UnlockRelationForExtension(relation, ExclusiveLock);
+		/*
+		 * While unlikely, it's possible that another backend managed to
+		 * initialize the page and use up the free space till we got the
+		 * exclusive lock. That'd require the page to be vacuumed (to be put
+		 * on the free space list) and then be used; possible but fairly
+		 * unlikely in practice. If it happens and there's not enough space,
+		 * just retry.
+		 */
+		if (PageIsNew(page))
+		{
+			PageInit(page, BLCKSZ, 0);
 
-	/*
-	 * We need to initialize the empty new page.  Double-check that it really
-	 * is empty (this should never happen, but if it does we don't want to
-	 * risk wiping out valid data).
-	 */
-	page = BufferGetPage(buffer);
+			Assert(len <= PageGetHeapFreeSpace(page));
+			break;
+		}
+		else if (len <= PageGetHeapFreeSpace(page))
+			break;
 
-	if (!PageIsNew(page))
-		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
-			 RelationGetRelationName(relation));
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
 
-	PageInit(page, BufferGetPageSize(buffer), 0);
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		ReleaseBuffer(buffer);
 
-	if (len > PageGetHeapFreeSpace(page))
-	{
-		/* We should not get here given the test at the top */
-		elog(PANIC, "tuple is too big: size %zu", len);
+		CHECK_FOR_INTERRUPTS();
 	}
 
 	/*
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index a01cfb4..896731c 100644
--- a/src/backend/commands/vacuumlazy.c
+++ 

Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-07-19 Thread Pavel Stehule
2015-07-19 20:54 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 I am sending an updated version. It implements a new long option,
 --strict-names. If this option is used, then any entered name (without
 any wildcard char) must match at least one object. This option has no
 impact on patterns (names with wildcard chars). When this option is not
 used, the behaviour is 100% the same as before (with the same number of
 SQL queries for the -t option). It is based on Oleksandr's documentation
 (with lightly modified philosophy), and cleans up my previous patch. A
 test for wildcard existence like strcspn(cell->val, "?*") cannot be used,
 because it doesn't account for quoting (but the replacement is only a few
 lines).

 There is a possibility to remove the wildcard-char test and require at
 least one match for patterns too. But I am thinking that when somebody
 explicitly uses a wildcard, they accept the possibility of an empty result.


Another variant is to make the --strict-names behaviour the default (and
implement a negative option like --disable-strict-names or similar).




 Regards

 Pavel






 2015-07-09 22:48 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-07-08 5:36 GMT+02:00 Fujii Masao masao.fu...@gmail.com:

 On Sat, May 23, 2015 at 1:41 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
 
 
  2015-05-22 18:34 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:
 
  Oleksandr Shulgin oleksandr.shul...@zalando.de writes:
   I think this is a bit over-engineered (apart from the fact that
   processSQLNamePattern is also used in two dozen of places in
   psql/describe.c and all of them must be touched for this patch to
   compile).
 
   Also, the new --table-if-exists options seems to be doing what the
 old
   --table did, and I'm not really sure I understand what --table does
   now.
 
  I'm pretty sure we had agreed *not* to change the default behavior of
 -t.
 
   I propose instead to add a separate new option --strict-include,
 without
   argument, that only controls the behavior when an include pattern
 didn't
   find any table (or schema).
 
  If we do it as a separate option, then it necessarily changes the
 behavior
  for *each* -t switch in the call.  Can anyone show a common use-case
 where
  that's no good, and you need separate behavior for each of several -t
  switches?  If not, I like the simplicity of this approach.  (Perhaps
 the
  switch name could use some bikeshedding, though.)
 
 
  it is near to one proposal
 
  implement only new long option --required-table

 There is no updated version of the patch. So I marked this patch
 as Waiting on Author.


 tomorrow I'll return to this topic.



 One very simple question is, doesn't -n option have very similar problem?
 Also what about -N, -T and --exclude-table-data? Not sure if we need to
 handle them in the similar way as you proposed.


 Hard to say - I understand your motivation, and it can signal some
 inconsistency in the environment, but it does not have the same negative
 impact as the same problem with the -t and -n options.

 This is maybe a place for a warning message, with a possibility to disable
 that warning. But maybe it is premature optimization?

 Another way is the introduction of a strictness option - default can be
 equivalent to the current behaviour, safe can check only the tables
 required for the dump, and strict can check all.



 Isn't it sufficient to only emit the warning message to stderr if there
 is no table matching the pattern specified in -t?


 I prefer raising an error in this case.

 1. I think that tables missing from a dump signal an important
 inconsistency in the environment, and that is an important bug.
 2. A warning is not easily detected (and processed) in bash scripts.

 Regards

 Pavel



 Regards,

 --
 Fujii Masao






Re: [HACKERS] Implementation of global temporary tables?

2015-07-19 Thread Josh Berkus
Pavel, All:

Just to be clear, the idea of a global temp table is that the table def
is available to all users, but the data is private to each session?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-19 Thread Josh Berkus
On 07/17/2015 04:36 PM, Jim Nasby wrote:
 On 7/16/15 12:40 PM, Robert Haas wrote:
 They may well be 2-3 times as long. Why is that a negative?
 In my opinion, brevity makes things easier to read and understand.  We
 also don't support multi-line GUCs, so if your configuration takes 140
 characters, you're going to have a very long line in your
 postgresql.conf (and in your pg_settings output, etc.)
 
 Brevity goes both ways, but I don't think that's the real problem here;
 it's the lack of multi-line support. The JSON that's been proposed makes
 you work really hard to track what level of nesting you're at, while
 every alternative format I've seen is terse enough to be very clear on a
 single line.

I will point out that the proposed non-JSON syntax does not offer any
ability to name consensus/priority groups.  I believe that being able to
name groups is vital to managing any complex synch rep, but if we add
names it will make the non-JSON syntax less compact.

 
 I'm guessing it'd be really ugly/hard to support at least this GUC being
 multi-line?

Yes.

Mind you, multi-line GUCs would be useful otherwise, but we don't want
to hinge this feature on making that work.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-19 Thread Simon Riggs
On 19 July 2015 at 21:16, Tom Lane t...@sss.pgh.pa.us wrote:

 Josh Berkus j...@agliodbs.com writes:
  On 07/17/2015 04:36 PM, Jim Nasby wrote:
  I'm guessing it'd be really ugly/hard to support at least this GUC being
  multi-line?

  Mind you, multi-line GUCs would be useful otherwise, but we don't want
  to hinge this feature on making that work.

 I'm pretty sure that changing the GUC parser to allow quoted strings to
 continue across lines would be trivial.


Agreed


 The problem with it is not that
 it's hard, it's that omitting a closing quote mark would then result in
 the entire file being syntactically broken, with the error message(s)
 almost certainly pointing somewhere else than where the actual mistake is.


That depends upon how we specify line-continuation. If we do it with
starting and ending quotes, then we would have the problem you suggest. If
we required each new continuation line to start with a \ then it wouldn't
(or similar). Or perhaps it gets its own file even, an idea raised before.

Do we really want such a global reduction in friendliness to make this
 feature easier?


Clearly not, but we must first decide whether that is how we characterise
the decision.

synchronous_standby_name= is already 25 characters, so that leaves 115
characters - are they always single byte chars?

It's not black and white for me that JSON necessarily requires 115 chars
whereas other ways never will do.

What we are discussing is expanding an existing parameter to include more
information. If Josh gets some of the things he's been asking for, then the
format will bloat further. It doesn't take much for me to believe it might
expand further still, so my view from the discussion is that we'll likely
need to expand beyond 115 chars one day whatever format we choose.

I'm personally ambivalent what the exact format is that we choose; I care
much more about the feature than the syntax, always. My contribution so far
was to summarise what I thought was the majority opinion, and to challenge
the thought that JSON had no discernible benefit. If the majority view is
different, I have no problem there.

Clusters of 20 or more standby nodes are reasonably common, so those limits
do seem a little small. Synchronous commit behavior is far from being the
only cluster metadata we need to record.  I'm thinking now that this
illustrates that this is the wrong way altogether and we should just be
storing cluster metadata in database tables, which is what was discussed
and agreed at the BDR meeting at PGCon.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-19 Thread Simon Riggs
On 19 July 2015 at 21:56, Tom Lane t...@sss.pgh.pa.us wrote:


 Since I'm not observing any movement


Apologies if nothing visible was occurring. Petr and I had arranged to meet
and discuss Mon/Tues.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-07-19 Thread Peter Geoghegan
On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan p...@heroku.com wrote:
 If you're using another well known MVCC database system that has RLS,
 I imagine when this happens the attacker similarly waits on the
 conflicting (privileged) xact to finish (in my example in the patch,
 Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
 Mallory would then have her malicious UPDATE statement entirely rolled
 back, and her statement would acquire an entirely new MVCC snapshot,
 to be used by the USING security barrier qual (and everything else)
 from scratch. This other system would then re-run her UPDATE with the
 new MVCC snapshot. This would repeat until Mallory's UPDATE statement
 completes without encountering any concurrent UPDATEs/DELETEs to her
 would-be affected rows.

 In general, with this other database system, an UPDATE must run to
 completion without violating MVCC, even in READ COMMITTED mode. For
 that reason, I think we can take no comfort from the presumption that
 this flexibility in USING security barrier quals (allowing subqueries,
 etc) works securely in this other system. (I actually didn't check
 this out, but I imagine it's true).

Where are we on this?

Discussion during pgCon with Heikki and Andres led me to believe that
the issue is acceptable. The issue can be documented to help ensure
that user expectation is in line with actual user-visible behavior.
Unfortunately, I think that that will be a clunky documentation patch.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory prefetching while sequentially fetching from SortTuple array, tuplestore

2015-07-19 Thread Peter Geoghegan
On Thu, Jul 16, 2015 at 8:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Meh.  I don't like the assumption that non-GCC compilers will be smart
 enough to optimize away the useless-to-them if() tests this adds.
 Please refactor that so that there is exactly 0 new code when the
 intrinsic doesn't exist.

I imagined that there was some value in copying the GCC intrinsic's
behavior, and actually evaluating the addr expression even in the
event of no platform support. On reflection, I suppose that that isn't
actually a particularly useful property for Postgres. There will only
ever be a handful of callers.

Attached revision does not rely on such optimization occurring on
platforms that lack __builtin_prefetch(). This allowed me to decouple
availability from actual use, in the style of posix_fadvise(), so that
one can manually disable memory prefetching within pg_config_manual.h.

Clang is compatible with the __builtin_prefetch() intrinsic, FWIW. I'm
not sure if it's worth trying to make the wrapper portable across a
variety of supported compilers. If we were to attempt it, we would not
be the first. I note that ICC's memref_control has an identical
interface to __builtin_prefetch().
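
To make that concrete, here is a hedged sketch of the kind of wrapper
described above; the macro name pg_mem_prefetch() and the USE_MEM_PREFETCH
switch are made up for illustration and need not match the attached patch,
while HAVE__BUILTIN_PREFETCH refers to the configure test added below:

/*
 * Hedged sketch of a c.h-style wrapper: expands to __builtin_prefetch()
 * only when the intrinsic is available and prefetching is manually
 * enabled; otherwise it is a no-op.  Names are illustrative only.
 */
#if defined(USE_MEM_PREFETCH) && defined(HAVE__BUILTIN_PREFETCH)
/* read-only hint (2nd arg 0) with moderate temporal locality (3rd arg 3) */
#define pg_mem_prefetch(addr)	__builtin_prefetch((addr), 0, 3)
#else
#define pg_mem_prefetch(addr)	((void) 0)
#endif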

-- 
Peter Geoghegan
From 357e3b65c4f1511746a529e41fe180f0c7668d70 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Sun, 12 Jul 2015 13:14:01 -0700
Subject: [PATCH] Prefetch from memtuples array in tuplesort

Testing shows that prefetching the tuple proper of a slightly later
SortTuple in the memtuples array during each of many sequential,
in-logical-order SortTuple fetches speeds up various sorting intense
operations considerably.  For example, B-Tree index builds are
accelerated as leaf pages are created from the memtuples array.
(i.e.  The operation following actually performing the sort, but
before a tuplesort_end() call is made as a B-Tree spool is destroyed.)

Similarly, ordered set aggregates (all cases except the datumsort case
with a pass-by-value type), and regular heap tuplesorts benefit to about
the same degree.  The optimization is only used when sorts fit in
memory, though.

Also, prefetch a few places ahead within the analogous fetching point
in tuplestore.c.  This appears to offer similar benefits in certain
cases.  For example, queries involving large common table expressions
significantly benefit.
---
 config/c-compiler.m4| 17 +
 configure   | 30 ++
 configure.in|  1 +
 src/backend/utils/sort/tuplesort.c  | 20 
 src/backend/utils/sort/tuplestore.c | 13 +
 src/include/c.h | 14 ++
 src/include/pg_config.h.in  |  3 +++
 src/include/pg_config.h.win32   |  3 +++
 src/include/pg_config_manual.h  | 10 ++
 9 files changed, 111 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 050bfa5..5776201 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -287,6 +287,23 @@ fi])# PGAC_C_BUILTIN_UNREACHABLE
 
 
 
+# PGAC_C_BUILTIN_PREFETCH
+# -
+# Check if the C compiler understands __builtin_prefetch(),
+# and define HAVE__BUILTIN_PREFETCH if so.
+AC_DEFUN([PGAC_C_BUILTIN_PREFETCH],
+[AC_CACHE_CHECK(for __builtin_prefetch, pgac_cv__builtin_prefetch,
+[AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
+[int i = 0;__builtin_prefetch(&i, 0, 3);])],
+[pgac_cv__builtin_prefetch=yes],
+[pgac_cv__builtin_prefetch=no])])
+if test x"$pgac_cv__builtin_prefetch" = xyes ; then
+AC_DEFINE(HAVE__BUILTIN_PREFETCH, 1,
+  [Define to 1 if your compiler understands __builtin_prefetch.])
+fi])# PGAC_C_BUILTIN_PREFETCH
+
+
+
 # PGAC_C_VA_ARGS
 # --
 # Check if the C compiler understands C99-style variadic macros,
diff --git a/configure b/configure
index 2176d65..5fdd6bd 100755
--- a/configure
+++ b/configure
@@ -11283,6 +11283,36 @@ if test x$pgac_cv__builtin_unreachable = xyes ; then
 $as_echo "#define HAVE__BUILTIN_UNREACHABLE 1" >>confdefs.h
 
 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_prefetch" >&5
+$as_echo_n "checking for __builtin_prefetch... " >&6; }
+if ${pgac_cv__builtin_prefetch+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+int i = 0;__builtin_prefetch(&i, 0, 3);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv__builtin_prefetch=yes
+else
+  pgac_cv__builtin_prefetch=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_prefetch" >&5
+$as_echo "$pgac_cv__builtin_prefetch" >&6; }
+if test x"$pgac_cv__builtin_prefetch" = xyes ; then
+
+$as_echo "#define HAVE__BUILTIN_PREFETCH 1" >>confdefs.h
+
+fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __VA_ARGS__" >&5
 $as_echo_n "checking for __VA_ARGS__... " >&6; }
 if ${pgac_cv__va_args+:} false; then :
diff 

Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2015-07-19 Thread Peter Geoghegan
On Thu, Jul 2, 2015 at 2:58 PM, Peter Geoghegan p...@heroku.com wrote:
 As with row locking, with insertion, if there is a conflict, any
 outcome (UPDATE or INSERT) is then possible.

Where are we on this? It would be nice to get this out of the way
before a 9.5 beta is put out.

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-07-19 Thread Pavel Stehule
2015-07-19 21:08 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-07-19 20:54 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 I am sending an updated version. It implements a new long option,
 --strict-names. If this option is used, then any entered name (without
 any wildcard char) must match at least one object. This option has no
 impact on patterns (names with wildcard chars). When this option is not
 used, the behaviour is 100% the same as before (with the same number of
 SQL queries for the -t option). It is based on Oleksandr's documentation
 (with lightly modified philosophy), and cleans up my previous patch. A
 test for wildcard existence like strcspn(cell->val, "?*") cannot be used,
 because it doesn't account for quoting (but the replacement is only a few
 lines).

 There is a possibility to remove the wildcard-char test and require at
 least one match for patterns too. But I am thinking that when somebody
 explicitly uses a wildcard, they accept the possibility of an empty result.


 Another variant is to make the --strict-names behaviour the default (and
 implement a negative option like --disable-strict-names or similar).


Note: originally I thought we would have to fix it and change the default
behaviour. But with a special option, we don't need to. This option in the
help output signals to the user that something here is risky.

Pavel





 Regards

 Pavel






 2015-07-09 22:48 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-07-08 5:36 GMT+02:00 Fujii Masao masao.fu...@gmail.com:

 On Sat, May 23, 2015 at 1:41 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
 
 
  2015-05-22 18:34 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:
 
  Oleksandr Shulgin oleksandr.shul...@zalando.de writes:
   I think this is a bit over-engineered (apart from the fact that
   processSQLNamePattern is also used in two dozen of places in
   psql/describe.c and all of them must be touched for this patch to
   compile).
 
   Also, the new --table-if-exists options seems to be doing what the
 old
   --table did, and I'm not really sure I understand what --table
 does
   now.
 
  I'm pretty sure we had agreed *not* to change the default behavior
 of -t.
 
   I propose instead to add a separate new option --strict-include,
 without
   argument, that only controls the behavior when an include pattern
 didn't
   find any table (or schema).
 
  If we do it as a separate option, then it necessarily changes the
 behavior
  for *each* -t switch in the call.  Can anyone show a common use-case
 where
  that's no good, and you need separate behavior for each of several -t
  switches?  If not, I like the simplicity of this approach.  (Perhaps
 the
  switch name could use some bikeshedding, though.)
 
 
  it is near to one proposal
 
  implement only new long option --required-table

 There is no updated version of the patch. So I marked this patch
 as Waiting on Author.


 tomorrow I'll return to this topic.



 One very simple question is, doesn't -n option have very similar
 problem?
 Also what about -N, -T and --exclude-table-data? Not sure if we need to
 handle them in the similar way as you proposed.


 Hard to say - I understand your motivation, and it can signal some
 inconsistency in the environment, but it does not have the same negative
 impact as the same problem with the -t and -n options.

 This is maybe a place for a warning message, with a possibility to disable
 that warning. But maybe it is premature optimization?

 Another way is the introduction of a strictness option - default can be
 equivalent to the current behaviour, safe can check only the tables
 required for the dump, and strict can check all.



 Isn't it sufficient to only emit the warning message to stderr if there
 is no table matching the pattern specified in -t?


 I prefer raising an error in this case.

 1. I think that tables missing from a dump signal an important
 inconsistency in the environment, and that is an important bug.
 2. A warning is not easily detected (and processed) in bash scripts.

 Regards

 Pavel



 Regards,

 --
 Fujii Masao







Re: [HACKERS] Implementation of global temporary tables?

2015-07-19 Thread Pavel Stehule
2015-07-19 21:39 GMT+02:00 Josh Berkus j...@agliodbs.com:

 Pavel, All:

 Just to be clear, the idea of a global temp table is that the table def
 is available to all users, but the data is private to each session?


yes.

Pavel



 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-07-19 Thread Pavel Stehule
Hi

I am sending an updated version. It implements a new long option,
--strict-names. If this option is used, then any entered name (without
any wildcard char) must match at least one object. This option has no
impact on patterns (names with wildcard chars). When this option is not
used, the behaviour is 100% the same as before (with the same number of
SQL queries for the -t option). It is based on Oleksandr's documentation
(with lightly modified philosophy), and cleans up my previous patch. A
test for wildcard existence like strcspn(cell->val, "?*") cannot be used,
because it doesn't account for quoting (but the replacement is only a few
lines).

There is a possibility to remove the wildcard-char test and require at
least one match for patterns too. But I am thinking that when somebody
explicitly uses a wildcard, they accept the possibility of an empty result.

Regards

Pavel






2015-07-09 22:48 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-07-08 5:36 GMT+02:00 Fujii Masao masao.fu...@gmail.com:

 On Sat, May 23, 2015 at 1:41 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
 
 
  2015-05-22 18:34 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:
 
  Oleksandr Shulgin oleksandr.shul...@zalando.de writes:
   I think this is a bit over-engineered (apart from the fact that
   processSQLNamePattern is also used in two dozen of places in
   psql/describe.c and all of them must be touched for this patch to
   compile).
 
   Also, the new --table-if-exists options seems to be doing what the
 old
   --table did, and I'm not really sure I understand what --table does
   now.
 
  I'm pretty sure we had agreed *not* to change the default behavior of
 -t.
 
   I propose instead to add a separate new option --strict-include,
 without
   argument, that only controls the behavior when an include pattern
 didn't
   find any table (or schema).
 
  If we do it as a separate option, then it necessarily changes the
 behavior
  for *each* -t switch in the call.  Can anyone show a common use-case
 where
  that's no good, and you need separate behavior for each of several -t
  switches?  If not, I like the simplicity of this approach.  (Perhaps
 the
  switch name could use some bikeshedding, though.)
 
 
  it is near to one proposal
 
  implement only new long option --required-table

 There is no updated version of the patch. So I marked this patch
 as Waiting on Author.


 tomorrow I'll return to this topic.



 One very simple question is, doesn't -n option have very similar problem?
 Also what about -N, -T and --exclude-table-data? Not sure if we need to
 handle them in the similar way as you proposed.


 Hard to say - I understand your motivation, and it can signal some
 inconsistency in the environment, but it does not have the same negative
 impact as the same problem with the -t and -n options.

 This is maybe a place for a warning message, with a possibility to disable
 that warning. But maybe it is premature optimization?

 Another way is the introduction of a strictness option - default can be
 equivalent to the current behaviour, safe can check only the tables
 required for the dump, and strict can check all.



 Isn't it sufficient to only emit the warning message to stderr if there
 is no table matching the pattern specified in -t?


 I prefer raising an error in this case.

 1. I think that tables missing from a dump signal an important
 inconsistency in the environment, and that is an important bug.
 2. A warning is not easily detected (and processed) in bash scripts.

 Regards

 Pavel



 Regards,

 --
 Fujii Masao



commit 2f54f7ea786744540d9176cecc086cfbf32e7695
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Sun Jul 19 20:30:32 2015 +0200

initial

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7467e86..423757d 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -493,6 +493,22 @@ PostgreSQL documentation
 </varlistentry>
 
 <varlistentry>
+  <term><option>--strict-names</option></term>
+  <listitem>
+   <para>
+ Require that table and/or schema without wildcard chars match
+ at least one entity each. Without any entity in the database
+ to be dumped, an error message is printed and dump is aborted.
+   </para>
+   <para>
+ This option has no effect on the exclude table and schema patterns
+ (and also <option>--exclude-table-data</option>): not matching any entities
+ isn't considered an error.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-t <replaceable class="parameter">table</replaceable></option></term>
   <term><option>--table=<replaceable class="parameter">table</replaceable></option></term>
   <listitem>
diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index d7506e1..598df0b 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -955,9 +955,9 @@ AddAcl(PQExpBuffer aclbuf, const char *keyword, const char *subname)
 
 
 /*
- * processSQLNamePattern
+ * processSQLName
  *
- * Scan a wildcard-pattern string and generate 

Re: [HACKERS] Bug in bttext_abbrev_convert()

2015-07-19 Thread Andres Freund
On 2015-07-19 12:35:46 -0400, Peter Eisentraut wrote:
 I would like to do that, but the current host has no more capacity and
 the hoster is complaining, so I need to look for a new hosting solution
 before I can expand what is being built.

Perhaps this could be moved into a virtual machine hosted by the
postgres project infrastructure?

What kind of resources do you currently have and what would you need?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-19 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 07/17/2015 04:36 PM, Jim Nasby wrote:
 I'm guessing it'd be really ugly/hard to support at least this GUC being
 multi-line?

 Mind you, multi-line GUCs would be useful otherwise, but we don't want
 to hinge this feature on making that work.

I'm pretty sure that changing the GUC parser to allow quoted strings to
continue across lines would be trivial.  The problem with it is not that
it's hard, it's that omitting a closing quote mark would then result in
the entire file being syntactically broken, with the error message(s)
almost certainly pointing somewhere else than where the actual mistake is.
Do we really want such a global reduction in friendliness to make this
feature easier?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-19 Thread Tom Lane
Since I'm not observing any movement on the key question of redesigning
the tablesample method API, and I think that's something that's absolutely
necessary to fix for 9.5, attached is an attempt to respecify the API.

I haven't actually written any code, but I've written a tsmapi.h file
modeled on fdwapi.h, and rewritten tablesample-method.sgml to match;
those two files are attached.  Some notes:

* This is predicated on the assumption that we'll get rid of the
pg_tablesample_method catalog and instead have a single FDW-style handler
function for each sample method.  That fixes the problem with the contrib
modules being broken in DROP/pg_upgrade cases.

* I got rid of the TableSampleDesc struct altogether in favor of giving
the execution functions access to the whole SampleScanState executor
state node.  If we add support for filtering at the join level, filtering
in indexscan nodes, etc, those would become separate sets of API functions
in my view; there is no need to pretend that this set of API functions
works for anything except the SampleScan case.  This way is more parallel
to the FDW precedent, too.  In particular it lets tablesample code get at
the executor's EState, which the old API did not, but which might be
necessary for some scenarios.

* You might have expected me to move the tsmseqscan and tsmpagemode flags
into the TsmRoutine struct, but instead this API puts equivalent flags
into the SampleScanState struct.  The reason for that is that it lets
the settings be determined at runtime after inspecting the TABLESAMPLE
parameters, which I think would be useful.  For example, whether to use
the bulkread strategy should probably depend on what the sampling
percentage is.

* I got rid of the ExamineTuple function altogether.  As I said before,
I think what that is mostly doing is encouraging people to do unsound
things.  But in any case, there is no need for it because NextSampleTuple
*can* look at the HeapScanDesc state if it really wants to: that lets it
identify visible tuples, or even inspect the tuple contents.  In the
attached, I documented the visible-tuples aspect but did not suggest
examining tuple contents.

* As written, this still allows TABLESAMPLE parameters to have null
values, but I'm pretty strongly tempted to get rid of that: remove
the paramisnull[] argument and make the core code reject null values.
I can't see any benefit in allowing null values that would justify the
extra code and risks-of-omission involved in making every tablesample
method check this for itself.

* I specified that omitting NextSampleBlock is allowed and causes the
core code to do a standard seqscan, including syncscan support, which
is a behavior that's impossible with the current API.  If we fix
the bernoulli code to have history-independent sampling behavior,
I see no reason that syncscan shouldn't be enabled for it.
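
To make the proposed shape concrete, here is a hedged sketch of what a
sample method's handler might look like under this API.  The callback
member names (SampleScanCost, BeginSampleScan, NextSampleBlock,
NextSampleTuple, EndSampleScan) are assumed to mirror the typedef names in
the header excerpt below, which is truncated before the function-pointer
members, and the example_* callbacks are placeholders defined elsewhere in
a hypothetical module - illustrative only, not part of the proposal text:

#include "postgres.h"
#include "access/tsmapi.h"		/* the proposed header below */
#include "catalog/pg_type.h"
#include "fmgr.h"
#include "nodes/pg_list.h"

/* callback implementations assumed to live elsewhere in the module */
static void example_samplescancost(PlannerInfo *root, RelOptInfo *baserel,
								   List *paramexprs, BlockNumber *pages,
								   double *tuples);
static void example_beginsamplescan(SampleScanState *node, int eflags,
									uint32 seed, Datum *params,
									bool *paramisnull, int nparams);
static OffsetNumber example_nextsampletuple(SampleScanState *node,
											BlockNumber blockno,
											OffsetNumber maxoffset);
static void example_endsamplescan(SampleScanState *node);

Datum
tsm_example_handler(PG_FUNCTION_ARGS)
{
	TsmRoutine *tsm = makeNode(TsmRoutine);

	/* one float4 argument, e.g. a sampling percentage */
	tsm->parameterTypes = list_make1_oid(FLOAT4OID);
	tsm->repeatable_across_queries = true;

	tsm->SampleScanCost = example_samplescancost;
	tsm->BeginSampleScan = example_beginsamplescan;
	tsm->NextSampleBlock = NULL;	/* omitted: core code does a plain seqscan */
	tsm->NextSampleTuple = example_nextsampletuple;
	tsm->EndSampleScan = example_endsamplescan;

	PG_RETURN_POINTER(tsm);
}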

Barring objections, I'll press forward with turning this into code
over the next few days.

regards, tom lane

/*-------------------------------------------------------------------------
 *
 * tsmapi.h
 *	  API for tablesample methods
 *
 * Copyright (c) 2015, PostgreSQL Global Development Group
 *
 * src/include/access/tsmapi.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef TSMAPI_H
#define TSMAPI_H

#include "nodes/execnodes.h"
#include "nodes/relation.h"


/*
 * Callback function signatures --- see tablesample-method.sgml for more info.
 */

typedef void (*SampleScanCost_function) (PlannerInfo *root,
		 RelOptInfo *baserel,
		 List *paramexprs,
		 BlockNumber *pages,
		 double *tuples);

typedef void (*BeginSampleScan_function) (SampleScanState *node,
		  int eflags,
		  uint32 seed,
		  Datum *params,
		  bool *paramisnull,
		  int nparams);

typedef BlockNumber (*NextSampleBlock_function) (SampleScanState *node);

typedef OffsetNumber (*NextSampleTuple_function) (SampleScanState *node,
  BlockNumber blockno,
  OffsetNumber maxoffset);

typedef void (*ReScanSampleScan_function) (SampleScanState *node);

typedef void (*EndSampleScan_function) (SampleScanState *node);

/*
 * TsmRoutine is the struct returned by a tablesample method's handler
 * function.  It provides pointers to the callback functions needed by the
 * planner and executor, as well as additional information about the method.
 *
 * More function pointers are likely to be added in the future.
 * Therefore it's recommended that the handler initialize the struct with
 * makeNode(TsmRoutine) so that all fields are set to NULL.  This will
 * ensure that no fields are accidentally left undefined.
 */
typedef struct TsmRoutine
{
	NodeTag		type;

	/* List of datatype OIDs for the arguments of the TABLESAMPLE clause */
	List	   *parameterTypes;

	/* Can method produce repeatable samples across, or even within, queries? */
	bool		repeatable_across_queries;
	bool		

Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-19 Thread Noah Misch
On Wed, Jul 15, 2015 at 11:08:53AM +0200, Andres Freund wrote:
 On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote:
  Andres Freund wrote:
   One thing worth mentioning is that arguably the problem is caused by the
   fact that we're here emitting database level information in pg_dump,
   normally only done for dumpall.

Consistency with existing practice would indeed have pg_dump ignore
pg_shseclabel and have pg_dumpall reproduce its entries.

  ... the reason for which is probably the lack of CURRENT_DATABASE as a
  database specifier.  It might make sense to add the rest of
  database-level information to pg_dump output, when we get that.
 
 I'm not sure. I mean, it's not that an odd idea to assign a label to a
 database and then restore data into it, and expect the explicitly
 assigned label to survive.  Not sure if there actually is a good
 behaviour either way here :/

In a greenfield, I would make pg_dump --create reproduce pertinent entries
from datacl, pg_db_role_setting, pg_shseclabel and pg_shdescription.  I would
make non-creating pg_dump reproduce none of those.  Moreover, I would enable
--create by default.  Restoring into a user-provided shell database is
specialized compared to reproducing a database from scratch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] xlc 12.1 miscompiles 32-bit ginCompareItemPointers()

2015-07-19 Thread Noah Misch
In a 32-bit build, xlc 12.1 for AIX miscompiles three inline expansions of
ginCompareItemPointers(), all in ginget.c (line numbers per commit 9aa6634):

739 		Assert(!ItemPointerIsValid(&entry->curItem) ||
740 			   ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);

847 		} while (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);

915 		if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)

For one of the arguments, instead of computing
  hi << 32 | lo << 16 | posid
it computes
  (lo << 16) << 32 | lo << 16 | posid
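
For reference, here is a hedged, self-contained sketch of the packing being
discussed: the 48 significant bits of an ItemPointer (bi_hi, bi_lo, posid)
folded into a uint64 so that a single integer comparison orders TIDs.  This
is paraphrased rather than quoted from gin_private.h:

#include "postgres.h"
#include "storage/itemptr.h"

/* Simplified sketch of the inline comparison, not the exact source text. */
static inline int
ginCompareItemPointersSketch(ItemPointer a, ItemPointer b)
{
	uint64		ia = (uint64) a->ip_blkid.bi_hi << 32 |
					 (uint64) a->ip_blkid.bi_lo << 16 |
					 a->ip_posid;
	uint64		ib = (uint64) b->ip_blkid.bi_hi << 32 |
					 (uint64) b->ip_blkid.bi_lo << 16 |
					 b->ip_posid;

	/* the xlc 32-bit bug effectively turns one "hi << 32" into "(lo << 16) << 32" */
	if (ia == ib)
		return 0;
	return (ia > ib) ? 1 : -1;
}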

PostgreSQL 9.4, which introduced the inline ginCompareItemPointers(), is the
oldest version affected.  The problem remained invisible until my recent
commit 43d89a2; the quiet inline configure test had been failing, leaving
PG_USE_INLINE unset.

I tried some workarounds.  Introducing intermediate variables or moving parts
of the calculation down into other inline functions did not help.  The code
compiles fine when not inlined.  Changing hi << 32 to hi << 33 worked (we
need just 48 of the 64 bits), but it presumably reduces performance on ABIs
where the current bit shifts boil down to a no-op.

I propose to expand the gin_private.h #ifdef PG_USE_INLINE test to exclude
xlc 32-bit configurations.  The last 32-bit AIX kernel exited support on
2012-04-30.  There's little reason to prefer 32-bit PostgreSQL under a 64-bit
kernel, so that configuration can do without the latest GIN optimization.  One
alternative would be to distill a configure-time test detecting the bug, so
fixed xlc versions reacquire the optimization.  Another alternative is to
change the bit layout within the uint64; for example, we could use the
most-significant 48 bits instead of the least-significant 48 bits.  I dislike
the idea of a niche configuration driving that choice.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlc 12.1 miscompiles 32-bit ginCompareItemPointers()

2015-07-19 Thread Andres Freund
On July 19, 2015 9:50:33 PM GMT+02:00, Noah Misch n...@leadboat.com wrote:
In a 32-bit build, xlc 12.1 for AIX miscompiles three inline expansions
of
ginCompareItemPointers(), all in ginget.c (line numbers per commit
9aa6634):

739	Assert(!ItemPointerIsValid(&entry->curItem) ||
740	   ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);

847	} while (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);

915	if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)

For one of the arguments, instead of computing
  hi << 32 | lo << 16 | posid
it computes
  (lo << 16) << 32 | lo << 16 | posid

PostgreSQL 9.4, which introduced the inline ginCompareItemPointers(),
is the
oldest version affected.  The problem remained invisible until my
recent
commit 43d89a2; the quiet inline configure test had been failing,
leaving
PG_USE_INLINE unset.

I tried some workarounds.  Introducing intermediate variables or moving
parts
of the calculation down into other inline functions did not help.  The
code
compiles fine when not inlined.  Changing hi << 32 to hi << 33
worked (we
need just 48 of the 64 bits), but it presumably reduces performance on
ABIs
where the current bit shifts boil down to a no-op.

I propose to expand the gin_private.h #ifdef PG_USE_INLINE test to
exclude
xlc 32-bit configurations.  The last 32-bit AIX kernel exited support
on
2012-04-30. 

I vote to simply error out in that case then. Trying to fix individual compiler 
bugs in a niche OS sounds like a bad idea.

Andres


--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Andres Freund
On 2015-07-19 11:28:25 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  So, to get to the actual meat: My goal was to essentially get rid of an
  exclusive lock over relation extension altogether. I think I found a
  way to do that which addresses the concerns raised in this thread.
 
  The new algorithm basically is:
  1) Acquire victim buffer, clean it, and mark it as pinned
  2) Get the current size of the relation, save it into blockno
  3) Try to insert an entry into the buffer table for blockno
  4) If the page is already in the buffer table, increment blockno by 1,
 goto 3)
  5) Try to read the page. In most cases it'll not yet exist. But the page
 might concurrently have been written by another backend and removed
 from shared buffers already. If already existing, goto 1)
  6) Zero out the page on disk.
 
  I think this does handle the concurrency issues.
 
 The need for (5) kind of destroys my faith in this really being safe: it
 says there are non-obvious race conditions here.

It's not simple, I agree. I'm doubtful that a significantly simpler
approach exists.

 For instance, what about this scenario:
 * Session 1 tries to extend file, allocates buffer for page 42 (so it's
 now between steps 4 and 5).
 * Session 2 tries to extend file, sees buffer for 42 exists, allocates
 buffer for page 43 (so it's also now between 4 and 5).
 * Session 2 tries to read page 43, sees it's not there, and writes out
 page 43 with zeroes (now it's done).
 * Session 1 tries to read page 42, sees it's there and zero-filled
 (not because anybody wrote it, but because holes in files read as 0).
 
 At this point session 1 will go and create page 44, won't it, and you
 just wasted a page.

My local code now recognizes that case and uses the page. We just need
to do a PageIsNew() check.


 Also, the file is likely to end up badly physically fragmented when the
 skipped pages are finally filled in.  One of the good things about the
 relation extension lock is that the kernel sees the file as being extended
 strictly sequentially, which it should handle fairly well as far as
 filesystem layout goes.  This way might end up creating a mess on-disk.

I don't think that'll actually happen with any recent
filesystems. Pretty much all of them do delayed allocation.  But it
definitely is a concern with older filesystems.

I've just measured, and with ext4 the number of extents per segment in a
300GB relation doesn't show a significant difference between
the existing and the new code.

We could try to address this by optionally using posix_fallocate() to do
the actual extension - then there'll not be sparse regions, but actually
allocated disk blocks.
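
A hedged sketch of that idea at the file level, using plain POSIX calls
rather than the actual md.c/smgr code; the helper name and the local BLCKSZ
definition are only for illustration:

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>

#define BLCKSZ 8192				/* stand-in for PostgreSQL's block size */

/*
 * Extend an already-open segment file by nblocks, asking the filesystem to
 * allocate real (non-sparse) blocks instead of leaving holes.
 */
static int
extend_segment(int fd, off_t current_blocks, int nblocks)
{
	int			rc = posix_fallocate(fd, current_blocks * BLCKSZ,
									 (off_t) nblocks * BLCKSZ);

	if (rc != 0)
		fprintf(stderr, "posix_fallocate failed: %d\n", rc);
	return rc;
}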

 Perhaps even more to the point, you've added a read() kernel call that was
 previously not there at all, without having removed either the lseek() or
 the write().  Perhaps that scales better when what you're measuring is
 saturation conditions on a many-core machine, but I have a very hard time
 believing that it's not a significant net loss under less-contended
 conditions.

Yes, this has me worried too.


 I'm inclined to think that a better solution in the long run is to keep
 the relation extension lock but find a way to extend files more than
 one page per lock acquisition.

I doubt that'll help as much. As long as you have to search and write
out buffers under an exclusive lock that'll be painful.  You might be
able to make that an infrequent occurrence by extending in larger
amounts and entering the returned pages into the FSM, but you'll have
rather noticeable latency increases every time that happens. And not just
in the extending relation - all the other relations will wait for the
one doing the extending.  We could move that into some background
process, but at that point things have gotten seriously complex.

The more radical solution would be to have some place in memory that'd
store the current number of blocks. Then all the extension specific
locking we'd need would be around incrementing that.  But how and where
to store that isn't easy.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlc 12.1 miscompiles 32-bit ginCompareItemPointers()

2015-07-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On July 19, 2015 9:50:33 PM GMT+02:00, Noah Misch n...@leadboat.com wrote:
 I propose to expand the gin_private.h #ifdef PG_USE_INLINE test to
 exclude xlc 32-bit configurations.  The last 32-bit AIX kernel exited
 support on 2012-04-30.

 I vote to simply error out in that case then. Trying to fix individual 
 compiler bugs in a niche OS sounds like a bad idea.

I think I'm with Andres --- are there really enough users of this
configuration to justify working around such a bug?

More: if the compiler does have a bug like that, how much confidence can
we have, really, that there are no other miscompiled places and won't be
any in the future?  If we are going to support this configuration, I think
what the patch ought to do is disable PG_USE_INLINE globally when this
compiler is detected.  That would revert the behavior to what it was
before 43d89a2.  We have absolutely no field experience justifying the
assumption that a narrower fix would be safe.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 So, to get to the actual meat: My goal was to essentially get rid of an
 exclusive lock over relation extension altogether. I think I found a
 way to do that which addresses the concerns raised in this thread.

 The new algorithm basically is:
 1) Acquire victim buffer, clean it, and mark it as pinned
 2) Get the current size of the relation, save it into blockno
 3) Try to insert an entry into the buffer table for blockno
 4) If the page is already in the buffer table, increment blockno by 1,
goto 3)
 5) Try to read the page. In most cases it'll not yet exist. But the page
might concurrently have been written by another backend and removed
from shared buffers already. If already existing, goto 1)
 6) Zero out the page on disk.

 I think this does handle the concurrency issues.

The need for (5) kind of destroys my faith in this really being safe: it
says there are non-obvious race conditions here.

For instance, what about this scenario:
* Session 1 tries to extend file, allocates buffer for page 42 (so it's
now between steps 4 and 5).
* Session 2 tries to extend file, sees buffer for 42 exists, allocates
buffer for page 43 (so it's also now between 4 and 5).
* Session 2 tries to read page 43, sees it's not there, and writes out
page 43 with zeroes (now it's done).
* Session 1 tries to read page 42, sees it's there and zero-filled
(not because anybody wrote it, but because holes in files read as 0).

At this point session 1 will go and create page 44, won't it, and you
just wasted a page.  Now we do have mechanisms for reclaiming such pages
but they may not kick in until VACUUM, so you could end up with a whole
lot of table bloat.

Also, the file is likely to end up badly physically fragmented when the
skipped pages are finally filled in.  One of the good things about the
relation extension lock is that the kernel sees the file as being extended
strictly sequentially, which it should handle fairly well as far as
filesystem layout goes.  This way might end up creating a mess on-disk.

Perhaps even more to the point, you've added a read() kernel call that was
previously not there at all, without having removed either the lseek() or
the write().  Perhaps that scales better when what you're measuring is
saturation conditions on a many-core machine, but I have a very hard time
believing that it's not a significant net loss under less-contended
conditions.

I'm inclined to think that a better solution in the long run is to keep
the relation extension lock but find a way to extend files more than
one page per lock acquisition.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Andres Freund
On 2015-07-19 11:56:47 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On 2015-07-19 11:28:25 -0400, Tom Lane wrote:
  At this point session 1 will go and create page 44, won't it, and you
  just wasted a page.
 
  My local code now recognizes that case and uses the page. We just need
  to do a PageIsNew().
 
 Er, what?  How can you tell whether an all-zero page was or was not
 just written by another session?

The check is only done while holding the I/O lock on the relevant page
(we have to hold that anyway), after reading it in ourselves, just before
setting BM_VALID. As we can only get to that point when there wasn't any
other entry for the page in the buffer table, that guarantees that no
other backend is currently expanding into that page. Others might
wait to read it, but they'll wait behind the I/O lock.


The situation the read() protects us against is that two backends try to
extend to the same block, but after one of them succeeded the buffer is
written out and reused for an independent page. So there is no in-memory
state telling the slower backend that the page has already been used.
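
(For illustration, a minimal sketch of that recheck, assuming we are the
backend that just read the block in and still hold its I/O lock; the
use_this_block flag is a stand-in for the surrounding control flow:)

    /* Hedged sketch: decide whether a just-read, all-zero block is reusable. */
    Page        page = BufferGetPage(buf);

    if (PageIsNew(page))
    {
        /*
         * All zeroes: nobody has initialized this block, so it is safe to
         * claim it for our own extension instead of moving on to blockno + 1.
         * The page gets initialized later, as for any newly extended block.
         */
        use_this_block = true;
    }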

Andres




Re: [HACKERS] Bug in bttext_abbrev_convert()

2015-07-19 Thread Peter Eisentraut
On 7/6/15 3:52 PM, Alvaro Herrera wrote:
 Kevin Grittner wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Peter Geoghegan wrote:

 It would be nice to have an HTML report from gcov always
 available on the internet. That would be something useful to
 automate, IMV.

 http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/

 What would it take to get something like that which uses the
 check-world target instead of just the check target?
 
 I suggest CC'ing Peter as a first measure.  I already suggested this (or
 something similar) to him months ago.

I would like to do that, but the current host has no more capacity and
the hoster is complaining, so I need to look for a new hosting solution
before I can expand what is being built.






Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-19 Thread Jeff Janes
On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi
 wrote:


 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the
 lock atomically with acquiring it. Before the lwlock-scalability changes,
 that was straightforward because you held the spinlock anyway, but it's a
 lot harder/expensive now. So I changed the way acquiring a lock with a
 variable works. There is now a separate flag, LW_FLAG_VAR_SET, which
 indicates that the current lock holder has updated the variable. The
 LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(),
 which always clears the LW_FLAG_VAR_SET flag, and you can call
 LWLockUpdateVar() after that if you want to set the variable immediately.
 LWLockWaitForVar() always waits if the flag is not set, i.e. it will not
 return regardless of the variable's value, if the current lock-holder has
 not updated it yet.
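
(For illustration, a minimal usage sketch of the API after this change; lock,
insertingAt, startPos and waitFor are illustrative names, not the actual
xlog.c code:)

    /* Holder side: LWLockAcquire() clears LW_FLAG_VAR_SET, so waiters sleep
     * until the holder publishes the variable with LWLockUpdateVar(). */
    LWLockAcquire(lock, LW_EXCLUSIVE);
    LWLockUpdateVar(lock, &insertingAt, startPos);
    /* ... advance insertingAt with further LWLockUpdateVar() calls ... */
    LWLockRelease(lock);

    /* Waiter side: sleep until the holder has moved the variable past what
     * we last saw, or the lock has been released altogether. */
    uint64      seen = 0;

    while (seen < waitFor)
    {
        if (LWLockWaitForVar(lock, &insertingAt, seen, &seen))
            break;          /* lock is free: nothing left to wait for */
    }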


 I ran this for a while without casserts and it seems to work.  But with
 casserts, I get failures in the autovac process on the GIN index.

 I don't see how this is related to the LWLock issue, but I didn't see it
 without your patch.  Perhaps the system just didn't survive long enough to
 uncover it without the patch (although it shows up pretty quickly).  It
 could just be an overzealous Assert, since the run with casserts off didn't
 show problems.


 bt and bt full are shown below.

 Cheers,

 Jeff

 #0  0x003dcb632625 in raise () from /lib64/libc.so.6
 #1  0x003dcb633e05 in abort () from /lib64/libc.so.6
 #2  0x00930b7a in ExceptionalCondition (
 conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
 (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
 "FailedAssertion",
 fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
 #3  0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
 ginvacuum.c:713


It now looks like this *is* unrelated to the LWLock issue.  The assert that
it is tripping over was added just recently (302ac7f27197855afa8c) and so I
had not been testing under its presence until now.  It looks like it is
finding all-zero pages (index extended but then a crash before initializing
the page?) and it doesn't like them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably before this assert, such pages would just be permanently
orphaned.

Cheers,

Jeff


Re: [HACKERS] creating extension including dependencies

2015-07-19 Thread Michael Paquier
On Sat, Jul 18, 2015 at 8:00 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 On 2015-07-15 06:07, Michael Paquier wrote:

 On Fri, Jul 10, 2015 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Andres Freund and...@anarazel.de writes:

 On July 10, 2015 4:16:59 PM GMT+02:00, Tom Lane t...@sss.pgh.pa.us
 wrote:

 Would that propagate down through multiple levels of CASCADE?
 (Although
 I'm not sure it would be sensible for a non-relocatable extension to
 depend on a relocatable one, so maybe the need doesn't arise in
 practice.)


  I'd say so. I was thinking of adding a flag that allows passing a
  schema to a non-relocatable extension. That'd then be passed down. I
  agree that it's unlikely to be used often.


 Yeah, I was visualizing it slightly differently, as a separate
 internal-only option schema_if_needed, but it works out to the
 same thing either way.


 I added a DEFAULT SCHEMA option to CREATE EXTENSION which behaves like
 SCHEMA but only for extensions that don't specify their schema, and is
 mutually exclusive with plain SCHEMA. The DEFAULT SCHEMA propagates when
 CASCADE is used while the SCHEMA option does not propagate. I'd like to hear
 opinions about this behavior. It would be possible to propagate SCHEMA as
 DEFAULT SCHEMA but that might have surprising results (installing
 dependencies in the same schema as the current extension).

Hm...

First, the current patch has a surprising behavior because it fails to
create an extension in cascade when creation is attempted on a custom
schema:
=# create schema po;
CREATE SCHEMA
=# create extension hstore_plperl with schema po cascade;
NOTICE:  0: installing required extension hstore
LOCATION:  CreateExtension, extension.c:1483
NOTICE:  0: installing required extension plperl
LOCATION:  CreateExtension, extension.c:1483
ERROR:  42704: type hstore does not exist
LOCATION:  typenameType, parse_type.c:258
Time: 30.071 ms
To make life easier for users, I think that parent extensions
should be created in the same schema as their child by default. In this
case hstore should be created in po, not public.

And actually, by looking at the code I can guess that when DEFAULT
SCHEMA is used and a non-relocatable extension with a schema
defined is created in cascade, this will error out as well. That's not
really user-friendly.

Hence, as a schema can only be specified in a control file for a
non-relocatable extension, I think that the schema name propagated to
the root extensions should be the one specified in the control file of
the child, if it defines one, instead of the one specified by the user
(imagine for example an extension created in cascade that requires
adminpack, an extension that can only be deployed in pg_catalog), and
the root node should by default use its own schema if it has one in its
control file.

For example in this case:
foo1 (WITH SCHEMA hoge) --- foo2 (schema = pg_catalog) --- foo2_1
                        |                              |-- foo2_2
                        --- foo3 (no schema)
With this command:
CREATE EXTENSION foo1 WITH SCHEMA hoge;
foo3 is created in schema hoge. foo2, as well as its required dependencies
foo2_1 and foo2_2, are created in pg_catalog.

Now what DEFAULT SCHEMA is trying to achieve is: "Hey, I want to define
foo2_1 and foo2_2 in schema hoge." This may be worth achieving (though
IMO foo1 should have direct dependencies on foo2_1 and
foo2_2), but I think that we should decide what default
behavior we want, and implement the additional behavior in a second
patch, separate from the patch of this thread. Personally I am more a
fan of propagating to parent extensions the schema of the child and
not of its grandchild by default, but I am not alone here :)

 The list of contrib modules excluded from the build in the case of MSVC
 needs to include test_extensions ($contrib_excludes in
 src/tools/msvc/Mkvcbuild.pm) or the build on Windows using MSVC will
 fail. commit_ts does that for example.


 Done, hopefully correctly, I don't have access to MSVC.

That's done correctly.

+   /*
+    * Propagate the CASCADE option and add current extenssion
+    * to the list for checking the cyclic dependencies.
+    */
s/extenssion/extension/

+   /* Check for cyclic dependency between extension. */
s/extension/extensions/

psql tab completion should be extended with CASCADE. See the part
with WITH SCHEMA in tab-complete.c.

Regards,
-- 
Michael




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-19 Thread Jeff Janes
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 06/30/2015 11:24 PM, Andres Freund wrote:

 On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

 Hm. Right. A recheck of the value after the queuing should be sufficient
 to fix? That's how we deal with the exact same scenarios for the normal
 lock state, so that shouldn't be very hard to add.


 Yeah. It's probably more efficient to not release the spinlock between
 checking the value and LWLockQueueSelf() though.


 Right now LWLockQueueSelf() takes spinlocks etc. itself, and is used that
 way in a bunch of call sites... So that'd be harder.  Additionally I'm
 planning to get rid of the spinlocks around queuing (they show up as
 bottlenecks in contended workloads), so it seems more future-proof not
 to assume that either way.  On top of that I think we should, when
 available (or using the same type of fallback as for 32bit?), use 64-bit
 atomics for the var anyway.

  I'll take a stab at fixing this tomorrow.


 Thanks! Do you mean both or just the second issue?


 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the lock
 atomically with acquiring it. Before the lwlock-scalability changes, that
 was straightforward because you held the spinlock anyway, but it's a lot
 harder/expensive now. So I changed the way acquiring a lock with a variable
 works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
 the current lock holder has updated the variable. The LWLockAcquireWithVar
 function is gone - you now just use LWLockAcquire(), which always clears
 the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
 you want to set the variable immediately. LWLockWaitForVar() always waits
 if the flag is not set, i.e. it will not return regardless of the
 variable's value, if the current lock-holder has not updated it yet.

 This passes make check, but I haven't done any testing beyond that. Does
 this look sane to you?



After applying this patch to commit fdf28853ae6a397497b79f, it has survived
testing long enough to convince me that this fixes the problem.

Cheers,

Jeff


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-19 Thread Simon Riggs
On 16 July 2015 at 06:51, Haribabu Kommi kommi.harib...@gmail.com wrote:


 I marked the patch as ready for committer.


The most recent results seem to indicate that the prefetch isn't worth
pursuing, but the vm test is. Can someone repeat the perf tests on
something larger so we can see when/if there is a benefit?

Jeff, can you add detailed comments to explain the theory of operation of
these patches? The patches add the code, but don't say why. We will forget,
so let's put the detail in there now, please. Thanks.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Parallel Seq Scan

2015-07-19 Thread Amit Kapila
On Fri, Jul 17, 2015 at 1:22 PM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 On Thu, Jul 16, 2015 at 1:10 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
  Thanks, I will fix this in next version of patch.
 

 I am posting in this thread as I am not sure whether it needs a
 separate thread or not.

 I went through the code and found that the newly added Funnel node is
 tightly coupled with partial seq scan; in order to add many more
 parallel plans along with parallel seq scan, we need to remove the
 integration of this node with partial seq scan.


This assumption is wrong: the Funnel node can execute any node beneath
it (refer to ExecFunnel->funnel_getnext->ExecProcNode; similarly you
can see exec_parallel_stmt).  Yes, currently the nodes supported under
Funnel nodes are limited, like partial seq scan and result (due to reasons
mentioned upthread, e.g. readfuncs.c doesn't have support to read Plan
nodes, which is required for a worker backend to read the PlannedStmt;
of course we can add them, but as we are supporting parallelism for
limited nodes, I have not enhanced readfuncs.c), but in general
the basic infrastructure is designed in such a way that it can support
other nodes beneath it.

 To achieve the same, I have the following ideas.


 Execution:
 The funnel execution varies based on the below plan node.
 1) partial scan - Funnel does the local scan also and returns the tuples
 2) partial agg - Funnel does the merging of aggregate results and
 returns the final result.


Basically Funnel will execute any node beneath it; the Funnel node itself
is not responsible for doing the local scan or any form of consolidation of
results. As of now, it has these 3 basic properties:
– Has one child, runs multiple copies in parallel.
– Combines the results into a single tuple stream.
– Can run the child itself if no workers are available.
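
(For illustration, a minimal sketch of those properties; the FunnelState
fields and the get_tuple_from_worker_queue() helper are hypothetical, and
this is not the actual ExecFunnel/funnel_getnext code from the patch:)

    static TupleTableSlot *
    funnel_getnext_sketch(FunnelState *node)
    {
        if (node->nworkers_launched > 0)
        {
            /* combine the workers' output into a single tuple stream */
            TupleTableSlot *slot = get_tuple_from_worker_queue(node);

            if (!TupIsNull(slot))
                return slot;
        }

        /* no workers available (or they are exhausted): run the one child
         * plan in this backend via the normal executor entry point */
        return ExecProcNode(outerPlanState(node));
    }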

 Any other better ideas to achieve the same?


Refer to slides 16-19 of the Parallel Sequential Scan presentation at PGCon:
https://www.pgcon.org/2015/schedule/events/785.en.html

I don't have a very clear idea of the best way to transform the nodes
in the optimizer, but I think we can figure that out later unless a
majority of people see that as a blocking factor.

Thanks for looking into the patch!

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Implementation of global temporary tables?

2015-07-19 Thread Gavin Flower

On 20/07/15 15:00, Pavel Stehule wrote:



2015-07-19 21:39 GMT+02:00 Josh Berkus j...@agliodbs.com:


Pavel, All:

Just to be clear, the idea of a global temp table is that the
table def
is available to all users, but the data is private to each session?


yes.

Pavel


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Just wondering...

Would it be difficult to add the ability for one user to share the 
contents with a list of named other users (roles)?


-Gavin




Re: [HACKERS] Implementation of global temporary tables?

2015-07-19 Thread Pavel Stehule
2015-07-20 5:33 GMT+02:00 Gavin Flower gavinflo...@archidevsys.co.nz:

 On 20/07/15 15:00, Pavel Stehule wrote:



 2015-07-19 21:39 GMT+02:00 Josh Berkus j...@agliodbs.com:

 Pavel, All:

 Just to be clear, the idea of a global temp table is that the
 table def
 is available to all users, but the data is private to each session?


 yes.

 Pavel


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com


  Just wondering...

 Would it be difficult to add the ability for one user to share the
 contents with a list of named other users (roles)?


Probably it is possible, but not for temporary data - short-lived data are in
process memory, so they are not accessible from other sessions.

Sharing tables like this needs:

1. some infrastructure to hold data about sharing - who can share what with whom
2. someone to clean the data - temporary data are cleaned at the end of the
transaction or the end of the session
3. data saved in shared memory instead of process memory

So it is possible, but it is a partially different feature.


 -Gavin



Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2015-07-19 11:28:25 -0400, Tom Lane wrote:
 At this point session 1 will go and create page 44, won't it, and you
 just wasted a page.

 My local code now recognizes that case and uses the page. We just need
 to do a PageIsNew().

Er, what?  How can you tell whether an all-zero page was or was not
just written by another session?

regards, tom lane

