Re: [HACKERS] [WIP] Better partial index-only scans
Here's a SQL script that (1) demonstrates the new index only scan functionality, and (2) at least on my machine, has a consistently higher planning time for the version with my change than without it. On Sun, Mar 16, 2014 at 5:08 AM, Joshua Yanovski pythones...@gmail.com wrote: Proof of concept initial patch for enabling index only scans for partial indices even when an attribute is not in the target list, as long as it is only used in restriction clauses that can be proved by the index predicate. This also works for index quals, though they still can't be used in the target list. However, this patch may be inefficient since it duplicates effort that is currently delayed until after the best plan is chosen. The patch works by basically repeating the logic from create_indexscan_plan in createplan.c that determines which clauses can't be discarded, instead of the current approach, which just assumes that any attribute referenced anywhere in a restriction clause has to be a column in the relevant index. It should build against master and passes make check for me. It also includes a minor fix in the same code in createplan.c to make sure we're explicitly comparing an empty list to NIL, but I can take that out if that's not considered in scope. If this were the final patch I'd probably coalesce the code used in both places into a single function, but since I'm not certain that the implementation in check_index_only won't change substantially I held off on that. Since the original comment suggested that this was not done due to planner performance concerns, I assume the performance of this approach is unacceptable (though I did a few benchmarks and wasn't able to detect a consistent difference--what would be a good test for this?). As such, this is intended as more of a first pass that I can build on, but I wanted to get feedback at this stage on where we can improve (particularly if there were already ideas on how this might be done, as the comment hints).
Index only scans cost less than regular index scans so I don't think we can get away with waiting until we've chosen the best plan before we do the work described above. That said, as I see it performance could improve in any combination of five ways:

* Improve the performance of determining which clauses can't be discarded (e.g. precompute some information about equivalence classes for index predicates, mess around with the order in which we check the clauses to make it fail faster, switch to real union-find data structures for equivalence classes).
* Find a cleverer way of figuring out whether a partial index can be used than just checking which clauses can't be discarded.
* Use a simpler heuristic (that doesn't match what we use to determine which clauses can be discarded, but still matches more than we do now).
* Take advantage of work we do here to speed things up elsewhere (e.g. if this does get chosen as the best plan we don't need to recompute the same information in create_indexscan_plan).
* Delay determining whether to use an index scan or index only scan until after cost analysis somehow. I'm not sure exactly what this would entail.

Since this is my first real work with the codebase, I'd really appreciate it if people could help me figure out the best approach here (and, more importantly, if one is necessary based on benchmarks). And while this should go without saying, if this patch doesn't actually work then please let me know, since all the above is based on the assumption that what's there is enough :) Thanks, Joshua Yanovski -- Josh test_indexscan.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
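The clause-discarding test described above can be illustrated with a toy model. This is not PostgreSQL's predicate_implied_by machinery, just a sketch of the idea under simplified assumptions: clauses are (column, operator, constant) triples, and only a few trivial implications are handled.

```python
# Toy model of proving restriction clauses from a partial-index
# predicate.  A clause that the predicate implies can be discarded,
# so its column need not be stored in the index for an index-only
# scan to remain possible.  (Hypothetical helper names; not the
# PostgreSQL implementation.)

def implies(pred, clause):
    """Return True if the predicate triple trivially implies the
    clause triple (same column, compatible operator and constant)."""
    col_p, op_p, val_p = pred
    col_c, op_c, val_c = clause
    if col_p != col_c:
        return False
    if (op_p, op_c) == ('=', '='):
        return val_p == val_c
    if op_p == '=' and op_c == '>':
        return val_p > val_c
    if op_p == '=' and op_c == '<':
        return val_p < val_c
    if (op_p, op_c) == ('>', '>'):
        return val_p >= val_c
    if (op_p, op_c) == ('<', '<'):
        return val_p <= val_c
    return False

def discardable_clauses(index_predicate, restriction_clauses):
    """Clauses provable from the index predicate need not be
    rechecked at scan time."""
    return [c for c in restriction_clauses
            if any(implies(p, c) for p in index_predicate)]
```

For example, with an index predicate `b = 1`, the restriction clause `b = 1` is discardable while `a > 0` is not, so only `a` needs to be available from the index itself.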
Re: [HACKERS] Fix typo in nbtree.h introduced by efada2b
On Mon, Mar 17, 2014 at 6:06 AM, Michael Paquier michael.paqu...@gmail.com wrote: Hi, I found a small typo in nbtree.h, introduced by commit efada2b. Patch is attached. Applied, thanks. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] inherit support for foreign tables
Hi Fujita-san, Thank you for working on this patch! No problem, but my point seems always out of the main target a bit:( | =# alter table passwd add column added int, add column added2 int; | NOTICE: This command affects foreign relation cf1 | NOTICE: This command affects foreign relation cf1 | ALTER TABLE | =# select * from passwd; | ERROR: missing data for column added | CONTEXT: COPY cf1, line 1: root:x:0:0:root:/root:/bin/bash | =# This seems far better than silently performing the command, except for the duplicated message:( A new bitmap might be required to avoid the duplication.. As I said before, I think it would be better to show this kind of information on each of the affected tables whether or not the affected one is foreign. I also think it would be better to show it only when the user has specified an option to do so, similar to a VERBOSE option of other commands. ISTM this should be implemented as a separate patch. Hmm. I *wish* this kind of indication to be with this patch even only for foreign tables which would have inconsistency potentially. Expanding to other objects and/or knobs are no problem to be added later. Hmm. I tried a minimal implementation to do that. This omits cost recalculation but seems to work as expected. This seems enough if cost recalc is trivial here. I think we should redo the cost/size estimate, because for example, greater parameterization leads to a smaller rowcount estimate, if I understand correctly. In addition, I think this reparameterization should be done by the FDW itself, because the FDW has more knowledge about it than the PG core. So, I think we should introduce a new FDW routine for that, say ReparameterizeForeignPath(), as proposed in [1]. Attached is an updated version of the patch. Due to the above reason, I removed from the patch the message displaying function you added. Sorry for the delay. [1] http://www.postgresql.org/message-id/530c7464.6020...@lab.ntt.co.jp I had a rough look at the foreign reparameterization stuff.
It seems ok generally. By the way, can I have a simple script to build an environment to run this on? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] gaussian distribution pgbench
Hi Heikki-san, (2014/03/17 14:39), KONDO Mitsumasa wrote: (2014/03/15 15:53), Fabien COELHO wrote: Hello Heikki, A couple of comments: * There should be an explicit \setrandom ... uniform option too, even though you get that implicitly if you don't specify the distribution Fixed. We can use \setrandom val min max uniform without error messages. * What exactly does the threshold mean? The docs informally explain that the larger the threshold, the more frequent values close to the middle of the interval are drawn, but that's pretty vague. There are explanations and computations as comments in the code. If it is about the documentation, I'm not sure that a very precise mathematical definition will help a lot of people, and might rather hinder understanding, so the doc focuses on an intuitive explanation instead. Added more detailed information to the document. Is it OK? Please confirm it. * Does min and max really make sense for gaussian and exponential distributions? For gaussian, I would expect mean and standard deviation as the parameters, not min/max/threshold. Yes... and no:-) The aim is to draw an integer primary key from a table, so it must be in a specified range. This is approximated by drawing a double value with the expected distribution (gaussian or exponential) and projecting it carefully onto integers. If it is out of range, there is a loop and another value is drawn. The minimal threshold constraint (2.0) ensures that the probability of looping is low. It makes sense. Please see the picture attached the other day. * How about setting the variable as a float instead of integer? Would seem more natural to me. At least as an option. Which variable? The values set by setrandom are mostly used for primary keys. We really want integers in a range. Oh, I see. He was talking about the documentation. The document was mistaken: the threshold parameter must be a double, and I fixed the document. By the way, you seem to want to remove --gaussian=NUM and --exponential=NUM command options.
Can you tell me the objective reason? I think pgbench is the benchmark test on PostgreSQL and default benchmark is TPC-B-like benchmark. It is written in documents, and default benchmark wasn't changed by my patch. So we need not remove command options, and they are one of the variety of benchmark options. Maybe you have something misunderstanding about my patch... Regards, -- Mitsumasa KONDO NTT Open Source Software Center *** a/contrib/pgbench/pgbench.c --- b/contrib/pgbench/pgbench.c *** *** 98,103 static int pthread_join(pthread_t th, void **thread_return); --- 98,106 #define LOG_STEP_SECONDS 5 /* seconds between log messages */ #define DEFAULT_NXACTS 10 /* default nxacts */ + #define MIN_GAUSSIAN_THRESHOLD 2.0 /* minimum threshold for gauss */ + #define MIN_EXPONENTIAL_THRESHOLD 2.0 /* minimum threshold for exp */ + int nxacts = 0; /* number of transactions per client */ int duration = 0; /* duration in seconds */ *** *** 169,174 bool is_connect; /* establish connection for each transaction */ --- 172,185 bool is_latencies; /* report per-command latencies */ int main_pid; /* main process id used in log filename */ + /* gaussian distribution tests: */ + double stdev_threshold; /* standard deviation threshold */ + booluse_gaussian = false; + + /* exponential distribution tests: */ + double exp_threshold; /* threshold for exponential */ + bool use_exponential = false; + char *pghost = ; char *pgport = ; char *login = NULL; *** *** 330,335 static char *select_only = { --- 341,428 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n }; + /* --exponential case */ + static char *exponential_tpc_b = { + \\set nbranches CppAsString2(nbranches) * :scale\n + \\set ntellers CppAsString2(ntellers) * :scale\n + \\set naccounts CppAsString2(naccounts) * :scale\n + \\setrandom aid 1 :naccounts exponential :exp_threshold\n + \\setrandom bid 1 :nbranches\n + \\setrandom tid 1 :ntellers\n + \\setrandom delta -5000 5000\n + BEGIN;\n + UPDATE pgbench_accounts SET abalance = 
abalance + :delta WHERE aid = :aid;\n + SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n + UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n + UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n + INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n + END;\n + }; + + /* --exponential with -N case */ + static char *exponential_simple_update = { + \\set nbranches CppAsString2(nbranches) * :scale\n + \\set ntellers CppAsString2(ntellers) * :scale\n + \\set naccounts CppAsString2(naccounts) * :scale\n + \\setrandom aid 1 :naccounts exponential :exp_threshold\n + \\setrandom bid 1 :nbranches\n + \\setrandom tid 1 :ntellers\n + \\setrandom delta -5000 5000\n + BEGIN;\n + UPDATE
Re: [HACKERS] gaussian distribution pgbench
On 03/15/2014 08:53 AM, Fabien COELHO wrote: * Does min and max really make sense for gaussian and exponential distributions? For gaussian, I would expect mean and standard deviation as the parameters, not min/max/threshold. Yes... and no:-) The aim is to draw an integer primary key from a table, so it must be in a specified range. Well, I don't agree with that aim. It's useful for choosing a primary key, as in the pgbench TPC-B workload, but a gaussian distributed random number could be used for many other things too. For example: \setrandom foo ... gaussian select * from cheese where weight < :foo And :foo should be a float, not an integer. That's what I was trying to say earlier, when I said that the variable should be a float. If you need an integer, just cast or round it in the query. I realize that the current \setrandom sets the variable to an integer, so gaussian/exponential would be different. But so what? An option to generate uniformly distributed floats would be handy too, though. This is approximated by drawing a double value with the expected distribution (gaussian or exponential) and projecting it carefully onto integers. If it is out of range, there is a loop and another value is drawn. The minimal threshold constraint (2.0) ensures that the probability of looping is low. Well, that's one way to constrain it to the given range, but there are many other ways to do it. Like, clamp it to the min/max if it's out of range. I don't think we need to choose any particular method, you can handle that in the test script. - Heikki
Re: [HACKERS] gaussian distribution pgbench
On 03/17/2014 10:40 AM, KONDO Mitsumasa wrote: By the way, you seem to want to remove --gaussian=NUM and --exponential=NUM command options. Can you tell me the objective reason? I think pgbench is the benchmark test on PostgreSQL and default benchmark is TPC-B-like benchmark. It is written in documents, and default benchmark wasn't changed by my patch. So we need not remove command options, and they are one of the variety of benchmark options. Maybe you have something misunderstanding about my patch... There is an infinite number of variants of the TPC-B test that we could include in pgbench. If we start adding every one of them, we're quickly going to have hundreds of options to choose the workload. I'd like to keep pgbench simple. These two new test variants, gaussian and exponential, are not that special that they'd deserve to be included in the program itself. pgbench already has a mechanism for running custom scripts, in which you can specify whatever workload you want. Let's use that. If it's missing something you need to specify the workload you want, let's enhance the script language. The feature we're missing, which makes it difficult to write the gaussian and exponential variants as custom scripts, is the capability to create random numbers with a non-uniform distribution. That's the feature we should include in pgbench. (Actually, you could do the Box-Muller transformation as part of the query, to convert the uniform random variable to a gaussian one. Then you wouldn't need any changes to pgbench. But I agree that \setrandom ... gaussian would be quite handy) - Heikki
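The Box-Muller transformation Heikki mentions can be sketched as follows (plain Python rather than SQL, just to show the math): two independent uniform(0, 1] draws are mapped to two independent standard-normal draws, so any source of uniform randomness can be turned into a gaussian one.

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent uniform(0, 1] draws to two independent
    standard-normal draws (the basic Box-Muller transform)."""
    r = math.sqrt(-2.0 * math.log(u1))
    return (r * math.cos(2.0 * math.pi * u2),
            r * math.sin(2.0 * math.pi * u2))

def gaussian_stream(rng):
    """Yield standard-normal values built from a uniform source."""
    while True:
        # 1.0 - random() keeps u1 in (0, 1] so log() is defined
        z0, z1 = box_muller(1.0 - rng.random(), rng.random())
        yield z0
        yield z1
```

The same formula could in principle be written inline in a query with sqrt(), ln() and cos(), which is the workaround Heikki alludes to.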
Re: [HACKERS] gaussian distribution pgbench
(2014/03/17 17:46), Heikki Linnakangas wrote: On 03/15/2014 08:53 AM, Fabien COELHO wrote: * Does min and max really make sense for gaussian and exponential distributions? For gaussian, I would expect mean and standard deviation as the parameters, not min/max/threshold. Yes... and no:-) The aim is to draw an integer primary key from a table, so it must be in a specified range. Well, I don't agree with that aim. It's useful for choosing a primary key, as in the pgbench TPC-B workload, but a gaussian distributed random number could be used for many other things too. For example: \setrandom foo ... gaussian select * from cheese where weight < :foo And :foo should be a float, not an integer. That's what I was trying to say earlier, when I said that the variable should be a float. If you need an integer, just cast or round it in the query. I realize that the current \setrandom sets the variable to an integer, so gaussian/exponential would be different. But so what? An option to generate uniformly distributed floats would be handy too, though. Well, that seems like a new feature. If you want to realise it as a double, add '\setrandomd' as a double random generator in pgbench. I will agree with that. This is approximated by drawing a double value with the expected distribution (gaussian or exponential) and projecting it carefully onto integers. If it is out of range, there is a loop and another value is drawn. The minimal threshold constraint (2.0) ensures that the probability of looping is low. Well, that's one way to constrain it to the given range, but there are many other ways to do it. Like, clamp it to the min/max if it's out of range. That's too heavy a method.. Client-side calculation must be light. I don't think we need to choose any particular method, you can handle that in the test script. I think our implementation is the best way to realize it. It is fast and robust, as the probability of looping is low. If you have a better idea, please teach us.
Regards, -- Mitsumasa KONDO NTT Open Source Software Center
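The draw-and-retry scheme discussed in this thread (draw a double with the target distribution, loop if it falls outside the allowed band, then project onto the integer range) can be sketched like this. It is a simplified model, not pgbench's actual code, and the projection details are assumptions.

```python
import random

def gaussian_in_range(lo, hi, threshold, rng=random.Random(0)):
    """Draw an integer in [lo, hi] with a roughly gaussian shape.

    A standard-normal draw is rejected while |z| > threshold, then
    mapped linearly from [-threshold, threshold] onto [lo, hi].
    With threshold >= 2.0, roughly 95% of draws fall inside the
    band, so the rejection loop rarely iterates more than once or
    twice -- which is the point of the minimal threshold constraint.
    """
    while True:
        z = rng.gauss(0.0, 1.0)
        if abs(z) <= threshold:
            break
    # project [-threshold, threshold] onto the integer range
    frac = (z + threshold) / (2.0 * threshold)
    return min(hi, lo + int(frac * (hi - lo + 1)))
```

Clamping out-of-range draws to min/max (Heikki's alternative) would avoid the loop entirely, at the cost of piling extra probability mass onto the two endpoints.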
Re: [HACKERS] gaussian distribution pgbench
(2014/03/17 18:02), Heikki Linnakangas wrote: On 03/17/2014 10:40 AM, KONDO Mitsumasa wrote: By the way, you seem to want to remove --gaussian=NUM and --exponential=NUM command options. Can you tell me the objective reason? I think pgbench is the benchmark test on PostgreSQL and default benchmark is TPC-B-like benchmark. It is written in documents, and default benchmark wasn't changed by my patch. So we need not remove command options, and they are one of the variety of benchmark options. Maybe you have something misunderstanding about my patch... There is an infinite number of variants of the TPC-B test that we could include in pgbench. If we start adding every one of them, we're quickly going to have hundreds of options to choose the workload. I'd like to keep pgbench simple. These two new test variants, gaussian and exponential, are not that special that they'd deserve to be included in the program itself. Well, I added only two options, and they are major distributions, seen in real database systems more often than the uniform distribution. I'm afraid, I think you are too worried and it will not grow to hundreds of options. And pgbench is still simple. pgbench already has a mechanism for running custom scripts, in which you can specify whatever workload you want. Let's use that. If it's missing something you need to specify the workload you want, let's enhance the script language. I have not seen many users using pgbench custom scripts. And gaussian and exponential distributions are much better for measuring real system performance, so I'd like to have them as command options. In pgbench as it is now, we can only vary the database size, but that isn't a realistic situation. We want to forecast the required system by calculating the size of the hot spot or the distribution of access patterns. I'd really like to include it, from my heart:) Please...
Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] gaussian distribution pgbench
On Mon, Mar 17, 2014 at 7:07 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: (2014/03/17 18:02), Heikki Linnakangas wrote: On 03/17/2014 10:40 AM, KONDO Mitsumasa wrote: By the way, you seem to want to remove --gaussian=NUM and --exponential=NUM command options. Can you tell me the objective reason? I think pgbench is the benchmark test on PostgreSQL and default benchmark is TPC-B-like benchmark. It is written in documents, and default benchmark wasn't changed by my patch. So we need not remove command options, and they are one of the variety of benchmark options. Maybe you have something misunderstanding about my patch... There is an infinite number of variants of the TPC-B test that we could include in pgbench. If we start adding every one of them, we're quickly going to have hundreds of options to choose the workload. I'd like to keep pgbench simple. These two new test variants, gaussian and exponential, are not that special that they'd deserve to be included in the program itself. Well, I add only two options, and they are major distribution that are seen in real database system than uniform distiribution. I'm afraid, I think you are too worried and it will not be added hundreds of options. And pgbench is still simple. pgbench already has a mechanism for running custom scripts, in which you can specify whatever workload you want. Let's use that. If it's missing something you need to specify the workload you want, let's enhance the script language. I have not seen user who is using pgbench custom script very much. And gaussian and exponential distribution are much better to measure the real system perfomance, so I'd like to use it command option. In now pgbench, we can only measure about database size, but it isn't realistic situation. We want to forcast the required system from calculating the size of hot spot or distirbution of access pettern. I'd realy like to include it on my heart:) Please... 
I have no strong opinion about the command-line option for gaussian, but I think that we should focus on \setrandom gaussian first. Even after that's committed, we can implement that command-line option later if many people think that's necessary. Regards, -- Fujii Masao
Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire
Hi Amit, I've been ill the last few days, so sorry for my late response. I have updated the patch to pass TID and operation information in error context and changed some of the comments in code. Let me know if the added operation information is useful, else we can use better generic message in context. I don't think that this fixes the translation guideline issues brought up by Robert. This produces differing strings for the different cases as well and it also passes in altering data directly to the error string. It also may produce error messages that are really weird. You initialize the string with "while attempting to ". The remaining part of the function is covered by if()s which may lead to error messages like this: "while attempting to ", "while attempting to in relation public.xyz of database abc", "while attempting to in database abc". Although this may not be very likely (because ItemPointerIsValid(&(info->ctid)) should in this case not return false). Attached you will find a new version of this patch; it slightly violates the translation guidelines as well: it assembles an error string (but it doesn't pass in altering data like ctid or things like that). I simply couldn't think of a nice solution without doing so, and after looking through the code there are a few cases (e.g. CheckTableNotInUse()) where this is done, too. If we insist on having complete strings in this case we would have to have 6 * 3 * 2 error strings in the code.
Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index e2337ac..c0f5881 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -107,6 +107,8 @@ static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax, uint16 t_infomask); static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, int *remaining, uint16 infomask); +static void MultiXactIdWaitExtended(Relation rel, ItemPointerData ctid, const char *opr_name, + MultiXactId multi, MultiXactStatus status, int *remaining, uint16 infomask); static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status, int *remaining, uint16 infomask); @@ -2714,8 +2716,9 @@ l1: if (infomask HEAP_XMAX_IS_MULTI) { /* wait for multixact */ - MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, - NULL, infomask); + MultiXactIdWaitExtended(relation, tp.t_data-t_ctid, delete, + (MultiXactId) xwait, + MultiXactStatusUpdate, NULL, infomask); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); /* @@ -2741,7 +2744,8 @@ l1: else { /* wait for regular transaction to end */ - XactLockTableWait(xwait); + XactLockTableWaitExtended(relation, tp.t_data-t_ctid, + delete, xwait); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); /* @@ -3266,8 +3270,8 @@ l2: int remain; /* wait for multixact */ - MultiXactIdWait((MultiXactId) xwait, mxact_status, remain, - infomask); + MultiXactIdWaitExtended(relation, oldtup.t_data-t_ctid, update, + (MultiXactId) xwait, mxact_status, remain, infomask); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); /* @@ -3306,7 +3310,7 @@ l2: /* * There was no UPDATE in the MultiXact; or it aborted. No * TransactionIdIsInProgress() call needed here, since we called - * MultiXactIdWait() above. + * MultiXactIdWaitExtended() above. 
*/ if (!TransactionIdIsValid(update_xact) || TransactionIdDidAbort(update_xact)) @@ -3341,7 +3345,8 @@ l2: else { /* wait for regular transaction to end */ -XactLockTableWait(xwait); +XactLockTableWaitExtended(relation, oldtup.t_data-t_ctid, + update, xwait); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); /* @@ -4409,7 +4414,10 @@ l3: RelationGetRelationName(relation; } else - MultiXactIdWait((MultiXactId) xwait, status, NULL, infomask); + MultiXactIdWaitExtended(relation, tuple-t_data-t_ctid, + lock, (MultiXactId) xwait, + status, NULL, + infomask); /* if there are updates, follow the update chain */ if (follow_updates @@ -4464,7 +4472,8 @@ l3: RelationGetRelationName(relation; } else - XactLockTableWait(xwait); + XactLockTableWaitExtended(relation, tuple-t_data-t_ctid, + lock, xwait); /* if there are updates, follow the update chain */ if (follow_updates @@ -5151,7 +5160,9 @@ l4: if (needwait) { LockBuffer(buf, BUFFER_LOCK_UNLOCK); - XactLockTableWait(members[i].xid); + XactLockTableWaitExtended(rel, mytup.t_data-t_ctid, + lock updated, + members[i].xid); pfree(members); goto l4; } @@ -5211,7 +5222,8 @@ l4: if (needwait) { LockBuffer(buf,
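The string-assembly approach under discussion (one base string plus optional relation/database parts, instead of 6 * 3 * 2 fully spelled-out messages) can be sketched in Python. The function and argument names here are hypothetical, and the sketch also shows exactly why this clashes with the translation guidelines: the assembled fragments cannot be translated as whole sentences.

```python
def lock_wait_context(operation, relname=None, dbname=None):
    """Assemble an error-context line from optional parts
    (hypothetical helper mirroring the patch's approach)."""
    msg = "while attempting to %s" % operation
    if relname is not None:
        msg += " in relation %s" % relname
        if dbname is not None:
            msg += " of database %s" % dbname
    elif dbname is not None:
        msg += " in database %s" % dbname
    return msg
```

Each combination of present/absent parts yields a different final string, which is compact in code but leaves translators with sentence fragments rather than complete messages.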
[HACKERS] Various typos
Attached is a small patch to fix various typos. Regards Thom diff --git a/contrib/pgcrypto/openssl.c b/contrib/pgcrypto/openssl.c index ad7fb9e..86c0fb0 100644 --- a/contrib/pgcrypto/openssl.c +++ b/contrib/pgcrypto/openssl.c @@ -429,7 +429,7 @@ bf_init(PX_Cipher *c, const uint8 *key, unsigned klen, const uint8 *iv) /* * Test if key len is supported. BF_set_key silently cut large keys and it - * could be be a problem when user transfer crypted data from one server + * could be a problem when user transfer crypted data from one server * to another. */ diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml index 80fb4f3..561608f 100644 --- a/doc/src/sgml/gin.sgml +++ b/doc/src/sgml/gin.sgml @@ -270,7 +270,7 @@ contains the corresponding query keys. Likewise, the function must return GIN_FALSE only if the item does not match, whether or not it contains the GIN_MAYBE keys. If the result depends on the GIN_MAYBE - entries, ie. the match cannot be confirmed or refuted based on the + entries, i.e. the match cannot be confirmed or refuted based on the known query keys, the function must return GIN_MAYBE. /para para diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index e24ff18..9d96bf5 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -933,7 +933,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key, /* * Ok, we now know that there are no matches minItem. * - * If minItem is lossy, it means that there there were no exact items on + * If minItem is lossy, it means that there were no exact items on * the page among requiredEntries, because lossy pointers sort after exact * items. However, there might be exact items for the same page among * additionalEntries, so we mustn't advance past them. 
diff --git a/src/backend/access/gin/ginlogic.c b/src/backend/access/gin/ginlogic.c index 4c8d706..b13ce4c 100644 --- a/src/backend/access/gin/ginlogic.c +++ b/src/backend/access/gin/ginlogic.c @@ -10,7 +10,7 @@ * a GIN scan can apply various optimizations, if it can determine that an * item matches or doesn't match, even if it doesn't know if some of the keys * are present or not. Hence, it's useful to have a ternary-logic consistent - * function, where where each key can be TRUE (present), FALSE (not present), + * function, where each key can be TRUE (present), FALSE (not present), * or MAYBE (don't know if present). This file provides such a ternary-logic * consistent function, implemented by calling the regular boolean consistent * function many times, with all the MAYBE arguments set to all combinations diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index 239c7da..4cf07ea 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -783,9 +783,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup) * deal with WAL logging at all - an fsync() at the end of a rewrite would be * sufficient for crash safety. Any mapping that hasn't been safely flushed to * disk has to be by an aborted (explicitly or via a crash) transaction and is - * ignored by virtue of the xid in it's name being subject to a + * ignored by virtue of the xid in its name being subject to a * TransactionDidCommit() check. But we want to support having standbys via - * physical replication, both for availability and to to do logical decoding + * physical replication, both for availability and to do logical decoding * there. * */ @@ -1046,7 +1046,7 @@ logical_rewrite_log_mapping(RewriteState state, TransactionId xid, /* * Perform logical remapping for a tuple that's mapped from old_tid to - * new_tuple-t_self by rewrite_heap_tuple() iff necessary for the tuple. 
+ * new_tuple-t_self by rewrite_heap_tuple() if necessary for the tuple. */ static void logical_rewrite_heap_tuple(RewriteState state, ItemPointerData old_tid, diff --git a/src/backend/lib/ilist.c b/src/backend/lib/ilist.c index 58c5810..73a6473 100644 --- a/src/backend/lib/ilist.c +++ b/src/backend/lib/ilist.c @@ -26,7 +26,7 @@ /* * Delete 'node' from list. * - * It is not allowed to delete a 'node' which is is not in the list 'head' + * It is not allowed to delete a 'node' which is not in the list 'head' * * Caution: this is O(n); consider using slist_delete_current() instead. */ diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c index 04e407a..8c6c6c2 100644 --- a/src/backend/replication/logical/logical.c +++ b/src/backend/replication/logical/logical.c @@ -293,7 +293,7 @@ CreateInitDecodingContext(char *plugin, * So we have to acquire the ProcArrayLock to prevent computation of new * xmin horizons by other backends, get the safe decoding xid, and inform * the slot
Re: [HACKERS] Changeset Extraction v7.9.1
On Wed, Mar 12, 2014 at 2:06 PM, Andres Freund and...@2ndquadrant.com wrote: Attached are the collected remaining patches. The docs might need further additions, but it seems better to add them now. A few questions about pg_recvlogical: - There doesn't seem to be any provision for this tool to ever switch from one output file to the next. That seems like a practical need. One idea would be to have it respond to SIGHUP by reopening the originally-named output file. Another would be to switch, after so many bytes, to filename.1, then filename.2, etc. - It confirms the write and flush positions, but doesn't appear to actually flush anywhere. - While I quite agree with your desire for stringinfo in src/common, couldn't you use the roughly-equivalent PQExpBuffer facilities in libpq instead? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.9.1
On 2014-03-17 06:55:28 -0400, Robert Haas wrote: On Wed, Mar 12, 2014 at 2:06 PM, Andres Freund and...@2ndquadrant.com wrote: Attached are the collected remaining patches. The docs might need further additions, but it seems better to add them now. A few questions about pg_recvlogical: - There doesn't seem to be any provision for this tool to ever switch from one output file to the next. That seems like a practical need. One idea would be to have it respond to SIGHUP by reopening the originally-named output file. Another would be to switch, after so many bytes, to filename.1, then filename.2, etc. Hm. So far I haven't had the need, but you're right, it would be useful. I don't like the .n notion, but SIGHUP would be fine with me. I'll add that. - It confirms the write and flush positions, but doesn't appear to actually flush anywhere. Yea. The reason it reports the flush position is that it allows to test sync rep. I don't think other usecases will appreciate frequent fsyncs... Maybe make it optional? - While I quite agree with your desire for stringinfo in src/common, couldn't you use the roughly-equivalent PQExpBuffer facilities in libpq instead? Yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
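The SIGHUP approach Andres agrees to add could look something like the following minimal sketch. This is illustrative only, not pg_recvlogical's actual code; `got_SIGHUP`, `maybe_reopen`, and the file names are invented for this example. The handler only sets a flag (the async-signal-safe pattern), and the receive loop reopens the originally-named file when it notices the flag.

```c
/* Sketch of the SIGHUP-reopens-output-file idea (illustrative names). */
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t got_SIGHUP = 0;

static void
sighup_handler(int signum)
{
	(void) signum;
	got_SIGHUP = 1;				/* async-signal-safe: just set a flag */
}

/*
 * Called from the main receive loop between writes.  A log rotation tool
 * would rename the old file away, then send SIGHUP; we reopen by the
 * original name, creating a fresh file.
 */
static FILE *
maybe_reopen(FILE *outfile, const char *outfname)
{
	if (got_SIGHUP)
	{
		got_SIGHUP = 0;
		fclose(outfile);
		outfile = fopen(outfname, "a");
	}
	return outfile;
}
```

The key design point is that the handler does no I/O itself, which sidesteps the async-signal-safety concerns raised later in the thread.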
[HACKERS] pg_dump without explicit table locking
Hi, at work at my company I inherited responsibility for a large PG 8.1 DB, with an extreme number of tables (~30). Surprisingly this is working quite well, except for maintenance and backup. I am tasked with finding a way to do dump/restore to 9.3 with as little downtime as possible. Using 9.3's pg_dump with -j12 I found out that pg_dump takes 6 hours to lock tables using a single thread, then does the data dump in 1 more hour using 12 workers. However if I patch out the explicit LOCK TABLE statements this only takes 1 hour total. Of course no one else is using the DB at this time. In a pathological test case scenario in a staging environment the dump time decreased from 5 hours to 5 minutes. I've googled the problem and there seem to be more people with similar problems, so I made this a command line option --no-table-locks and wrapped it up in as nice a patch against github/master as I can manage (and I didn't use C for a long time). I hope you find it useful.

regards,
Jürgen Strobel

commit 393d47cf6b5ce77297e52e7fc390aa862fd2c2fd
Author: Jürgen Strobel juergen+git...@strobel.info
Date: Fri Mar 7 16:54:22 2014 +0100

    Add option --no-table-locks to pg_dump.

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 1f0d4de..5725c46 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -772,6 +772,25 @@ PostgreSQL documentation
      </varlistentry>

      <varlistentry>
+      <term><option>--no-table-locks</option></term>
+      <listitem>
+       <para>
+        If this option is specified, tables to be dumped will not be locked explicitly
+        at the start of pg_dump. It implies <option>--no-synchronized-snapshots</option>.
+        This is potentially dangerous and should only be used by experts,
+        and only while there is no other activity in the database.
+       </para>
+       <para>
+        In the presence of an extreme number of tables pg_dump can exhibit
+        bad startup performance because it needs to issue LOCK TABLE statements for all
+        tables. This is intended as a last resort option for dump and restore version
+        upgrades, in order to make downtime manageable. In an especially bad case
+        with ~30 tables this reduced dump time from 6 hours to 1 hour.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--no-tablespaces</option></term>
       <listitem>
       <para>
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index 6f2634b..ac05356 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -32,6 +32,9 @@
 #define PIPE_READ 0
 #define PIPE_WRITE 1

+/* flag for the no-table-locks command-line long option */
+int		no_table_locks = 0;
+
 /* file-scope variables */
 #ifdef WIN32
 static unsigned int tMasterThreadId = 0;
@@ -886,7 +889,7 @@ WaitForCommands(ArchiveHandle *AH, int pipefd[2])
 	 * meantime. lockTableNoWait dies in this case to prevent a
 	 * deadlock.
 	 */
-	if (strcmp(te->desc, "BLOBS") != 0)
+	if (!no_table_locks && strcmp(te->desc, "BLOBS") != 0)
 		lockTableNoWait(AH, te);

 	/*
diff --git a/src/bin/pg_dump/parallel.h b/src/bin/pg_dump/parallel.h
index 7a32a9b..f95a6bb 100644
--- a/src/bin/pg_dump/parallel.h
+++ b/src/bin/pg_dump/parallel.h
@@ -70,6 +70,9 @@ extern bool parallel_init_done;
 extern DWORD mainThreadId;
 #endif

+/* flag for the no-table-locks command-line long option */
+extern int no_table_locks;
+
 extern void init_parallel_dump_utils(void);
 extern int GetIdleWorker(ParallelState *pstate);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 17bb846..e1fe047 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -357,6 +357,7 @@ main(int argc, char **argv)
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 		{"no-synchronized-snapshots", no_argument, &no_synchronized_snapshots, 1},
+		{"no-table-locks", no_argument, &no_table_locks, 1},
 		{"no-unlogged-table-data", no_argument, &no_unlogged_table_data, 1},

 		{NULL, 0, NULL, 0}
@@ -562,6 +563,10 @@ main(int argc, char **argv)
 	if (column_inserts)
 		dump_inserts = 1;

+	/* --no-table-locks implies --no-synchronized-snapshots */
+	if (no_table_locks)
+		no_synchronized_snapshots = 1;
+
 	if (dataOnly && schemaOnly)
 	{
 		write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");
@@ -902,6 +907,7 @@ help(const char *progname)
 	printf(_("  --inserts                    dump data as INSERT commands,
Re: [HACKERS] Various typos
On Mon, Mar 17, 2014 at 7:52 PM, Thom Brown t...@linux.com wrote: Attached is a small patch to fix various typos. Thanks! Committed. Regards, -- Fujii Masao
Re: [HACKERS] Changeset Extraction v7.9.1
On Mon, Mar 17, 2014 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote: - There doesn't seem to be any provision for this tool to ever switch from one output file to the next. That seems like a practical need. One idea would be to have it respond to SIGHUP by reopening the originally-named output file. Another would be to switch, after so many bytes, to filename.1, then filename.2, etc. Hm. So far I haven't had the need, but you're right, it would be useful. I don't like the .n notion, but SIGHUP would be fine with me. I'll add that. Cool. - It confirms the write and flush positions, but doesn't appear to actually flush anywhere. Yea. The reason it reports the flush position is that it allows to test sync rep. I don't think other usecases will appreciate frequent fsyncs... Maybe make it optional? Well, as I'm sure you recognize, if you're actually trying to build a replication solution with this tool, you can't let the database throw away the state required to suck changes out of the database unless you've got those changes safely stored away somewhere else. Now, of course, if you don't acknowledge to the database that the stuff is on disk, you're going to get data file bloat and excess WAL retention, unlucky you. But acknowledging that you've got the changes when they're not actually on disk doesn't actually provide the guarantees you went to so much trouble to build in to the mechanism. So the no-flush version really can ONLY ever be useful for testing, AFAICS, or if you really don't care that much whether it can survive a server crash. Perhaps there could be a switch for an fsync interval, or something like that. The default could be, say, to fsync every 10 seconds. And if you want to change it, then go ahead; 0 disables. Writing to standard output would be documented as unreliable. Other ideas welcome. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump without explicit table locking
On Mon, Mar 17, 2014 at 7:52 AM, Jürgen Strobel juergen...@strobel.info wrote: at work at my company I inherited responsibility for a large PG 8.1 DB, with an extreme number of tables (~30). Surprisingly this is working quite well, except for maintenance and backup. I am tasked with finding a way to do dump/restore to 9.3 with as little downtime as possible. Using 9.3's pg_dump with -j12 I found out that pg_dump takes 6 hours to lock tables using a single thread, then does the data dump in 1 more hour using 12 workers. However if I patch out the explicit LOCK TABLE statements this only takes 1 hour total. Of course no one else is using the DB at this time. In a pathological test case scenario in a staging environment the dump time decreased from 5 hours to 5 minutes. I've googled the problem and there seem to be more people with similar problems, so I made this a command line option --no-table-locks and wrapped it up in as nice a patch against github/master as I can manage (and I didn't use C for a long time). I hope you find it useful. Fascinating report. Whether we use your patch or not, that's interesting to know about. Please add your patch here so we don't forget about it: https://commitfest.postgresql.org/action/commitfest_view/open See also https://wiki.postgresql.org/wiki/Submitting_a_Patch -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?
On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Mar 16, 2014 at 6:23 AM, MauMau maumau...@gmail.com wrote: The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on Windows) as an example for archive_command. However, cp/copy does not sync the copied data to disk. As a result, the completed WAL segments would be lost in the following sequence: 1. A WAL segment fills up. 2. The archiver process archives the just filled WAL segment using archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ and writes to the archive area. At this point, the WAL file is not persisted to the archive area yet, because cp/copy doesn't sync the writes. 3. The checkpoint processing removes the WAL segment file from pg_xlog/. 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more. Considering the reliable image of PostgreSQL and widespread use in enterprise systems, I think something should be done. Could you give me your opinions on the right direction? Although the doc certainly escapes by saying (This is an example, not a recommendation, and might not work on all platforms.), it seems from pgsql-xxx MLs that many people are following this example. * Improve the example in the documentation. But what command can we use to reliably sync just one file? * Provide some command, say pg_copy, which copies a file synchronously by using fsync(), and describes in the doc something like for simple use cases, you can use pg_copy as the standard reliable copy command. +1. This won't obviate the need for tools to manage replication, but it would make it possible to get the simplest case right without guessing. +1, too. And, what about making pg_copy call posix_fadvise(DONT_NEED) against the archived file after the copy? Also, it might be a good idea to support direct copy of the file to avoid wasting the file cache.
Regards, -- Fujii Masao
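The proposed pg_copy does not exist yet; a minimal sketch of what its core could look like follows (`copy_file_sync` is an invented name, and real code would also need proper error cleanup and probably an fsync of the target directory so the new directory entry is durable too):

```c
/*
 * Hypothetical "pg_copy"-style helper: copy src to dst and fsync the
 * result, so a filled WAL segment is durably archived before
 * archive_command reports success.  Error handling is abbreviated.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
copy_file_sync(const char *src, const char *dst)
{
	char		buf[65536];
	ssize_t		n;
	int			in = open(src, O_RDONLY);
	int			out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);

	if (in < 0 || out < 0)
		return -1;

	while ((n = read(in, buf, sizeof(buf))) > 0)
		if (write(out, buf, n) != n)
			return -1;

	/* The crucial step cp/copy skips: force the data to stable storage. */
	if (n < 0 || fsync(out) != 0)
		return -1;

	close(in);
	return close(out);
}
```

O_EXCL is used deliberately: an archive_command must refuse to overwrite an existing segment of the same name.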
Re: [HACKERS] pg_dump without explicit table locking
2014-03-17 12:52 GMT+01:00 Jürgen Strobel juergen...@strobel.info: Hi, at work at my company I inherited responsibility for a large PG 8.1 DB, with an extreme number of tables (~30). Surprisingly this is working quite well, except for maintenance and backup. I am tasked with finding a way to do dump/restore to 9.3 with as little downtime as possible. Using 9.3's pg_dump with -j12 I found out that pg_dump takes 6 hours to lock tables using a single thread, then does the data dump in 1 more hour using 12 workers. However if I patch out the explicit LOCK TABLE statements this only takes 1 hour total. Of course no one else is using the DB at this time. In a pathological test case scenario in a staging environment the dump time decreased from 5 hours to 5 minutes. I've googled the problem and there seem to be more people with similar problems, so I made this a command line option --no-table-locks and wrapped it up in as nice a patch against github/master as I can manage (and I didn't use C for a long time). I hope you find it useful. Joe Conway sent me a tip: commit eeb6f37d89fc60c6449ca12ef9e91491069369cb significantly decreases the time necessary for locking. So it can help too. I am not sure if the missing lock is fully correct. In the same situation I thought about some form of database-level lock, so you can get protected access with one statement. Regards Pavel Stehule regards, Jürgen Strobel
Re: [HACKERS] Changeset Extraction v7.9.1
On 2014-03-17 08:00:22 -0400, Robert Haas wrote: Yea. The reason it reports the flush position is that it allows to test sync rep. I don't think other usecases will appreciate frequent fsyncs... Maybe make it optional? Well, as I'm sure you recognize, if you're actually trying to build a replication solution with this tool, you can't let the database throw away the state required to suck changes out of the database unless you've got those changes safely stored away somewhere else. Hm. I don't think a replication tool will use pg_recvlogical, do you really foresee that? The main use cases for it that I can see are testing/development of output plugins and logging/auditing. That's not to say a safe writing method isn't interesting, though. Perhaps there could be a switch for an fsync interval, or something like that. The default could be, say, to fsync every 10 seconds. And if you want to change it, then go ahead; 0 disables. Writing to standard output would be documented as unreliable. Other ideas welcome. Hm. That'll be a bit nasty. fsync() is async signal safe, but it's still forbidden to be called from a signal on windows IIRC. I guess we can couple it with the standby_message_timeout stuff. Unless you have a better idea? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.9.1
On 2014-03-17 08:00:22 -0400, Robert Haas wrote: On Mon, Mar 17, 2014 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote: - There doesn't seem to be any provision for this tool to ever switch from one output file to the next. That seems like a practical need. One idea would be to have it respond to SIGHUP by reopening the originally-named output file. Another would be to switch, after so many bytes, to filename.1, then filename.2, etc. Hm. So far I haven't had the need, but you're right, it would be useful. I don't like the .n notion, but SIGHUP would be fine with me. I'll add that. Cool. So, I've implemented this, but it won't work on windows afaics. There's no SIGHUP on windows, and the signal emulation code used in the backend is backend only... I'll be happy enough to declare this a known limitation for now. Arguments to the contrary, best complemented with a solution? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.9.1
On Mon, Mar 17, 2014 at 8:29 AM, Andres Freund and...@2ndquadrant.com wrote: Perhaps there could be a switch for an fsync interval, or something like that. The default could be, say, to fsync every 10 seconds. And if you want to change it, then go ahead; 0 disables. Writing to standard output would be documented as unreliable. Other ideas welcome. Hm. That'll be a bit nasty. fsync() is async signal safe, but it's still forbidden to be called from a signal on windows IIRC. I guess we can couple it with the standby_message_timeout stuff. Eh... I don't see any need to involve signals. I'd just check after each write() whether enough time has passed, or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
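Robert's check-after-each-write idea could be sketched like this (`flush_if_due`, `fsync_interval`, and `last_fsync` are invented names; the 10-second default and "0 disables" follow the semantics suggested above, and no signals are involved):

```c
/*
 * Sketch: after each write to the output file, fsync only if
 * fsync_interval seconds have elapsed since the last flush.
 * A value of 0 disables flushing entirely.
 */
#include <time.h>
#include <unistd.h>

static time_t last_fsync = 0;
static int	fsync_interval = 10;	/* seconds; suggested default */

static int
flush_if_due(int fd)
{
	time_t		now = time(NULL);

	if (fsync_interval <= 0)
		return 0;				/* flushing disabled */

	if (now - last_fsync >= fsync_interval)
	{
		if (fsync(fd) != 0)
			return -1;
		last_fsync = now;		/* remember when we last flushed */
	}
	return 0;
}
```

Andres's follow-up point still applies: if no writes arrive, nothing calls this, so the confirmed flush position would go stale; hooking the same check into the periodic feedback path closes that gap.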
Re: [HACKERS] Changeset Extraction v7.9.1
On Mon, Mar 17, 2014 at 8:58 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-17 08:00:22 -0400, Robert Haas wrote: On Mon, Mar 17, 2014 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote: - There doesn't seem to be any provision for this tool to ever switch from one output file to the next. That seems like a practical need. One idea would be to have it respond to SIGHUP by reopening the originally-named output file. Another would be to switch, after so many bytes, to filename.1, then filename.2, etc. Hm. So far I haven't had the need, but you're right, it would be useful. I don't like the .n notion, but SIGHUP would be fine with me. I'll add that. Cool. So, I've implemented this, but it won't work on windows afaics. There's no SIGHUP on windows, and the signal emulation code used in the backend is backend only... I'll be happy enough to declare this a known limitation for now. Arguments to the contrary, best complemented with a solution? Blarg. I don't really like that, but I admit I don't have a better idea, unless it's to go back to the suffix idea, with something like --file-size-limit=XXX to trigger the switch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.9.1
On 2014-03-17 09:13:38 -0400, Robert Haas wrote: On Mon, Mar 17, 2014 at 8:29 AM, Andres Freund and...@2ndquadrant.com wrote: Perhaps there could be a switch for an fsync interval, or something like that. The default could be, say, to fsync every 10 seconds. And if you want to change it, then go ahead; 0 disables. Writing to standard output would be documented as unreliable. Other ideas welcome. Hm. That'll be a bit nasty. fsync() is async signal safe, but it's still forbidden to be called from a signal on windows IIRC. I guess we can couple it with the standby_message_timeout stuff. Eh... I don't see any need to involve signals. I'd just check after each write() whether enough time has passed, or something like that. What if no new writes are needed? Because e.g. there's either no write activity on the primary or all writes are in another database or somesuch? I think checking in sendFeedback() is the best bet... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.9.1
On 2014-03-17 09:14:51 -0400, Robert Haas wrote: On Mon, Mar 17, 2014 at 8:58 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-17 08:00:22 -0400, Robert Haas wrote: On Mon, Mar 17, 2014 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote: - There doesn't seem to be any provision for this tool to ever switch from one output file to the next. That seems like a practical need. One idea would be to have it respond to SIGHUP by reopening the originally-named output file. Another would be to switch, after so many bytes, to filename.1, then filename.2, etc. Hm. So far I haven't had the need, but you're right, it would be useful. I don't like the .n notion, but SIGHUP would be fine with me. I'll add that. Cool. So, I've implemented this, but it won't work on windows afaics. There's no SIGHUP on windows, and the signal emulation code used in the backend is backend only... I'll be happy enough to declare this a known limitation for now. Arguments to the contrary, best complemented with a solution? Blarg. I don't really like that, but I admit I don't have a better idea, unless it's to go back to the suffix idea, with something like --file-size-limit=XXX to trigger the switch. I think the SIGHUP support can be useful independently of --file-size-limit, so adding it seems like a good idea anyway. I think anything more advanced than the SIGHUP stuff is for another day. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sun, Mar 16, 2014 at 7:15 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03/15/2014 08:40 PM, Fujii Masao wrote: Hi, I executed the following statements in HEAD and 9.3, and compared the size of WAL which were generated by data insertion in GIN index. - CREATE EXTENSION pg_trgm; CREATE TABLE hoge (col1 text); CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off); CHECKPOINT; SELECT pg_switch_xlog(); SELECT pg_switch_xlog(); SELECT pg_current_xlog_location(); INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100); SELECT pg_current_xlog_location(); - The results of WAL size are 960 MB (9.3) 2113 MB (HEAD) The WAL size in HEAD was more than two times bigger than that in 9.3. Recently the source code of GIN index has been changed dramatically. Is the increase in GIN-related WAL intentional or a bug? It was somewhat expected. Updating individual items on the new-format GIN pages requires decompressing and recompressing the page, and the recompressed posting lists need to be WAL-logged. Which generates much larger WAL records. That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is max 256 bytes long. But I guess that's still a lot more WAL than in the old format. I ran pg_xlogdump | grep Gin and checked the size of GIN-related WAL, and then found its max seems more than 256B. Am I missing something? What I observed is [In HEAD] At first, the size of GIN-related WAL is gradually increasing up to about 1400B. 
rmgr: Gin len (rec/tot): 48/80, tx: 1813, lsn: 0/020020D8, prev 0/0270, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: F rmgr: Gin len (rec/tot): 56/88, tx: 1813, lsn: 0/02002440, prev 0/020023F8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 64/96, tx: 1813, lsn: 0/020044D8, prev 0/02004490, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T ... rmgr: Gin len (rec/tot): 1376/ 1408, tx: 1813, lsn: 0/02A7EE90, prev 0/02A7E910, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 2 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 1392/ 1424, tx: 1813, lsn: 0/02A7F458, prev 0/02A7F410, bkp: , desc: Create posting tree, node: 1663/12945/16441 blkno: 4 Then the size decreases to about 100B and is gradually increasing again up to 320B. rmgr: Gin len (rec/tot):116/ 148, tx: 1813, lsn: 0/02A7F9E8, prev 0/02A7F458, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1372 (compressed) rmgr: Gin len (rec/tot): 40/72, tx: 1813, lsn: 0/02A7FA80, prev 0/02A7F9E8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 3 isdata: F isleaf: T isdelete: T ... rmgr: Gin len (rec/tot):118/ 150, tx: 1813, lsn: 0/02A83BA0, prev 0/02A83B58, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1374 (compressed) ... rmgr: Gin len (rec/tot):288/ 320, tx: 1813, lsn: 0/02AEDE28, prev 0/02AEDCE8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 14 isdata: T isleaf: T unmodified: 1280 length: 1544 (compressed) Then the size decreases to 66B and is gradually increasing again up to 320B. This increase and decrease of WAL size seems to continue. [In 9.3] At first, the size of GIN-related WAL is gradually increasing up to about 2700B. 
rmgr: Gin len (rec/tot): 52/84, tx: 1812, lsn: 0/02000430, prev 0/020003D8, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 11 nitem: 1 isdata: F isleaf T isdelete F updateBlkno:4294967295 rmgr: Gin len (rec/tot): 60/92, tx: 1812, lsn: 0/020004D0, prev 0/02000488, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 1 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 ... rmgr: Gin len (rec/tot): 2740/ 2772, tx: 1812, lsn: 0/026D1670, prev 0/026D0B98, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 5 offset: 2 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 rmgr: Gin len (rec/tot): 2714/ 2746, tx: 1812, lsn: 0/026D21A8, prev 0/026D2160, bkp: , desc: Create posting tree, node: 1663/12896/16441 blkno: 6 The size decreases to 66B and then is never changed. rmgr: Gin len (rec/tot):
Re: [HACKERS] gaussian distribution pgbench
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: (2014/03/17 18:02), Heikki Linnakangas wrote: On 03/17/2014 10:40 AM, KONDO Mitsumasa wrote: There is an infinite number of variants of the TPC-B test that we could include in pgbench. If we start adding every one of them, we're quickly going to have hundreds of options to choose the workload. I'd like to keep pgbench simple. These two new test variants, gaussian and exponential, are not that special that they'd deserve to be included in the program itself. Well, I add only two options, and they are major distributions that are seen in real database systems more than the uniform distribution. I'm afraid I think you are too worried, and it will not grow to hundreds of options. And pgbench is still simple. FWIW, I concur with Heikki on this. Adding new versions of \setrandom is useful functionality. Embedding them in the standard test is not, because that just makes it (even) less standard. And pgbench has too darn many switches already. regards, tom lane
Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?
2014-03-17 21:12 GMT+09:00 Fujii Masao masao.fu...@gmail.com: On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Mar 16, 2014 at 6:23 AM, MauMau maumau...@gmail.com wrote: The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on Windows) as an example for archive_command. However, cp/copy does not sync the copied data to disk. As a result, the completed WAL segments would be lost in the following sequence: 1. A WAL segment fills up. 2. The archiver process archives the just filled WAL segment using archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ and writes to the archive area. At this point, the WAL file is not persisted to the archive area yet, because cp/copy doesn't sync the writes. 3. The checkpoint processing removes the WAL segment file from pg_xlog/. 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more. Considering the reliable image of PostgreSQL and widespread use in enterprise systems, I think something should be done. Could you give me your opinions on the right direction? Although the doc certainly escapes by saying (This is an example, not a recommendation, and might not work on all platforms.), it seems from pgsql-xxx MLs that many people are following this example. * Improve the example in the documentation. But what command can we use to reliably sync just one file? * Provide some command, say pg_copy, which copies a file synchronously by using fsync(), and describes in the doc something like for simple use cases, you can use pg_copy as the standard reliable copy command. +1. This won't obviate the need for tools to manage replication, but it would make it possible to get the simplest case right without guessing. +1, too. And, what about making pg_copy call posix_fadvise(DONT_NEED) against the archived file after the copy? Also It might be good idea to support the direct copy of the file to avoid wasting the file cache. Use direct_cp. 
http://directcp.sourceforge.net/direct_cp.html Regards, -- Mitsumasa KONDO NTT Open Source Software Center
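Fujii's posix_fadvise(DONT_NEED) suggestion, sketched minimally (`drop_file_cache` is an invented name; the hint is purely advisory and may be a no-op on some platforms, and it should only be issued after the copy has been fsync'ed, since it makes no sense to drop dirty pages):

```c
/*
 * Sketch: after the archive copy completes and is fsync'ed, hint the
 * kernel to drop the cached pages so a large archiving run does not
 * evict more useful data from the page cache.
 */
#define _XOPEN_SOURCE 600		/* for posix_fadvise() */
#include <fcntl.h>

static int
drop_file_cache(int fd)
{
	/* offset 0, len 0 means "the whole file"; returns 0 on success */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

direct_cp, linked above, achieves a similar effect by bypassing the page cache with O_DIRECT instead of dropping pages after the fact.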
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Fujii Masao escribió: On Sun, Mar 16, 2014 at 7:15 AM, Alexander Korotkov aekorot...@gmail.com wrote: That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... Let me know if you want me to write patch addressing this issue. Yeah, I really want you to address this problem! That's definitely useful for every user disabling the FASTUPDATE option for some reason. Users that disable FASTUPDATE, in my experience, do so because their stock work_mem is way too high and GIN searches become too slow due to having to scan too large a list. I think it might make sense to invest a modest amount of time in getting FASTUPDATE to be sized completely differently from today -- perhaps base it on a hardcoded factor of BLCKSZ, rather than work_mem. Or, if we really need to make it configurable, then let it have its own parameter. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_dump without explicit table locking
Pavel Stehule pavel.steh...@gmail.com writes: 2014-03-17 12:52 GMT+01:00 Jürgen Strobel juergen...@strobel.info: I've googled the problem and there seem to be more people with similar problems, so I made this a command line option --no-table-locks and wrapped it up in as nice a patch against github/master as I can manage (and I didn't use C for a long time). I hope you find it useful. Joe Conway sent me a tip: commit eeb6f37d89fc60c6449ca12ef9e91491069369cb significantly decreases the time necessary for locking. So it can help too. Indeed. I think there's zero chance that we'd accept the patch as proposed. If there's still a performance problem in HEAD, we'd look for some other way to improve matters more. (Note that this is only one of assorted O(N^2) behaviors in older versions of pg_dump; we've gradually stamped them out over time.) regards, tom lane
Re: [HACKERS] Archive recovery won't be completed on some situation.
On 03/15/2014 05:59 PM, Fujii Masao wrote: What about adding a new option to pg_resetxlog so that we can reset the pg_control's backup start location? Even after we've accidentally entered into the situation that you described, we can exit from that by resetting the backup start location in pg_control. Also this option seems helpful to salvage the data as a last resort from the corrupted backup. Yeah, seems reasonable. After you run pg_resetxlog, there's no hope that the backup end record would arrive any time later. And if it does, it won't really do much good after you've reset the WAL. We probably should just clear out the backup start/stop location always when you run pg_resetxlog. Your database is potentially broken if you reset the WAL before reaching consistency, but if you forcibly do that with pg_resetxlog -f, you've been warned. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire
On Mon, Mar 17, 2014 at 4:20 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Amit, I've been ill the last few days, so sorry for my late response. Sorry to hear and no problem for the delay. I don't think that this fixes the translation guideline issues brought up by Robert. This produces differing strings for the different cases as well, and it also passes in altering data directly to the error string. It also may produce error messages that are really weird. You initialize the string with "while attempting to ". The remaining part of the function is covered by if()s, which may lead to error messages like these: "while attempting to ", "while attempting to in relation public.xyz of database abc", "while attempting to in database abc". Although this may not be very likely (because ItemPointerIsValid(&(info->ctid)) should in this case not return false). Yes, this is a good point and even I have thought of it before sending the patch. The reason why it seems not good to duplicate the message is that I am not aware of even a single case where the tuple info (ctid, relinfo) will not be available by the time it comes to wait on a tuple in the new call (XactLockTableWaitExtended); do you think there exists any such case, or is it just based on apprehension? If there exists any such case, then isn't it better to use the earlier version of the API for that call site, as is the case we have left for SnapBuildFindSnapshot? If it's just apprehension, then I think it's better to have a check such that if any of the tuple info is missing then don't add the context, and in future if we have any such usecase then we can have multiple messages. Attached you will find a new version of this patch; it slightly violates the translation guidelines as well: it assembles an error string (but it doesn't pass in altering data like ctid or things like that). I simply couldn't think of a nice solution without doing so, and after looking through the code there are a few cases (e.g. CheckTableNotInUse()) where this is done, too. 
I could not see any problem in the modifications done by you, but if the function is generic enough that it doesn't assume what info the caller has set, then why assume that opr_name will always be filled? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/17/2014 03:20 PM, Fujii Masao wrote: On Sun, Mar 16, 2014 at 7:15 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I ran pg_xlogdump | grep Gin and checked the size of GIN-related WAL, and then found its max seems more than 256B. Am I missing something? What I observed is [In HEAD] At first, the size of GIN-related WAL is gradually increasing up to about 1400B. rmgr: Gin len (rec/tot): 48/80, tx: 1813, lsn: 0/020020D8, prev 0/0270, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: F rmgr: Gin len (rec/tot): 56/88, tx: 1813, lsn: 0/02002440, prev 0/020023F8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 64/96, tx: 1813, lsn: 0/020044D8, prev 0/02004490, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T ... rmgr: Gin len (rec/tot): 1376/ 1408, tx: 1813, lsn: 0/02A7EE90, prev 0/02A7E910, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 2 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 1392/ 1424, tx: 1813, lsn: 0/02A7F458, prev 0/02A7F410, bkp: , desc: Create posting tree, node: 1663/12945/16441 blkno: 4 This corresponds to the stage where the items are stored in-line in the entry-tree. After it reaches a certain size, a posting tree is created. Then the size decreases to about 100B and is gradually increasing again up to 320B. rmgr: Gin len (rec/tot):116/ 148, tx: 1813, lsn: 0/02A7F9E8, prev 0/02A7F458, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1372 (compressed) rmgr: Gin len (rec/tot): 40/72, tx: 1813, lsn: 0/02A7FA80, prev 0/02A7F9E8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 3 isdata: F isleaf: T isdelete: T ... 
rmgr: Gin len (rec/tot):118/ 150, tx: 1813, lsn: 0/02A83BA0, prev 0/02A83B58, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1374 (compressed) ... rmgr: Gin len (rec/tot):288/ 320, tx: 1813, lsn: 0/02AEDE28, prev 0/02AEDCE8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 14 isdata: T isleaf: T unmodified: 1280 length: 1544 (compressed) Then the size decreases to 66B and is gradually increasing again up to 320B. This increase and decrease of WAL size seems to continue. Here the new items are appended to posting tree pages. This is where the maximum of 256 bytes I mentioned applies. 256 bytes is the max size of one compressed posting list, the WAL record containing it includes some other stuff too, which adds up to that 320 bytes. [In 9.3] At first, the size of GIN-related WAL is gradually increasing up to about 2700B. rmgr: Gin len (rec/tot): 52/84, tx: 1812, lsn: 0/02000430, prev 0/020003D8, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 11 nitem: 1 isdata: F isleaf T isdelete F updateBlkno:4294967295 rmgr: Gin len (rec/tot): 60/92, tx: 1812, lsn: 0/020004D0, prev 0/02000488, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 1 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 ... rmgr: Gin len (rec/tot): 2740/ 2772, tx: 1812, lsn: 0/026D1670, prev 0/026D0B98, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 5 offset: 2 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 rmgr: Gin len (rec/tot): 2714/ 2746, tx: 1812, lsn: 0/026D21A8, prev 0/026D2160, bkp: , desc: Create posting tree, node: 1663/12896/16441 blkno: 6 The size decreases to 66B and then is never changed. Same mechanism on 9.3, but the insertions to the posting tree pages are constant size. That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. 
But of course, you explicitly disabled that... Let me know if you want me to write a patch addressing this issue. Yeah, I really want you to address this problem! That's definitely useful for every user who disables the FASTUPDATE option for some reason. Ok, let's think about it a little bit. I think there are three fairly simple ways to address this: 1. The GIN data leaf recompress record contains an offset called unmodifiedlength, and the data that comes after that offset. Currently, the record is written so that unmodifiedlength points to the end of the last compressed posting list stored on the page that was not modified, followed by all the modified ones. The straightforward way to cut down the WAL record
Re: [HACKERS] gaussian distribution pgbench
On Sat, Mar 15, 2014 at 4:50 AM, Mitsumasa KONDO kondo.mitsum...@gmail.com wrote: There are explanations and computations as comments in the code. If it is about the documentation, I'm not sure that a very precise mathematical definition will help a lot of people, and might rather hinder understanding, so the doc focuses on an intuitive explanation instead. Yeah, I think that we had better explain only the necessary information for using this feature. If we add mathematical theory to the docs, it will be too difficult for users. And it's a waste. Well, if you *don't* include at least *some* mathematical description of what the feature does in the documentation, then users who need to understand it will have to read the source code to figure it out, which is going to be even more difficult. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Heikki Linnakangas hlinnakan...@vmware.com writes: 2. Instead of storing the new compressed posting list in the WAL record, store only the new item pointers added to the page. WAL replay would then have to duplicate the work done in the main insertion code path: find the right posting lists to insert to, decode them, add the new items, and re-encode. That sounds fairly dangerous ... is any user-defined code involved in those decisions? This record format would be higher-level, in the sense that we would not store the physical copy of the compressed posting list as it was formed originally. The same work would be done at WAL replay. As the code stands, it will produce exactly the same result, but that's not guaranteed if we make bugfixes to the code later, and a master and standby are running different minor versions. There's not necessarily anything wrong with that, but it's something to keep in mind. Version skew would be a hazard too, all right. I think it's important that WAL replay be a pretty mechanical, predictable process. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Mon, Mar 10, 2014 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Mar 7, 2014 at 2:40 PM, Robert Haas robertmh...@gmail.com wrote: Hmm. So the problematic sequence of events is where a postmaster child forks, and then exits without exec-ing, perhaps because e.g. exec fails? I've attempted a fix for this case. The attached patch makes test_shm_mq fork() a child process that calls on_exit_reset() and then exits. Without the fix I just pushed, this causes the tests to fail; with this fix, this does not cause the tests to fail. I'm not entirely sure that this is exactly right, but I think it's an improvement. This looks generally sane to me, but I wonder whether you should also teach PGSharedMemoryDetach() to physically detach from DSM segments not just the main shmem seg. I looked at this a little more. It seems dsm_backend_shutdown() already does almost the right thing; it fires all the on-detach callbacks and unmaps the corresponding segments. It does NOT unmap the control segment, though, and processes that are unmapping the main shared memory segment for safety should really be getting rid of the control segment too (even though the consequences of corrupting the control segment are much less bad). One option is to just change that function to also unmap the control segment, and maybe rename it to dsm_detach_all(), and then use that everywhere. The problem is that I'm not sure we really want to incur the overhead of an extra munmap() during every backend exit, just to get rid of a control segment which was going to be unmapped anyway by process termination. For that matter, I'm not sure why we bother arranging that for the main shared memory segment, either: surely whatever function the shmdt() and munmap() calls in IpcMemoryDetach may have will be equally well-served by the forthcoming exit()? 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/17/2014 04:33 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: 2. Instead of storing the new compressed posting list in the WAL record, store only the new item pointers added to the page. WAL replay would then have to duplicate the work done in the main insertion code path: find the right posting lists to insert to, decode them, add the new items, and re-encode. That sounds fairly dangerous ... is any user-defined code involved in those decisions? No. This record format would be higher-level, in the sense that we would not store the physical copy of the compressed posting list as it was formed originally. The same work would be done at WAL replay. As the code stands, it will produce exactly the same result, but that's not guaranteed if we make bugfixes to the code later, and a master and standby are running different minor versions. There's not necessarily anything wrong with that, but it's something to keep in mind. Version skew would be a hazard too, all right. I think it's important that WAL replay be a pretty mechanical, predictable process. Yeah. One particular point to note is that if in one place we do the more high-level thing and have WAL replay re-encode the page as it sees fit, then we can *not* rely on the page being byte-by-byte identical in other places. Like, in vacuum, where items are deleted. Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct the page, instead of making a physical copy of the modified parts. And _bt_restore_page even inserts the items physically in a different order than the normal codepath does. So for good or bad, there is some precedent for this. The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. 
As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Mon, Mar 17, 2014 at 10:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct the page, instead of making a physical copy of the modified parts. And _bt_restore_page even inserts the items physically in a different order than the normal codepath does. So for good or bad, there is some precedent for this. Yikes. The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic. That scares the crap out of me. I don't see any intrinsic problem with relying on the existing page contents to figure out how to roll forward, as PageAddItem does; after all, we do FPIs precisely so that the page is in a known good state when we start. However, I really think we ought to try hard to make this deterministic in terms of what the resulting state of the page is; anything else seems like it's playing with fire, and I bet we'll get burned sooner rather than later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] First-draft release notes for next week's releases
Greg Stark st...@mit.edu writes: This is not really accurate: This error allowed multiple versions of the same row to become visible to queries, resulting in apparent duplicates. Since the error is in WAL replay, it would only manifest during crash recovery or on standby servers. I think the idea is coming from what the second sentence below is getting at but it may be too complex to explain in a release note: The error causes some rows to disappear from indexes resulting in inconsistent query results on a hot standby depending on whether indexes are used. If the standby is subsequently activated or if it occurs during recovery after a crash or backup restore it could result in unique constraint violations as well. Hm ... rows disappearing from indexes might make people think that they could fix or mitigate the damage via REINDEX. That's not really true though is it? It looks to me like IndexBuildHeapScan will suffer an Assert failure in an assert-enabled build, or build a bogus index if not assert-enabled, when it comes across a heap-only tuple that has no parent. I'm thinking we'd better promote that Assert to a normal runtime elog. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Robert Haas robertmh...@gmail.com writes: One option is to just change that function to also unmap the control segment, and maybe rename it to dsm_detach_all(), and then use that everywhere. The problem is that I'm not sure we really want to incur the overhead of an extra munmap() during every backend exit, just to get rid of a control segment which was going to be unmapped anyway by process termination. For that matter, I'm not sure why we bother arranging that for the main shared memory segment, either: surely whatever function the shmdt() and munmap() calls in IpcMemoryDetach may have will be equally well-served by the forthcoming exit()? Before you lobotomize that code too much, consider the postmaster crash-recovery case. That path does need to detach from the old shmem segment. Also, I might be wrong, but I think IpcMemoryDetach is a *postmaster* on_shmem_exit routine; it isn't called during backend exit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Robert Haas robertmh...@gmail.com writes: On Mon, Mar 17, 2014 at 10:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct the page, instead of making a physical copy of the modified parts. And _bt_restore_page even inserts the items physically in a different order than the normal codepath does. So for good or bad, there is some precedent for this. Yikes. Yeah. I think it's arguably a bug that _bt_restore_page works like that, even though we've not been burnt up to now. The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic. That scares the crap out of me. Likewise. Saving some WAL space is *not* worth this kind of risk. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Planner hints in Postgresql
I am implementing Planner hints in PostgreSQL to force the optimizer to select a particular plan for a query on request from SQL input. I am having trouble modifying the planner code. I want to create a path node for the hinted plan and make it the plan used by the executor. How do I enforce this? Should I create a new Plan for this? How do I create a plan node which can then be given directly to the executor for a particular query?
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C csrajmo...@gmail.com wrote: I am implementing Planner hints in PostgreSQL to force the optimizer to select a particular plan for a query on request from SQL input. I am having trouble modifying the planner code. I want to create a path node for the hinted plan and make it the plan used by the executor. How do I enforce this? Should I create a new Plan for this? How do I create a plan node which can then be given directly to the executor for a particular query? Planner hints have been discussed a lot before as well and AFAIK there is a wiki page that says why we shouldn't implement them. Have you referred to them? Please share if you have any new points on the same. Regards, Atri
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/17/2014 05:35 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Mar 17, 2014 at 10:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic. That scares the crap out of me. Likewise. Saving some WAL space is *not* worth this kind of risk. One fairly good compromise would be to only include the new items, not the whole modified compression lists, and let the replay logic do the re-encoding of the posting lists. But also include the cutoff points of each posting list in the WAL record. That way the replay code would have no freedom in how it decides to split the items into compressed lists, that would be fully specified by the WAL record. Here's a refresher for those who didn't follow the development of the new page format: The data page basically contains a list of ItemPointers. The items are compressed, to save disk space. However, to make random access faster, all the items on the page are not compressed as one big list. Instead, the big array of items is split into roughly equal chunks, and each chunk is compressed separately. The chunks are stored on the page one after each other. (The chunks are called posting lists in the code, the struct is called GinPostingListData) The compression is completely deterministic (each item is stored as a varbyte-encoded delta from the previous item), but there are no hard rules on how the items on the page ought to be divided into the posting lists. 
Currently, the code tries to maintain a max size of 256 bytes per list - but it will cope with any size it finds on disk. This is where the danger lies, where we could end up with a different physical page after WAL replay, if we just include the new items in the WAL record. The WAL replay might decide to split the items into posting lists differently than was originally done. (As the code stands, it would always make the same decision, completely deterministically, but that might change in a minor version if we're not careful.) We can tie WAL replay's hands about that, if we include a list of the items that form the posting lists in the WAL record. That adds some bloat, compared to only including the new items, but not too much. (And we still only need to do that for posting lists following the first modified one.) Alexander, would you like to give that a shot, or will I? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Portability issues in shm_mq
On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: But I think there's another possible problem here. In order for reads from the buffer not to suffer alignment problems, the chunk size for reads and writes from the buffer needs to be MAXIMUM_ALIGNOF (or some multiple of it). And in order to avoid a great deal of additional and unwarranted complexity, the size of the message word also needs to be MAXIMUM_ALIGNOF (or some multiple of it). So the message word can only be of size 4 if MAXIMUM_ALIGNOF is also 4. IOW, I think your approach is going to run into trouble on any system where sizeof(Size)==4 but MAXIMUM_ALIGNOF==8. Well, it will result in padding space when you maxalign the length word, but I don't see why it wouldn't work; and it would certainly be no less efficient than what's there today. Well, the problem is with this: /* Write the message length into the buffer. */ if (!mqh->mqh_did_length_word) { res = shm_mq_send_bytes(mqh, sizeof(uint64), &nbytes, nowait, &bytes_written); If I change nbytes to be of type Size, and the second argument to sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF != 0. I could do something like: union { char pad[MAXIMUM_ALIGNOF]; Size val; } padded_size; padded_size.val = nbytes; res = shm_mq_send_bytes(mqh, sizeof(padded_size), &padded_size, nowait, &bytes_written); ...but that probably *is* less efficient, and it's certainly a lot uglier; a similar hack will be needed when extracting the length word, and assertions elsewhere will need adjustment. I wonder if it wouldn't be better to adjust the external API to use Size just for consistency but, internally to the module, keep using 8-byte sizes within the buffer. Really, I think that would boil down to going through and making sure that we use TYPEALIGN(...,sizeof(uint64)) everywhere instead of MAXALIGN(), which doesn't seem like a big deal. 
On the third hand, maybe there are or will be platforms out there where MAXIMUM_ALIGNOF > 8. If so, it's probably best to bite the bullet and allow for padding now, so we don't have to monkey with this again later. Sorry for belaboring this point, but I want to make sure I only need to fix this once. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Portability issues in shm_mq
Robert Haas robertmh...@gmail.com writes: On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, it will result in padding space when you maxalign the length word, but I don't see why it wouldn't work; and it would certainly be no less efficient than what's there today. Well, the problem is with this: /* Write the message length into the buffer. */ if (!mqh->mqh_did_length_word) { res = shm_mq_send_bytes(mqh, sizeof(uint64), &nbytes, nowait, &bytes_written); If I change nbytes to be of type Size, and the second argument to sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF != 0. Well, you need to maxalign the number of bytes physically inserted into the queue. Doesn't shm_mq_send_bytes do that? Where do you do the maxaligning of the message payload data, when the payload is odd-length? I would have expected some logic like "copy N bytes but then advance the pointer by maxalign(N)". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma wrote: On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C <csrajmohan@...> wrote: I am implementing Planner hints in PostgreSQL to force the optimizer to select a particular plan for a query on request from SQL input. I am having trouble modifying the planner code. I want to create a path node for the hinted plan and make it the plan used by the executor. How do I enforce this? Should I create a new Plan for this? How do I create a plan node which can then be given directly to the executor for a particular query? Planner hints have been discussed a lot before as well and AFAIK there is a wiki page that says why we shouldn't implement them. Have you referred to them? Please share if you have any new points on the same. Regards, Atri http://wiki.postgresql.org/wiki/Todo (I got to it via the FAQ link on the homepage and the Developer FAQ section therein. You should make sure you've scanned that as well.) Note the final section titled: Features We Do Not Want. Also, you need to consider what you are doing when you cross-post (a bad thing generally) to -hackers and -novice. As there is, rightly IMO, no -novice-hackers list, you should probably have just hit up -general. We need to discuss the general why before any meaningful help on the how is going to be considered by hackers. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner hints in Postgresql
David Johnston pol...@yahoo.com writes: Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] warning when compiling utils/tqual.h
I noticed (by running "cd src/include; make check" with the attached patch applied) that since commit b89e151054 (Introduce logical decoding.) tqual.h now emits this warning when compiled standalone: /pgsql/source/HEAD/src/include/utils/tqual.h:101:13: warning: ‘struct HTAB’ declared inside parameter list [enabled by default] /pgsql/source/HEAD/src/include/utils/tqual.h:101:13: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] The prototype in question is: /* * To avoid leaking too much knowledge about reorderbuffer implementation * details this is implemented in reorderbuffer.c not tqual.c. */ extern bool ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data, Snapshot snapshot, HeapTuple htup, Buffer buffer, CommandId *cmin, CommandId *cmax); Now there are two easy ways to silence this: 1. add "struct HTAB;" to the previous line in tqual.h, i.e. a forward declaration. We have very few of these; most of the time ISTM we prefer a solution like the next one: 2. add #include "utils/hsearch.h" at the top of tqual.h. Now my inclination is to use the second one, because it seems cleaner to me and avoids having a forward struct decl in an unrelated header, which will later bleed into other source files. Do we have a project guideline saying that we prefer option two, and does that explain why we have so few declarations as in option one? There is of course a third choice, which is to dictate that this function ought to be declared in reorderbuffer.h; but that would have the unpleasant side-effect that tqual.c would need to #include that. Opinions please? I would like to have a guideline about this so I can reason in a principled way in future cases in which this kind of question arises. 
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] warning when compiling utils/tqual.h
Hi, On 2014-03-17 13:40:53 -0300, Alvaro Herrera wrote: I noticed (by running cd src/include ; make check with the attached patch applied) that since commit b89e151054 (Introduce logical decoding.) tqual.h now emits this warning when compiled standalone: I think we should add such a check (refined to not rely on precompiled headers) to check-world or something... 1. add struct HTAB; to the previous line in tqual.h, i.e. a forward declaration. We have very few of these; most of the time ISTM we prefer a solution like the next one: That's what I am for. We don't have *that* few of those. There are also several cases where struct pointers are members in other structs. There you don't even need a forward declaration (like e.g. struct HTAB* in PlannerInfo). There is of course a third choice which is to dictate that this function ought to be declared in reorderbuffer.h; but that would have the unpleasant side-effect that tqual.c would need to #include that. I am pretty clearly against this. Opinions please? I would like to have a guideline about this so I can reason in a principled way in future cases in which this kind of question arises. I'd welcome a clearly stated policy as well. I think forward declarations are a sensible tool if used sparingly, to decouple things that are primarily related on the implementation level. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] warning when compiling utils/tqual.h
Alvaro Herrera alvhe...@2ndquadrant.com writes: I noticed (by running cd src/include ; make check with the attached patch applied) that since commit b89e151054 (Introduce logical decoding.) tqual.h now emits this warning when compiled standalone: /pgsql/source/HEAD/src/include/utils/tqual.h:101:13: warning: ‘struct HTAB’ declared inside parameter list [enabled by default] /pgsql/source/HEAD/src/include/utils/tqual.h:101:13: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] The prototype in question is: /* * To avoid leaking to much knowledge about reorderbuffer implementation * details this is implemented in reorderbuffer.c not tqual.c. */ extern bool ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data, Snapshot snapshot, HeapTuple htup, Buffer buffer, CommandId *cmin, CommandId *cmax); I guess the real question is why such a prototype is in tqual.h in the first place. ISTM this should be pushed somewhere specific to reorderbuffer.c. I'm -1 on having struct HTAB bleed into tqual.h via either of the methods you suggest. regards, tom lane
Re: [HACKERS] warning when compiling utils/tqual.h
On 2014-03-17 12:50:37 -0400, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I noticed (by running cd src/include ; make check with the attached patch applied) that since commit b89e151054 (Introduce logical decoding.) tqual.h now emits this warning when compiled standalone: /pgsql/source/HEAD/src/include/utils/tqual.h:101:13: warning: ‘struct HTAB’ declared inside parameter list [enabled by default] /pgsql/source/HEAD/src/include/utils/tqual.h:101:13: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] The prototype in question is: /* * To avoid leaking to much knowledge about reorderbuffer implementation * details this is implemented in reorderbuffer.c not tqual.c. */ extern bool ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data, Snapshot snapshot, HeapTuple htup, Buffer buffer, CommandId *cmin, CommandId *cmax); I guess the real question is why such a prototype is in tqual.h in the first place. ISTM this should be pushed somewhere specific to reorderbuffer.c. I'm -1 on having struct HTAB bleed into tqual.h via either of the methods you suggest. It's the only logical decoding specific routine that's called by tqual.c which doesn't need anything else from reorderbuffer. I'd like to avoid more stuff bleeding in there... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] warning when compiling utils/tqual.h
Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 13:40:53 -0300, Alvaro Herrera wrote: There is of course a third choice which is to dictate that this function ought to be declared in reorderbuffer.h; but that would have the unpleasant side-effect that tqual.c would need to #include that. I am pretty clearly against this. Let me get this straight. reorderbuffer.c exports a function that needs to be used by tqual.c. The obvious method to do this is to declare the function in reorderbuffer.h and have tqual.c #include that. Apparently you think it's better to have tqual.h declare the function. How is that not 100% backwards? Even worse, it requires more header-inclusion bloat for some functionality entirely unrelated to snapshots. That sounds borderline insane from here. You need a whole lot more justification than "I'm against it". regards, tom lane
Re: [HACKERS] warning when compiling utils/tqual.h
Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 12:50:37 -0400, Tom Lane wrote: I guess the real question is why such a prototype is in tqual.h in the first place. ISTM this should be pushed somewhere specific to reorderbuffer.c. I'm -1 on having struct HTAB bleed into tqual.h via either of the methods you suggest. It's the only logical decoding specific routine that's called by tqual.c which doesn't need anything else from reorderbuffer. I'd like to avoid more stuff bleeding in there... Instead, you'd like HTAB to bleed into everything that uses tqual.h? regards, tom lane
Re: [HACKERS] warning when compiling utils/tqual.h
On 2014-03-17 12:57:15 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 12:50:37 -0400, Tom Lane wrote: I guess the real question is why such a prototype is in tqual.h in the first place. ISTM this should be pushed somewhere specific to reorderbuffer.c. I'm -1 on having struct HTAB bleed into tqual.h via either of the methods you suggest. It's the only logical decoding specific routine that's called by tqual.c which doesn't need anything else from reorderbuffer. I'd like to avoid more stuff bleeding in there... Instead, you'd like HTAB to bleed into everything that uses tqual.h? Meh. A struct forward declaration isn't exactly a significant thing to expose. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] warning when compiling utils/tqual.h
On 2014-03-17 12:56:12 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 13:40:53 -0300, Alvaro Herrera wrote: There is of course a third choice which is to dictate that this function ought to be declared in reorderbuffer.h; but that would have the unpleasant side-effect that tqual.c would need to #include that. I am pretty clearly against this. Let me get this straight. reorderbuffer.c exports a function that needs to be used by tqual.c. The obvious method to do this is to declare the function in reorderbuffer.h and have tqual.c #include that. Apparently you think it's better to have tqual.h declare the function. How is that not 100% backwards? Even worse that it requires more header-inclusion bloat for some functionality entirely unrelated to snapshots? I don't think cmin/cmax are entirely unrelated to tuple visibility. If it didn't require exposing reorderbuffer.c internals, its implementation should have been in tqual.c. And a struct HTAB; wouldn't require including a header. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] First-draft release notes for next week's releases
On 03/17/2014 08:28 AM, Tom Lane wrote: Greg Stark st...@mit.edu writes: The error causes some rows to disappear from indexes resulting in inconsistent query results on a hot standby depending on whether indexes are used. If the standby is subsequently activated or if it occurs during recovery after a crash or backup restore it could result in unique constraint violations as well. Hm ... rows disappearing from indexes might make people think that they could fix or mitigate the damage via REINDEX. That's not really true though is it? It looks to me like IndexBuildHeapScan will suffer an Assert failure in an assert-enabled build, or build a bogus index if not assert-enabled, when it comes across a heap-only tuple that has no parent. First, see suggested text in my first-draft release announcement. Second, if a user has encountered this kind of data corruption on their master (due to crash recovery), how exactly *do* they fix it? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 10:03:52 -0700, Josh Berkus wrote: On 03/17/2014 08:28 AM, Tom Lane wrote: Greg Stark st...@mit.edu writes: The error causes some rows to disappear from indexes resulting in inconsistent query results on a hot standby depending on whether indexes are used. If the standby is subsequently activated or if it occurs during recovery after a crash or backup restore it could result in unique constraint violations as well. Hm ... rows disappearing from indexes might make people think that they could fix or mitigate the damage via REINDEX. That's not really true though is it? It looks to me like IndexBuildHeapScan will suffer an Assert failure in an assert-enabled build, or build a bogus index if not assert-enabled, when it comes across a heap-only tuple that has no parent. First, see suggested text in my first-draft release announcement. I don't think that text is any better; it's imo even wrong: The bug causes rows to vanish from indexes during recovery due to simultaneous updates of rows on both sides of a foreign key. Neither a foreign key, nor simultaneous updates, nor both sides is a prerequisite. Second, if a user has encountered this kind of data corruption on their master (due to crash recovery), how exactly *do* they fix it? Dump/restore is the most obvious candidate. The next best thing I can think of is a no-op rewriting ALTER TABLE; that doesn't deal with ctid chains IIRC, in contrast to CLUSTER/VACUUM FULL. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Johnston pol...@yahoo.com writes: Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Isn't using a user-given value for selectivity a pretty risky situation, as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner assign a higher preference to it during plan evaluation? This would allow us to still have a fair evaluation of all possible plans as we do right now and yet have a higher preference for the user-given plan during evaluation. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Mon, Mar 17, 2014 at 11:32 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: One option is to just change that function to also unmap the control segment, and maybe rename it to dsm_detach_all(), and then use that everywhere. The problem is that I'm not sure we really want to incur the overhead of an extra munmap() during every backend exit, just to get rid of a control segment which was going to be unmapped anyway by process termination. For that matter, I'm not sure why we bother arranging that for the main shared memory segment, either: surely whatever function the shmdt() and munmap() calls in IpcMemoryDetach may have will be equally well-served by the forthcoming exit()? Before you lobotomize that code too much, consider the postmaster crash-recovery case. That path does need to detach from the old shmem segment. Also, I might be wrong, but I think IpcMemoryDetach is a *postmaster* on_shmem_exit routine; it isn't called during backend exit. Ah, right. I verified using strace that, at least on Linux 2.6.32-279.el6.x86_64, the only system call made on disconnecting a psql session is exit_group(0). I also tried setting breakpoints on PGSharedMemoryDetach and IpcMemoryDetach and they do not fire. After mulling over a few possible approaches, I came up with the attached, which seems short and to the point. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index cf2ce46..8153160 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -40,6 +40,7 @@ #include postmaster/fork_process.h #include postmaster/pgarch.h #include postmaster/postmaster.h +#include storage/dsm.h #include storage/fd.h #include storage/ipc.h #include storage/latch.h @@ -163,6 +164,7 @@ pgarch_start(void) on_exit_reset(); /* Drop our connection to postmaster's shared memory, as well */ + dsm_detach_all(); PGSharedMemoryDetach(); PgArchiverMain(0, NULL); diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 1ca5d13..3dc280a 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -50,6 +50,7 @@ #include postmaster/postmaster.h #include storage/proc.h #include storage/backendid.h +#include storage/dsm.h #include storage/fd.h #include storage/ipc.h #include storage/latch.h @@ -709,6 +710,7 @@ pgstat_start(void) on_exit_reset(); /* Drop our connection to postmaster's shared memory, as well */ + dsm_detach_all(); PGSharedMemoryDetach(); PgstatCollectorMain(0, NULL); diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c index e277a9a..4731ab7 100644 --- a/src/backend/postmaster/syslogger.c +++ b/src/backend/postmaster/syslogger.c @@ -39,6 +39,7 @@ #include postmaster/fork_process.h #include postmaster/postmaster.h #include postmaster/syslogger.h +#include storage/dsm.h #include storage/ipc.h #include storage/latch.h #include storage/pg_shmem.h @@ -626,6 +627,7 @@ SysLogger_Start(void) on_exit_reset(); /* Drop our connection to postmaster's shared memory, as well */ + dsm_detach_all(); PGSharedMemoryDetach(); /* do the work */ diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c index 232c099..c967177 100644 --- a/src/backend/storage/ipc/dsm.c +++ 
b/src/backend/storage/ipc/dsm.c @@ -722,6 +722,8 @@ dsm_attach(dsm_handle h) /* * At backend shutdown time, detach any segments that are still attached. + * (This is similar to dsm_detach_all, except that there's no reason to + * unmap the control segment before exiting, so we don't bother.) */ void dsm_backend_shutdown(void) @@ -736,6 +738,31 @@ dsm_backend_shutdown(void) } /* + * Detach all shared memory segments, including the control segments. This + * should be called, along with PGSharedMemoryDetach, in processes that + * might inherit mappings but are not intended to be connected to dynamic + * shared memory. + */ +void +dsm_detach_all(void) +{ + void *control_address = dsm_control; + + while (!dlist_is_empty(dsm_segment_list)) + { + dsm_segment *seg; + + seg = dlist_head_element(dsm_segment, node, dsm_segment_list); + dsm_detach(seg); + } + + if (control_address != NULL) + dsm_impl_op(DSM_OP_DETACH, dsm_control_handle, 0, + dsm_control_impl_private, control_address, + dsm_control_mapped_size, ERROR); +} + +/* * Resize an existing shared memory segment. * * This may cause the shared memory segment to be remapped at a different diff --git a/src/include/storage/dsm.h b/src/include/storage/dsm.h index 03dd767..e4669a1 100644 --- a/src/include/storage/dsm.h +++ b/src/include/storage/dsm.h @@ -20,6 +20,7 @@ typedef struct dsm_segment dsm_segment; /* Startup and shutdown functions. */ extern void dsm_postmaster_startup(void); extern void
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma atri.j...@gmail.com writes: On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? And forcing a plan to be used *isn't* that? Please re-read the older threads, since you evidently have not. regards, tom lane
Re: [HACKERS] Portability issues in shm_mq
On Mon, Mar 17, 2014 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, it will result in padding space when you maxalign the length word, but I don't see why it wouldn't work; and it would certainly be no less efficient than what's there today. Well, the problem is with this: /* Write the message length into the buffer. */ if (!mqh->mqh_did_length_word) { res = shm_mq_send_bytes(mqh, sizeof(uint64), nbytes, nowait, bytes_written); If I change nbytes to be of type Size, and the second argument to sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF != 0. Well, you need to maxalign the number of bytes physically inserted into the queue. Doesn't shm_mq_send_bytes do that? Where do you do the maxaligning of the message payload data, when the payload is odd-length? I would have expected some logic like copy N bytes but then advance the pointer by maxalign(N). Oh, yeah. Duh. Clearly my brain isn't working today. Hmm, so maybe this will be fairly simple... will try it out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Planner hints in Postgresql
* Atri Sharma (atri.j...@gmail.com) wrote: Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner Uh, you're worried about the user giving us a garbage selectivity, but they're going to get a full-blown plan perfect? assign a higher preference to it during plan evaluation? This shall allow us to still have a fair evaluation of all possible plans as we do right now and yet have a higher preference for the user given plan during evaluation? What exactly would such a preference look like? A cost modifier? We'd almost certainly have to make that into a GUC or a value passed in as part of the query, with a high likelihood of users figuring out how to use it to say "use my plan forever and always".. Thanks, Stephen
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma wrote: On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane wrote: David Johnston writes: Need to discuss the general why before any meaningful help on the how is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be use this plan at all, but here's what to assume about the selectivity of this WHERE clause. That seems considerably less likely to break than any attempt to directly specify plan details. Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner assign a higher preference to it during plan evaluation? This shall allow us to still have a fair evaluation of all possible plans as we do right now and yet have a higher preference for the user given plan during evaluation? The larger question to answer first is whether we want to implement something that is deterministic... How about just dropping the whole concept of hinting and provide a way for someone to say use this plan, or die trying. Maybe require it be used in conjunction with named PREPAREd statements: PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS SELECT ...; Aside from whole-plan specification I can definitely see where join/where specification could be useful if it can overcome the current limitation of not being able to calculate inter-table estimations. David J.
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-15 16:02:19 -0400, Tom Lane wrote: First-draft release notes are committed, and should be visible at http://www.postgresql.org/docs/devel/static/release-9-3-4.html once guaibasaurus does its next buildfarm run a few minutes from now. Any suggestions? So, the current text is: This error allowed multiple versions of the same row to become visible to queries, resulting in apparent duplicates. Since the error is in WAL replay, it would only manifest during crash recovery or on standby servers. What about: The most prominent consequence of this bug is that rows can appear to not exist when accessed via an index, while still being visible in sequential scans. This in turn can lead to constraints, including unique and foreign key ones, being violated later on. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost sfr...@snowman.net wrote: * Atri Sharma (atri.j...@gmail.com) wrote: Isnt using a user given value for selectivity a pretty risky situation as it can horribly screw up the plan selection? Why not allow the user to specify an alternate plan and have the planner Uh, you're worried about the user given us a garbage selectivity, but they're going to get a full-blown plan perfect? I never said that the user plan would be perfect. The entire point of planner hints is based on the assumption that the user knows more about the data than the planner does, hence the user's ideas about the plan should be given a preference. Garbage selectivity can screw up the cost estimation of *all* our possible plans, and we could end up preferring a sequential scan over an index-only scan, for example. I am trying to think of ways that give some preference to a user plan but do not interfere with the cost estimation of our other potential plans. What exactly would such a preference look like? A cost modifier? We'd almost certainly have to make that into a GUC or a value passed in as part of the query, with a high likelihood of users figuring out how to use it to say use my plan forever and always.. A factor that we experimentally determine, by which we decrease the cost of the user-specified plan so that it gets a higher preference in the plan evaluation. Of course, this is not a nice hack. Specifically after our discussion on IRC the other day, I am against planner hints, but if we are just discussing how it could be done, I could think of some ways, which I listed. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Robert Haas robertmh...@gmail.com writes: After mulling over a few possible approaches, I came up with the attached, which seems short and to the point. Looks reasonable in principle. I didn't run through all the existing PGSharedMemoryDetach calls to see if there are any other places to call dsm_detach_all, but it's an easy fix if there are any. regards, tom lane
Re: [HACKERS] Planner hints in Postgresql
* Atri Sharma (atri.j...@gmail.com) wrote: Of course, this is not a nice hack. Specifically after our discussion on IRC the other day, I am against planner hints, but if we are just discussing how it could be done, I could think of some ways which I listed. There's lots of ways to implement planner hints, but I fail to see the point in discussing how to implement something we actively don't want. Thanks, Stephen
Re: [HACKERS] First-draft release notes for next week's releases
Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 10:03:52 -0700, Josh Berkus wrote: First, see suggested text in my first-draft release announcement. I don't think that text is any better, it's imo even wrong: The bug causes rows to vanish from indexes during recovery due to simultaneous updates of rows on both sides of a foreign key. Neither is a foreign key, nor simultaneous updates, nor both sides a prerequisite. What I've got at the moment is This error caused updated rows to disappear from indexes, resulting in inconsistent query results depending on whether an index scan was used. Subsequent processing could result in unique-key violations, since the previously updated row would not be found by later index searches. Since this error is in WAL replay, it would only manifest during crash recovery or on standby servers. The improperly-replayed case can arise when a table row that is referenced by a foreign-key constraint is updated concurrently with creation of a referencing row. OK, or not? The time window for bikeshedding this is dwindling rapidly. regards, tom lane
Re: [HACKERS] Planner hints in Postgresql
The larger question to answer first is whether we want to implement something that is deterministic... How about just dropping the whole concept of hinting and provide a way for someone to say use this plan, or die trying. Maybe require it be used in conjunction with named PREPAREd statements: You mean taking away the entire concept of query planning and cost estimation? That's like replacing the optimizer with DBA decisions, and I am not at all comfortable with that idea. Those are only my thoughts, though. PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS SELECT ...; Aside from whole-plan specification I can definitely see where join/where specification could be useful if it can overcome the current limitation of not being able to calculate inter-table estimations. Prepared plans use a generic plan for the execution. Replacing it with a totally user-defined plan does not seem to be clean. The crux is that IMHO planner hints are a bad way of trying to circumvent the need for cross-column statistics. We should get cross-column statistics done and ignore planner hints completely. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
There's lots of ways to implement planner hints, but I fail to see the point in discussing how to implement something we actively don't want. +1. The original poster wanted a way to implement it as a personal project or something (I think he only replied to me, not the entire list). Planner hints should be ignored :) Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] jsonb status
On 03/16/2014 04:10 AM, Peter Geoghegan wrote: On Thu, Mar 13, 2014 at 2:00 PM, Andrew Dunstan and...@dunslane.net wrote: I'll be travelling a good bit of tomorrow (Friday), but I hope Peter has finished by the time I am back on deck late tomorrow and that I am able to commit this on Saturday. I asked Andrew to hold off on committing this today. It was agreed that we weren't quite ready, because there were one or two remaining bugs (since fixed), but also because I felt that it would be useful to first hear the opinions of more people before proceeding. I think that we're not that far from having something committed. Obviously I hope to get this into 9.4, and attach a lot of strategic importance to having the feature, which is why I made a large effort to help land it. Attached patch has a number of notable revisions. Throughout, it has been possible for anyone to follow our progress here: https://github.com/feodor/postgres/commits/jsonb_and_hstore * In general, the file jsonb_support.c (renamed to jsonb_utils.c) is vastly better commented, and has a much clearer structure. This was not something I did much with in the previous revision, and so it has been a definite focus of this one. * Hashing is refactored to not use CRC32 anymore. I felt this was a questionable method of hashing, both within jsonb_hash(), as well as the jsonb_hash_ops GIN operator class. * Dead code elimination. * I got around to fixing the memory leaks in B-Tree support function one. * Andrew added hstore_to_jsonb, hstore_to_jsonb_loose functions and a cast. One goal of this effort is to preserve a parallel set of facilities for the json and jsonb types, and that includes hstore-related features. * A fix from Alexander for the jsonb_hash_ops @> operator issue I complained about during the last submission was merged. * There is no longer any GiST opclass. That just leaves B-Tree, hash, GIN (default) and GIN jsonb_hash_ops opclasses. 
My outstanding concerns are: * Have we got things right with GIN indexing, containment semantics, etc? See my remarks in the patch, by grepping "contain" within jsonb_util.c. Is the GIN text storage serialization format appropriate and correct? * General design concerns. By far the largest source of these is the file jsonb_util.c. * Is the on-disk format that we propose to tie Postgres to as good as it could be? I've been working through all the changes and fixes that Peter and others have made, and they look pretty good to me. There are a few mostly cosmetic changes I want to make, but nothing that would be worth holding up committing this for. I'm fairly keen to get this committed, get some buildfarm coverage and get more people playing with it and testing it. Like Peter, I would like to see more comments from people on the GIN support, especially. The one outstanding significant question of substance I have is this: given the commit 5 days ago of provision for triConsistent functions for GIN opclasses, should we be adding these to the two GIN opclasses we are providing, and what should they look like? Again, this isn't an issue that I think needs to hold up committing what we have now. Regarding Peter's last question, if we're not satisfied with the on-disk format proposed it would mean throwing the whole effort out and starting again. The only thing I have thought of as an alternative would be to store the structure and values separately rather than with values inline with the structure. That way you could have a hash of values more or less, which would eliminate redundancy of storage of things like object field names. But such a structure might well involve at least as much computational overhead as the current structure. And nobody's been saying all along "hold on, we can do better than this". So I'm pretty inclined to go with what we have.
cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
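Andrew's alternative layout idea above (structure stored separately from a deduplicated pool of values) can be sketched in a few lines. This is purely a hypothetical illustration of that design point, not the jsonb on-disk format as committed; the `serialize` helper and its layout are invented for the example. It shows how repeated object field names would be stored only once:

```python
# Hypothetical sketch of the "structure separate from values" layout
# Andrew describes: repeated object keys land in a shared value pool
# exactly once, and structure records reference pool indexes.
# Not the actual jsonb format, which keeps values inline.

def serialize(docs):
    """Split JSON-like dicts into a deduplicated value pool plus
    structure records that reference pool indexes."""
    pool, index = [], {}

    def intern(v):
        # Store each distinct value once; return its pool position.
        if v not in index:
            index[v] = len(pool)
            pool.append(v)
        return index[v]

    structures = [{intern(k): intern(v) for k, v in d.items()} for d in docs]
    return pool, structures

pool, structs = serialize([{"name": "a", "size": 1},
                           {"name": "b", "size": 1}])
print(pool)  # ['name', 'a', 'size', 1, 'b'] -- 'name'/'size' stored once
```

As the thread notes, the space saved on redundant field names would be traded for extra indirection on every access, which is why the inline format won out.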
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 10:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Atri Sharma atri.j...@gmail.com writes: On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be "use this plan at all", but "here's what to assume about the selectivity of this WHERE clause". That seems considerably less likely to break than any attempt to directly specify plan details. Isn't using a user-given value for selectivity a pretty risky situation, as it can horribly screw up the plan selection? And forcing a plan to be used *isn't* that? Please re-read the older threads, since you evidently have not. I never said that we force a plan to be used. I just said that we should increase the preference for a user-given plan and not interfere in the cost estimation of the other potential plans and the evaluation of the final selected plan. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 11:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Johnston pol...@yahoo.com writes: Need to discuss the general "why" before any meaningful help on the "how" is going to be considered by hackers. Possibly worth noting is that in past discussions, we've concluded that the most sensible type of hint would not be "use this plan at all", but "here's what to assume about the selectivity of this WHERE clause". That seems considerably less likely to break than any attempt to directly specify plan details. Yeah -- the most common case I see is outlier culling, where several repeated low non-deterministic selectivity quals stack, reducing the row count estimate to 1. For example: SELECT * FROM foo WHERE length(bar) <= 1000 AND length(bar) >= 2; The user may have special knowledge that the above is very (or very un-) selective that is difficult or not cost effective to gather in the general case. IIRC in the archives (heh) there is a special workaround using indexes and some discussion regarding how a hypothetical feature involving user input selectivity estimates might look. I don't think that discussion is complete: the syntax for user input selectivity is an unsolved problem. There's a big difference between saying to the planner, "Use plan X" vs "Here's some information describing the data supporting choosing plan X intelligently." The latter allows for better plans in the face of varied/changing data, integrates with the planner in a natural way, and encourages users to understand how the planner works. merlin
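Merlin's "stacked quals" failure mode can be illustrated numerically. The sketch below is illustrative arithmetic only, not planner code: when per-qual selectivities are multiplied as if independent, a few quals that are actually correlated in the data collapse the row estimate to one row.

```python
# Illustrative sketch (not PostgreSQL planner code): multiplying
# per-qual selectivities as if independent, with the estimate
# clamped so it never drops below one row.

def estimated_rows(total_rows, selectivities):
    """Combine per-qual selectivities under an independence
    assumption and clamp the result to >= 1 row."""
    est = float(total_rows)
    for s in selectivities:
        est *= s
    return max(1, int(est))

# Three stacked quals, each guessed at 0.5% selectivity, collapse a
# million-row table to an estimate of a single row -- even if the quals
# are correlated and most rows actually qualify.
print(estimated_rows(1_000_000, [0.005, 0.005, 0.005]))  # 1
print(estimated_rows(1_000_000, [0.5, 0.5]))             # 250000
```

An estimate of one row is exactly what pushes the planner toward a nested loop that then runs forever when thousands of rows come back, which is the scenario Stephen describes below.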
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 13:42:59 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 10:03:52 -0700, Josh Berkus wrote: First, see suggested text in my first-draft release announcement. I don't think that text is any better; it's imo even wrong: "The bug causes rows to vanish from indexes during recovery due to simultaneous updates of rows on both sides of a foreign key." Neither a foreign key, nor simultaneous updates, nor "both sides" is a prerequisite. What I've got at the moment is: This error caused updated rows to disappear from indexes, resulting in inconsistent query results depending on whether an index scan was used. Subsequent processing could result in unique-key violations, since the previously updated row would not be found by later index searches. Since this error is in WAL replay, it would only manifest during crash recovery or on standby servers. The improperly-replayed case can arise when a table row that is referenced by a foreign-key constraint is updated concurrently with creation of a referencing row. OK, or not? The time window for bikeshedding this is dwindling rapidly. That's much better, yes. Two things: * I'd change the warning about unique key violations into a more general one about constraints. Foreign key and exclusion constraints are also affected... * I wonder if we should make the possible origins a bit more general, as it's perfectly possible to trigger the problem without foreign keys. Maybe: "can arise when a table row that has been updated is row locked; that can e.g. happen when foreign keys are used." Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 11:28:45 -0400, Tom Lane wrote: Hm ... rows disappearing from indexes might make people think that they could fix or mitigate the damage via REINDEX. Good point. I guess in some cases it will end up working because VACUUM/hot pruning have cleaned up the mess, but that's certainly not something I'd want to rely upon. They very well could have messed up things when presented with bogus input data. That's not really true though, is it? It looks to me like IndexBuildHeapScan will suffer an Assert failure in an assert-enabled build, or build a bogus index if not assert-enabled, when it comes across a heap-only tuple that has no parent. I'm thinking we'd better promote that Assert to a normal runtime elog. I wonder if we also should make rewriteheap.c warn about such things, although I don't immediately see a trivial way to do so. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Planner hints in Postgresql
* Merlin Moncure (mmonc...@gmail.com) wrote: Yeah -- the most common case I see is outlier culling, where several repeated low non-deterministic selectivity quals stack, reducing the row count estimate to 1. For example: SELECT * FROM foo WHERE length(bar) <= 1000 AND length(bar) >= 2; This is exactly the issue that I've seen also -- where we end up picking a Nested Loop because we think only one row is going to be returned, and instead we end up getting a bunch and it takes forever. There was also some speculation on trying to change plans mid-stream to address a situation like that, once we realize what's happening. Not sure that's really practical, but it would be nice to find some solution. Thanks, Stephen
Re: [HACKERS] Planner hints in Postgresql
There's a big difference between saying to the planner, "Use plan X" vs "Here's some information describing the data supporting choosing plan X intelligently." The latter allows for better plans in the face of varied/changing data, integrates with the planner in a natural way, and encourages users to understand how the planner works. +1 I was thinking of varying the 'weight' of a user-defined plan by a fixed experimental factor to tell the planner to give higher/lower preference to this plan, but after your idea above, I think Stephen's point of introducing a GUC for the factor is the only way possible, and I agree with him on the point that eventually the user will figure out a way to force usage of his plan using the GUC. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] First-draft release notes for next week's releases
Andres Freund and...@2ndquadrant.com writes: That's much better, yes. Two things: * I'd change the warning about unique key violations into a more general one about constraints. Foreign key and exclusion constraint are also affected... I'll see what I can do. * I wonder if we should make the possible origins a bit more general as it's perfectly possible to trigger the problem without foreign keys. Maybe: can arise when a table row that has been updated is row locked; that can e.g. happen when foreign keys are used. IIUC, this case only occurs when using the new-in-9.3 types of nonexclusive row locks. I'm willing to bet that the number of applications using those is negligible; so I think it's all right to not mention that case explicitly, as long as the wording doesn't say that foreign keys are the *only* cause (which I didn't). regards, tom lane
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 14:01:03 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: * I wonder if we should make the possible origins a bit more general as it's perfectly possible to trigger the problem without foreign keys. Maybe: can arise when a table row that has been updated is row locked; that can e.g. happen when foreign keys are used. IIUC, this case only occurs when using the new-in-9.3 types of nonexclusive row locks. I'm willing to bet that the number of applications using those is negligible; so I think it's all right to not mention that case explicitly, as long as the wording doesn't say that foreign keys are the *only* cause (which I didn't). I actually think the issue could also occur with row locks of other severities (is that the correct term?). Alvaro probably knows better, but if I see correctly it's also triggerable if a backend waits for an updating transaction to finish and follow_updates = true is passed to heap_lock_tuple(). Which e.g. nodeLockRows.c does... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] First-draft release notes for next week's releases
Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 14:01:03 -0400, Tom Lane wrote: IIUC, this case only occurs when using the new-in-9.3 types of nonexclusive row locks. I'm willing to bet that the number of applications using those is negligible; so I think it's all right to not mention that case explicitly, as long as the wording doesn't say that foreign keys are the *only* cause (which I didn't). I actually think the issue could also occur with row locks of other severities (is that the correct term?). The commit log entry says We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on WAL replay of a tuple-lock operation, which is incorrect when the tuple is already updated. Back-patch to 9.3. The clearing of both header elements was there previously, but since no update could be present on a tuple that was being locked, it was harmless. which I read to mean that the case can't occur with the types of row locks that were allowed pre-9.3. but if I see correctly it's also triggerable if a backend waits for an updating transaction to finish and follow_updates = true is passed to heap_lock_tuple(). Which e.g. nodeLockRows.c does... That sounds backwards. nodeLockRows locks the latest tuple in the chain, so it can't be subject to this. regards, tom lane
Re: [HACKERS] jsonb status
Alexander will take a look at the triConsistent function. On Mon, Mar 17, 2014 at 9:48 PM, Andrew Dunstan and...@dunslane.net wrote: [...] The one outstanding significant question of substance I have is this: given the commit 5 days ago of provision for triConsistent functions for GIN opclasses, should we be adding these to the two GIN opclasses we are providing, and what should they look like? Again, this isn't an issue that I think needs to hold up committing what we have now. [...] cheers andrew
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com wrote: There's a big difference between saying to the planner, "Use plan X" vs "Here's some information describing the data supporting choosing plan X intelligently." The latter allows for better plans in the face of varied/changing data, integrates with the planner in a natural way, and encourages users to understand how the planner works. +1 I was thinking of varying the 'weight' of a user-defined plan by a fixed experimental factor to tell the planner to give higher/lower preference to this plan, but after your idea above, I think Stephen's point of introducing a GUC for the factor is the only way possible and I agree with him on the point that eventually the user will figure out a way to force usage of his plan using the GUC. A GUC is not the answer beyond the broad-brush, mostly debugging-level features they already support. What do you do if your plan simultaneously needs and does not need nestloops? A query plan is a complicated thing that is the result of detailed analysis of the data. I bet there are fewer than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) <= 1000 WITH SELECTIVITY 0.999 AND length(bar) >= 2 WITH SELECTIVITY 0.999; Note, that's a trivial treatment of the syntax challenges. Ultimately it'd probably look different and/or be hooked in a different way (say, via the function call). merlin
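To make the proposal concrete, here is a hypothetical sketch of how a user-supplied selectivity hint might be consumed during estimation. Note that WITH SELECTIVITY is only a syntax idea from this thread, not an existing PostgreSQL feature, and `qual_selectivity` with its fallback chain is invented purely for illustration:

```python
# Hypothetical sketch (not PostgreSQL code): per-qual selectivity
# resolution that prefers an explicit user hint, then a
# statistics-based estimate, then a hard-coded default guess.

DEFAULT_SELECTIVITY = 0.005  # stand-in for a default guess on opaque exprs

def qual_selectivity(stats_estimate=None, user_hint=None):
    """Resolve one qual's selectivity. Hints are clamped into the
    valid (0, 1] range so a bad hint can't produce a nonsense value."""
    if user_hint is not None:
        return min(max(user_hint, 1e-10), 1.0)
    if stats_estimate is not None:
        return stats_estimate
    return DEFAULT_SELECTIVITY

print(qual_selectivity(user_hint=0.999))      # 0.999 (hint wins)
print(qual_selectivity(stats_estimate=0.25))  # 0.25
print(qual_selectivity())                     # 0.005 (default guess)
```

The key design property Merlin argues for is visible here: the hint feeds the estimator rather than dictating a plan, so the planner is still free to weigh join orders and access paths normally.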
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 14:16:41 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 14:01:03 -0400, Tom Lane wrote: IIUC, this case only occurs when using the new-in-9.3 types of nonexclusive row locks. I'm willing to bet that the number of applications using those is negligible; so I think it's all right to not mention that case explicitly, as long as the wording doesn't say that foreign keys are the *only* cause (which I didn't). I actually think the issue could also occur with row locks of other severities (is that the correct term?). The commit log entry says We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on WAL replay of a tuple-lock operation, which is incorrect when the tuple is already updated. Back-patch to 9.3. The clearing of both header elements was there previously, but since no update could be present on a tuple that was being locked, it was harmless. which I read to mean that the case can't occur with the types of row locks that were allowed pre-9.3. That's not an unreasonable interpretation of the commit message, but I don't think it's correct with respect to the code :( but if I see correctly it's also triggerable if a backend waits for an updating transaction to finish and follow_updates = true is passed to heap_lock_tuple(). Which e.g. nodeLockRows.c does... That sounds backwards. nodeLockRows locks the latest tuple in the chain, so it can't be subject to this. Hm, I don't see anything in the code preventing it; that's the heap_lock_tuple() before the EPQ stuff... in ExecLockRows():

    foreach(lc, node->lr_arowMarks)
    {
        ...
        test = heap_lock_tuple(erm->relation, &tuple,
                               estate->es_output_cid,
                               lockmode, erm->noWait, true,
                               &buffer, &hufd);
        ReleaseBuffer(buffer);
        switch (test)
        {
            case HeapTupleSelfUpdated:
        ...

The true passed to heap_lock_tuple() is the follow_updates parameter. And then in heap_lock_tuple():

    if (require_sleep)
    {
        if (infomask & HEAP_XMAX_IS_MULTI)
        {
            ...
            /* if there are updates, follow the update chain */
            if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
            {
                HTSU_Result res;

                res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
                                              GetCurrentTransactionId(),
            ...
        }
        else
        {
            /* wait for regular transaction to end */
            if (nowait)
            {
                if (!ConditionalXactLockTableWait(xwait))
                    ereport(ERROR,
                            (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                             errmsg("could not obtain lock on row in relation \"%s\"",
                                    RelationGetRelationName(relation))));
            }
            else
                XactLockTableWait(xwait);

            /* if there are updates, follow the update chain */
            if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
            ...

    if (RelationNeedsWAL(relation))
    {
        xl_heap_lock xlrec;
        XLogRecPtr recptr;
        XLogRecData rdata[2];

        xlrec.target.node = relation->rd_node;
        xlrec.target.tid = tuple->t_self;
        ...

To me that looks sufficient to trigger the bug, because we're issuing a WAL record about the row that was passed to heap_lock_tuple(), not the latest one in the ctid chain. When replaying that record, it will reset the t_ctid field, thus breaking the chain. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] First-draft release notes for next week's releases
Andres Freund and...@2ndquadrant.com writes: To me that looks sufficient to trigger the bug, because we're issuing a wal record about the row that was passed to heap_lock_update(), not the latest one in the ctid chain. When replaying that record, it will reset the t_ctid field, thus breaking the chain. [ scratches head ... ] If that's what's happening, isn't it a bug in itself? Surely the WAL record ought to point at the tuple that was locked. regards, tom lane
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 14:29:56 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: To me that looks sufficient to trigger the bug, because we're issuing a wal record about the row that was passed to heap_lock_update(), not the latest one in the ctid chain. When replaying that record, it will reset the t_ctid field, thus breaking the chain. [ scratches head ... ] If that's what's happening, isn't it a bug in itself? Surely the WAL record ought to point at the tuple that was locked. There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple version, emitted by heap_lock_updated_tuple_rec(). This really is mind bendingly complex :(. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma atri.j...@gmail.com wrote: There's a big difference between saying to the planner, "Use plan X" vs "Here's some information describing the data supporting choosing plan X intelligently." The latter allows for better plans in the face of varied/changing data, integrates with the planner in a natural way, and encourages users to understand how the planner works. +1 I was thinking of varying the 'weight' of a user-defined plan by a fixed experimental factor to tell the planner to give higher/lower preference to this plan, but after your idea above, I think Stephen's point of introducing a GUC for the factor is the only way possible and I agree with him on the point that eventually the user will figure out a way to force usage of his plan using the GUC. A GUC is not the answer beyond the broad-brush, mostly debugging-level features they already support. What do you do if your plan simultaneously needs and does not need nestloops? A query plan is a complicated thing that is the result of detailed analysis of the data. I bet there are fewer than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) <= 1000 WITH SELECTIVITY 0.999 AND length(bar) >= 2 WITH SELECTIVITY 0.999; Won't this have scaling issues and issues over time as the data in the table changes? Suppose I make a view with the above query. With time, as the data in the table changes, the selectivity values won't be good for planning.
This may potentially lead to a lot of changes in the view definition and other places where this query was used. In general, I think I'll step back from my point that specifying the selectivity is a bad idea. Could this also work (for the time being) for cross-column statistics? Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Planner hints in Postgresql
2014-03-17 19:35 GMT+01:00 Atri Sharma atri.j...@gmail.com: [...] Won't this have scaling issues and issues over time as the data in the table changes? Suppose I make a view with the above query. With time, as the data in the table changes, the selectivity values won't be good for planning. This may potentially lead to a lot of changes in the view definition and other places where this query was used. Could this also work (for the time being) for cross-column statistics? It is a separate issue. I don't believe SELECTIVITY can work well either. Slow queries are usually related to some strange points in the data. I think a good concept should be based on the validity of estimations. Some plans are based on totally wrong estimations but are still fast because they are less sensitive to bad estimations. So a good concept penalizes risky plans, or uses brute force the way a COLUMN store engine does: its plans are usually simple and tolerant of bad estimations. Pavel
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 14:52:25 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 14:29:56 -0400, Tom Lane wrote: [ scratches head ... ] If that's what's happening, isn't it a bug in itself? Surely the WAL record ought to point at the tuple that was locked. There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple version, emitted by heap_lock_updated_tuple_rec(). This really is mind bendingly complex :(. Ah, I see; so only the original tuple in the chain is at risk? Depending on what you define "the original tuple in the chain" to be. No, if you happen to mean the root tuple of a ctid chain or similar; which I guess you didn't. Yes, if you mean the tuple passed to heap_lock_tuple(). heap_xlog_lock_updated() looks (and has looked) correct. How about this: Sounds good to me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] First-draft release notes for next week's releases
Andres Freund and...@2ndquadrant.com writes: On 2014-03-17 14:29:56 -0400, Tom Lane wrote: [ scratches head ... ] If that's what's happening, isn't it a bug in itself? Surely the WAL record ought to point at the tuple that was locked. There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple version, emitted by heap_lock_updated_tuple_rec(). This really is mind bendingly complex :(. Ah, I see; so only the original tuple in the chain is at risk? How about this: This error caused updated rows to not be found by index scans, resulting in inconsistent query results depending on whether an index scan was used. Subsequent processing could result in constraint violations, since the previously updated row would not be found by later index searches, thus possibly allowing conflicting rows to be inserted. Since this error is in WAL replay, it would only manifest during crash recovery or on standby servers. The improperly-replayed case most commonly arises when a table row that is referenced by a foreign-key constraint is updated concurrently with creation of a referencing row; but it can also occur when any variant of SELECT FOR UPDATE is applied to a row that is being concurrently updated. regards, tom lane
Re: [HACKERS] Planner hints in Postgresql
Atri Sharma atri.j...@gmail.com writes: Won't this have scaling issues, and issues over time as the data in the table changes? It can't possibly have worse problems of that sort than explicitly specifying a plan does. regards, tom lane
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I don't believe SELECTIVITY can work well either. Slow queries are usually related to some strange points in the data. I think a good concept should be based on the validity of estimations. Some plans are based on totally wrong estimations but are still fast because they are less sensitive to bad estimations. So a good concept would penalize risky plans - or use brute force, like a COLUMN store engine does. Their plans are usually simple and tolerant of bad estimations. Disagree. There is a special case of slow query where the problem is not with the data but with the expression over the data; something in the query defeats sampled selectivity. Common culprits are: *) CASE expressions *) COALESCE *) casts *) simple transformational expressions *) predicate string concatenation When using those expressions, you often end up with default selectivity assumptions, and if they are way off -- watch out. Plan risk analysis solves a different problem: small changes in the data mean big changes in the execution runtime. It probably wouldn't even help cases where the server thinks there is one row and you actually have thousands or millions, unless you want to implement a selectivity range with perhaps a risk coefficient. This was also suggested some time back and was also met with some skepticism (but it'd be interesting to see!). merlin
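Merlin's point about default selectivity assumptions can be illustrated with a small model. This is a sketch, not PostgreSQL's actual costing code; the constant 0.3333 matches the planner's DEFAULT_INEQ_SEL fallback for inequalities it has no statistics for, and the plain multiplication mirrors the planner's independence assumption when stacking quals:

```python
# Sketch: how stacked default selectivities collapse a row estimate.
# When the planner cannot see through an expression qual (e.g.
# length(bar) <= 1000), it falls back on a hard-wired constant such as
# DEFAULT_INEQ_SEL = 0.3333 and multiplies the quals as if independent.
DEFAULT_INEQ_SEL = 0.3333

def estimated_rows(table_rows, n_default_quals):
    """Row estimate after n quals that all get the default selectivity."""
    sel = DEFAULT_INEQ_SEL ** n_default_quals
    return max(1, round(table_rows * sel))

# On a 1,000,000-row table, each extra "unknowable" qual cuts the
# estimate by roughly 3x, regardless of what the data actually looks like.
for n in (1, 2, 4, 8):
    print(n, estimated_rows(1_000_000, n))
```

With enough stacked quals the estimate bottoms out at 1 row, which is exactly the "outlier culling" failure mode described above: the planner then confidently picks a nested loop that turns out to be disastrous.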
Re: [HACKERS] First-draft release notes for next week's releases
Andres Freund wrote: On 2014-03-17 14:01:03 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: * I wonder if we should make the possible origins a bit more general, as it's perfectly possible to trigger the problem without foreign keys. Maybe: can arise when a table row that has been updated is row locked; that can e.g. happen when foreign keys are used. IIUC, this case only occurs when using the new-in-9.3 types of nonexclusive row locks. I'm willing to bet that the number of applications using those is negligible; so I think it's all right to not mention that case explicitly, as long as the wording doesn't say that foreign keys are the *only* cause (which I didn't). I actually think the issue could also occur with row locks of other severities (is that the correct term?). Alvaro probably knows better, but if I see correctly it's also triggerable if a backend waits for an updating transaction to finish and follow_updates = true is passed to heap_lock_tuple(). Which e.g. nodeLockRows.c does... Uhm. But at the bottom of that block, right above the failed: label (heapam.c line 4527 in current master), we recheck the tuple for locked-only-ness; and fail the whole operation by returning HeapTupleUpdated, if it's not locked-only, no? Which would cause ExecLockRows to grab the next version via EvalPlanQualFetch. Essentially that check is a lock-conflict test, and the only thing that does not conflict with an update is a FOR KEY SHARE lock. Note the only way to pass that test is that either the tuple is locked-only (spelled in three different ways), or !require_sleep. Am I completely misunderstanding what's being said here? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] First-draft release notes for next week's releases
Alvaro Herrera alvhe...@2ndquadrant.com writes: Uhm. But at the bottom of that block, right above the failed: label (heapam.c line 4527 in current master), we recheck the tuple for locked-only-ness; and fail the whole operation by returning HeapTupleUpdated, if it's not locked-only, no? Which would cause ExecLockRows to grab the next version via EvalPlanQualFetch. Essentially that check is a lock-conflict test, and the only thing that does not conflict with an update is a FOR KEY SHARE lock. Right, the row-lock acquisition has to succeed for there to be a risk. I wasn't sure whether 9.3 had introduced any such cases for existing row lock types. regards, tom lane
Re: [HACKERS] Planner hints in Postgresql
On 3/17/14, 12:58 PM, Stephen Frost wrote: * Merlin Moncure (mmonc...@gmail.com) wrote: Yeah -- the most common case I see is outlier culling, where several repeated low non-deterministic selectivity quals stack, reducing the row count estimate to 1. For example: SELECT * FROM foo WHERE length(bar) <= 1000 AND length(bar) >= 2; This is exactly the issue that I've seen also - where we end up picking a Nested Loop because we think only one row is going to be returned and instead we end up getting a bunch and it takes forever. FWIW, I've also seen problems with merge and hash joins at work, but I don't have any concrete examples handy. :( There was also some speculation on trying to change plans mid-stream to address a situation like that, once we realize what's happening. Not sure that's really practical but it would be nice to find some solution. Just being able to detect that something has possibly gone wrong would be useful. We could log that to alert the DBA/user of a potential bad plan. We could even format this in such a fashion that it's suitable for emailing the community with; the query, the plan, the stats, etc. That might make it easier for us to fix the planner (although at this point it seems like we're hitting statistics-gathering problems that we simply don't know how to solve). There is another aspect of this though: plan stability. There are lots of cases where users couldn't care less about getting an optimal plan, but they care *greatly* about not getting a brain-dead plan. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby j...@nasby.net wrote: Just being able to detect that something has possibly gone wrong would be useful. We could log that to alert the DBA/user of a potential bad plan. We could even format this in such a fashion that it's suitable for emailing the community with; the query, the plan, the stats, etc. That might make it easier for us to fix the planner (although at this point it seems like we're hitting statistics-gathering problems that we simply don't know how to solve). Again, that's not the case here. The problem is that the server is using hard-wired assumptions (like, 10% selective) *instead* of statistics -- at least in the case discussed above. That being said, I think you're on to something: EXPLAIN ANALYZE rowcounts don't indicate whether the row count was generated from data-based assumptions or SWAGs. So maybe you could decorate the plan description with an indicator that suggests when default selectivity rules were hit. There is another aspect of this though: plan stability. There are lots of cases where users couldn't care less about getting an optimal plan, but they care *greatly* about not getting a brain-dead plan. Except for cases I noted above, I don't understand how you could flag 'sub-optimal' or 'brain-dead' plans. The server always picks the best plan it can. The trick is to (in a very simple and CPU-unintensive way) indicate when there isn't a lot of confidence in the plan -- but that's not the same thing. merlin
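Merlin's "decorate the plan" suggestion can be sketched as a toy model. This is hypothetical (no such EXPLAIN feature exists); the `Qual` record, `from_stats` flag, and `explain_scan` printer are all invented here purely to show the idea of flagging estimates that rest on hard-wired defaults rather than gathered statistics:

```python
# Hypothetical sketch: tag each qual's selectivity with whether it came
# from real statistics, and have an EXPLAIN-style printer flag any row
# estimate that depends on a hard-wired default (a "SWAG").
from dataclasses import dataclass

@dataclass
class Qual:
    text: str
    selectivity: float
    from_stats: bool  # False => hard-wired default was used

def explain_scan(table_rows, quals):
    rows = table_rows
    used_default = False
    for q in quals:
        rows *= q.selectivity
        used_default = used_default or not q.from_stats
    marker = " (default selectivity used)" if used_default else ""
    return f"Seq Scan  rows={max(1, round(rows))}{marker}"

print(explain_scan(100_000, [
    Qual("x = 5", 0.01, True),              # selectivity from pg_statistic
    Qual("length(bar) <= 1000", 0.3333, False),  # default fallback
]))
```

The point is only that the information "a default rule fired here" already exists at planning time; surfacing it would let users distinguish data-backed estimates from guesses without any change to plan selection.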
Re: [HACKERS] First-draft release notes for next week's releases
On 2014-03-17 16:17:35 -0300, Alvaro Herrera wrote: Andres Freund wrote: On 2014-03-17 14:01:03 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: * I wonder if we should make the possible origins a bit more general as it's perfectly possible to trigger the problem without foreign keys. Maybe: can arise when a table row that has been updated is row locked; that can e.g. happen when foreign keys are used. IIUC, this case only occurs when using the new-in-9.3 types of nonexclusive row locks. I'm willing to bet that the number of applications using those is negligible; so I think it's all right to not mention that case explicitly, as long as the wording doesn't say that foreign keys are the *only* cause (which I didn't). I actually think the issue could also occur with row locks of other severities (is that the correct term?). Alvaro probably knows better, but if I see correctly it's also triggerable if a backend waits for an updating transaction to finish and follow_updates = true is passed to heap_lock_tuple(). Which e.g. nodeLockRows.c does... Uhm. But at the bottom of that block, right above the failed: label (heapam.c line 4527 in current master), we recheck the tuple for locked-only-ness; and fail the whole operation by returning HeapTupleUpdated, if it's not locked-only, no? Which would cause ExecLockRows to grab the next version via EvalPlanQualFetch. Essentially that check is a lock-conflict test, and the only thing that does not conflict with an update is a FOR KEY SHARE lock. What I was thinking of is the case where heap_lock_tuple() notices it needs to sleep and then in the if (require_sleep) block does a lock_updated_tuple(). If the updating transaction aborts while waiting lock_updated_tuple_rec() will issue a XLOG_HEAP2_LOCK_UPDATED for that row and then return MayBeUpdated. Which will make heap_lock_tuple() successfully lock the row, thereby resetting t_ctid during replay. 
What I missed is that in that case resetting the ctid chain is perfectly fine, since the pointed-to tuple is actually dead. I was just confused by the fact that we do actually issue an XLOG_HEAP2_LOCK_UPDATED for a dead row. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] bpchar functinos
On Sat, Mar 15, 2014 at 05:02:44PM +0330, Mohsen SM wrote: I want to find out when these functions are used (what query causes them to be called): char_bpchar(), bpchar_name(), name_bpchar(). They implement casts. For example, select 'foo'::character(10)::name calls bpchar_name(). -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Planner hints in Postgresql
On Mon, Mar 17, 2014 at 01:20:47PM -0500, Merlin Moncure wrote: A query plan is a complicated thing that is the result of detailed analysis of the data. I bet there are fewer than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like: SELECT * FROM foo WHERE length(bar) <= 1000 WITH SELECTIVITY 0.999 AND length(bar) >= 2 WITH SELECTIVITY 0.999; A small issue with selectivity is that the selectivity is probably not what users are expecting anyway, since many will relate to conditional selectivities. PostgreSQL is pretty good at single-column statistics; it just sometimes screws up on cross-column correlations. This ties in with alerting about a bad plan: if the EXPLAIN output could list, for each condition, what the actual selectivity was, it might give users a way of understanding the problem. So the example given might lead to output like:

clause                                    selectivity   estimated
length(bar) >= 2                          0.50          0.50
length(bar) <= 1000 | length(bar) >= 2    0.50          0.25

The execution engine can only output conditional selectivities because of the order of execution. But this would at least give users a handle on the problem. Note that a first cut of the problem might simply be something like likely()/unlikely() as in gcc. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
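Martijn's observation that the executor can only report conditional selectivities can be made concrete with a tiny sketch. This is not an existing EXPLAIN feature; the function below is invented to show how per-clause conditional selectivities would fall out of row counts the executor already observes, since each clause only ever sees the rows that survived the clauses before it:

```python
# Sketch: derive per-clause conditional selectivities from the row counts
# observed during execution. Clause i's measured selectivity is
# P(clause i | clauses 0..i-1), because that is all the executor can see.
def conditional_selectivities(rows_in, rows_after_each_clause):
    """rows_after_each_clause[i] = rows surviving clauses 0..i, in order."""
    sels = []
    prev = rows_in
    for surviving in rows_after_each_clause:
        sels.append(surviving / prev if prev else 0.0)
        prev = surviving
    return sels

# 1000 rows scanned; 500 pass the first clause, 250 pass both:
# each clause is 0.50 selective *given the earlier ones*, for a joint
# selectivity of 0.25 -- matching the shape of the table above.
print(conditional_selectivities(1000, [500, 250]))
```

Comparing these measured conditional values against the planner's estimates, clause by clause, is exactly the kind of diagnostic output that would let a user spot a cross-column correlation the single-column statistics missed.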
Re: [HACKERS] First-draft release notes for next week's releases
On Mon, Mar 17, 2014 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm thinking we'd better promote that Assert to a normal runtime elog. I wasn't sure about this, but on further thought I think it's a really good idea and should be mentioned in the release notes. One of the things that's been bothering me about this bug is that it's really hard to know whether your standby has suffered from the problem, and if you've promoted it you don't know if your database has a problem. That said, it would be nice to actually fix the problem, not just detect it. Eventually vacuum would fix the problem. I think. I'm not really sure what will happen actually. -- greg