Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-06-22 Thread Glen Knowles
It appears that, according to the standard, passing NULL to memcmp is
undefined behavior, even if the count is 0. See
http://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp
for C99 and C++ standard references. I didn't see a good reference for C89
but I find it almost impossible to believe it was changed from defined to
undefined behavior between C89 and C99.
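To make the hazard concrete, here is a small standalone C sketch (illustrative
only, not the parse_func.c code; unsigned int stands in for Oid) of the kind of
guard being discussed below:

#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch only.  Passing a null pointer to memcmp() is
 * undefined behavior even when the count is zero, so guard the call
 * instead of relying on "zero bytes compared" semantics.
 */
static int
args_match(const unsigned int *argtypes, const unsigned int *candidate,
           int nargs)
{
    if (nargs == 0)
        return 1;               /* nothing to compare; avoid memcmp(NULL, ...) */
    return memcmp(argtypes, candidate, nargs * sizeof(unsigned int)) == 0;
}

int
main(void)
{
    unsigned int a[] = {23, 25};
    unsigned int b[] = {23, 25};

    printf("%d\n", args_match(a, b, 2));    /* 1: signatures match */
    printf("%d\n", args_match(NULL, b, 0)); /* 1: zero args, memcmp never called */
    return 0;
}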


On Mon, Jun 22, 2015 at 2:31 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Mon, Jun 22, 2015 at 2:55 PM, Tom Lane  wrote:
> >> If I recall that code correctly, the assumption was that if the third
> >> argument is zero then memcmp() must not fetch any bytes (not should not,
> >> but MUST not) and therefore it doesn't matter if we pass a NULL.  Are
> >> you seeing any observable problem here, and if so what is it?
>
> > I dunno, this seems like playing with fire to me.  A null-test would
> > be pretty cheap insurance.
>
> A null test would be a pretty cheap way of masking a bug in that logic,
> if we ever introduced one; to wit, that it would cause a call with
> argtypes==NULL to match anything.
>
> Possibly saner is
>
> if (nargs == 0 ||
> memcmp(argtypes, best_candidate->args, nargs * sizeof(Oid)) == 0)
> break;
>
> I remain unconvinced that this is necessary, though.  It looks a *whole*
> lot like the guards we have against old Solaris' bsearch-of-zero-entries
> bug.  I maintain that what glibc has done is exactly to introduce a bug
> for the zero-entries case, and that Piotr ought to complain to them
> about it.  At the very least, if you commit this please annotate it
> as working around a memcmp bug.
>
> regards, tom lane
>
>


Re: [HACKERS] PGXS "check" target forcing an install ?

2015-06-22 Thread Michael Paquier
On Tue, Jun 23, 2015 at 12:11 AM, Sandro Santilli  wrote:
> I've noticed that after upgrading from PostgreSQL 9.3 to 9.5 I'm suddenly
> unable to specify a "check" rule in the Makefile that includes the
> PGXS one. The error is:
>
>  $ make check
>  rm -rf ''/tmp_install
>  make -C '/home/postgresql-9.5/lib/pgxs/src/makefiles/../..' 
> DESTDIR=''/tmp_install install
>  make[1]: Entering directory `/home/postgresql-9.5/lib/pgxs'
>  make[1]: *** No rule to make target `install'.  Stop.
>  make[1]: Leaving directory `/home/postgresql-9.5/lib/pgxs'
>  make: *** [temp-install] Error 2
>
> I tracked the dangerous -rf down to Makefile.global, and its empty
> string is due to abs_top_builddir not being defined in my own Makefile.
> But besides that, which I can probably fix, it doesn't sound correct
> that a "check" rule insists on finding an "install" rule.

Oops, this is a regression, and a dangerous one indeed. This is caused
by dcae5fac.

One fix is to set NO_TEMP_INSTALL=yes in Makefile.global in the
context of PGXS, as in the attached patch; this variable needs to
be set before Makefile.global is loaded. We could instead test PGXS
directly in the "Testing" section, but that does not sound appealing for
Makefile.global's readability.

> I'm also
> surprised that there's no warning coming out from the "make" invocation
> given I'm defining a "check" rule myself (after inclusion).

Why? It looks perfectly normal to me to be able to define your own
check rule. That's more flexible this way.

Thoughts?
-- 
Michael
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index c583b44..82f0c1d 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -54,7 +54,9 @@ top_srcdir = $(abs_top_srcdir)
 srcdir = $(top_srcdir)/$(subdir)
 VPATH = $(srcdir)
 endif
-endif # not PGXS
+else # not PGXS
+NO_TEMP_INSTALL=yes
+endif # PGXS
 
 vpathsearch = `for f in $(addsuffix /$(1),$(subst :, ,. $(VPATH))); do test -r $$f && echo $$f && break; done`
 



Re: [HACKERS] 9.5 make world failing due to sgml tools missing

2015-06-22 Thread Fabien COELHO



libxml2 has been a required documentation build tool since PostgreSQL
9.0.  The only thing that's new is that xmllint is in a different
subpackage on some systems.  So just install that and you're all set for
the foreseeable future.


Well, something is different in 9.5. On this same system (Linux Mint 17.1)
I can build all previous versions with "make world" and I do not get this
error.


Maybe the error about the missing tool was previously ignored (that is, it
failed without saying so), and it is now reported?



It might be related to:

commit 5d93ce2d0c619ba1b408eb749715e7223e23f6ae
Author: Peter Eisentraut 
Date:   Tue Oct 21 14:46:38 2014 -0400

doc: Check DocBook XML validity during the build

Building the documentation with XSLT does not check the DTD, like a
DSSSL build would.  One can often get away with having invalid XML, but
the stylesheets might then create incorrect output, as they are not
designed to handle that.  Therefore, check the validity of the XML
against the DTD, using xmllint, during the build.

Add xmllint detection to configure, and add some documentation.

xmllint comes with libxml2, which is already in use, but it might be
in a separate package, such as libxml2-utils on Debian.

Reviewed-by: Fabien COELHO 

--
Fabien.




Re: [HACKERS] A couple of newlines missing in pg_rewind log entries

2015-06-22 Thread Michael Paquier
On Tue, Jun 23, 2015 at 1:39 PM, Michael Paquier
 wrote:
> Hi all,
>
> Some grepping shows that a couple of newlines are missing in
> pg_rewind, leading to unreadable log entries:
> libpq_fetch.c:pg_log(PG_DEBUG, "getting file chunks");
> logging.c:pg_log(PG_PROGRESS, "%*s/%s kB (%d%%) copied",
> filemap.c:pg_fatal("could not stat file \"%s\": %s",
>
> Attached is a patch to fix them.

Please ignore that. It was fixed yesterday by e98d635.
-- 
Michael




Re: [HACKERS] 9.5 make world failing due to sgml tools missing

2015-06-22 Thread Keith Fiske


On Sun, Jun 21, 2015 at 10:56 AM, Peter Eisentraut  wrote:

> On 6/18/15 8:54 AM, Tom Lane wrote:
> > Sure; the point is that libxml2 has suddenly been reclassified as a
> > documentation build tool, which is at least a surprising categorization.
>
> libxml2 has been a required documentation build tool since PostgreSQL
> 9.0.  The only thing that's new is that xmllint is in a different
> subpackage on some systems.  So just install that and you're all set for
> the foreseeable future.
>
>
Well, something is different in 9.5. On this same system (Linux Mint 17.1)
I can build all previous versions with "make world" and I do not get this
error.


Re: [HACKERS] upper planner path-ification

2015-06-22 Thread David Rowley
On 23 June 2015 at 13:55, Kouhei Kaigai  wrote:

> Once we support adding an aggregation path during path consideration,
> we need to pay attention to how the final target-list morphs according
> to the intermediate path combination tentatively chosen.
> For example, if partial aggregation makes sense from a cost perspective -
> like SUM(NRows) of a partial COUNT(*) AS NRows instead of COUNT(*) on a
> billion rows - the planner also has to adjust the final target-list
> according to the interim paths. In this case, the final output shall be
> SUM() instead of COUNT().
>
>
This sounds very much like what's been discussed here:

http://www.postgresql.org/message-id/CA+U5nMJ92azm0Yt8TT=hNxFP=vjfhdqfpawfmj+66-4zvcg...@mail.gmail.com


The basic concept is that we add another function set to aggregates that
allows the combination of two states. For the case of MIN() and MAX() this
will just be the same as the transfn. SUM() is similar for many types, more
complex for others. I've quite likely just borrowed SUM(BIGINT)'s transfer
functions to allow COUNT()s to be combined.

More time does need to be spent inventing the new combining functions that
don't currently exist, but that shouldn't matter, as it can be done later.
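To illustrate the combine-function idea outside of the executor, here is a
standalone toy in plain C (not the patch's actual aggregate API):

#include <stdint.h>
#include <stdio.h>

/*
 * Toy model of partial aggregation: each "worker" builds a partial
 * transition state with the transition function, and a combine function
 * merges two states.  For COUNT(*), combining two partial counts is just
 * addition, which is why SUM-like behavior is enough.
 */
typedef int64_t count_state;

static count_state
count_transfn(count_state state)
{
    return state + 1;           /* advance the state by one input row */
}

static count_state
count_combinefn(count_state a, count_state b)
{
    return a + b;               /* merge two partial counts */
}

int
main(void)
{
    count_state w1 = 0, w2 = 0;
    int i;

    for (i = 0; i < 500; i++)   /* worker 1 scans 500 rows */
        w1 = count_transfn(w1);
    for (i = 0; i < 700; i++)   /* worker 2 scans 700 rows */
        w2 = count_transfn(w2);

    printf("combined count = %lld\n", (long long) count_combinefn(w1, w2));
    return 0;
}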

Commitfest link to patch here https://commitfest.postgresql.org/5/131/

I see you've signed up to review it!

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Fabien COELHO


Hello Amit,


medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
  max_wal_size=4GB warmup=1200 time=7500

  flsh |  full speed tps   | percent of late tx, 4 clients
  /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
   N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
   N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
   Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
   Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |

The effect of sorting is very positive (+150% to 270% tps). On this run,
flushing has a positive (+20% with 1 client) or negative (-8% with 4
clients) effect on throughput, and late transactions are reduced by 92-95%
when both options are activated.

Why is there a dip in performance with multiple clients?


I'm not sure I see the "dip". Performance is better with 4 clients
compared to 1 client?


Can it be due to the fact that we started doing more work while holding
the bufhdr lock in the code below?


I think it is very unlikely that the buffer being locked would be 
simultaneously requested by one of the 4 clients for an UPDATE, so I do 
not think it should have a significant impact.



BufferSync() [...]



BufferSync()
{
..
- buf_id = StrategySyncStart(NULL, NULL);
- num_to_scan = NBuffers;
+ active_spaces = nb_spaces;
+ space = 0;
 num_written = 0;
- while (num_to_scan-- > 0)
+
+ while (active_spaces != 0)
..
}

The changed code doesn't seem to give any consideration to the
clock-sweep point


Indeed.

which might not be helpful for cases where the checkpoint could have flushed
soon-to-be-recycled buffers. I think flushing the sorted buffers w.r.t.
tablespaces is a good idea, but by not giving any preference to the
clock-sweep point it seems to me that we would lose out in some cases with
this new change.


I do not see how to do both, as these two orders seem more or less
unrelated.  The traditional assumption is that I/O is very slow and
is to be optimized first, so going for a buffer ordering that is nice to
the disk looks like the priority.
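For readers who want to see what the sorting side amounts to, here is a
self-contained C sketch (made-up buffer-tag layout, not the patch's code) of
ordering dirty buffers by tablespace, relation and block before writing:

#include <stdio.h>
#include <stdlib.h>

/* Made-up stand-in for a buffer tag; not the real BufferTag. */
typedef struct
{
    unsigned    tablespace;
    unsigned    relation;
    unsigned    block;
} DirtyBuf;

/*
 * Order checkpoint writes by tablespace, then relation, then block, so
 * each tablespace (often a separate disk) sees mostly sequential I/O.
 */
static int
dirty_buf_cmp(const void *pa, const void *pb)
{
    const DirtyBuf *a = pa;
    const DirtyBuf *b = pb;

    if (a->tablespace != b->tablespace)
        return a->tablespace < b->tablespace ? -1 : 1;
    if (a->relation != b->relation)
        return a->relation < b->relation ? -1 : 1;
    if (a->block != b->block)
        return a->block < b->block ? -1 : 1;
    return 0;
}

int
main(void)
{
    DirtyBuf    bufs[] = {{2, 7, 40}, {1, 5, 3}, {2, 7, 39}, {1, 6, 0}};
    size_t      i, n = sizeof(bufs) / sizeof(bufs[0]);

    qsort(bufs, n, sizeof(DirtyBuf), dirty_buf_cmp);
    for (i = 0; i < n; i++)     /* "write" in sorted order */
        printf("spc=%u rel=%u blk=%u\n",
               bufs[i].tablespace, bufs[i].relation, bufs[i].block);
    return 0;
}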


--
Fabien.




Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-22 Thread Michael Paquier
On Fri, Jun 19, 2015 at 9:22 PM, Robert Haas  wrote:
> On Fri, Jun 19, 2015 at 8:18 AM, Robert Haas  wrote:
>> On Fri, Jun 19, 2015 at 12:14 AM, Michael Paquier
>>  wrote:
 Listing the directories with pg_ls_dir() has the same problem.
>>>
>>> (After some discussion on IM with Heikki on this one).
>>> This is actually more tricky because pg_ls_dir() does not return '.'
>>> or '..', which we could use to identify whether the directory actually
>>> exists when it is empty. Hence I think that we should add two
>>> options to pg_ls_dir:
>>> - include_self, defaulting to false. If set to true, '.' is added to the
>>> list of items.
>>> - if_not_exists, to bypass the error when a folder does not exist,
>>> defaulting to false. If if_not_exists = true and include_self = true,
>>> returning only '.' would mean that the folder exists but is empty. If
>>> if_not_exists = true and include_self = false, no rows are returned.
>>> We could also just ERROR if both options are set like that. I am
>>> fine with any of them as long as the behavior is properly documented.
>>
>> Including '.' to distinguish between an empty directory and a
>> nonexistent one seems like an unnecessarily complicated and
>> non-obvious API.  How about just one additional parameter bool
>> *exists.  If NULL and no directory, ERROR, else on return set *exists
>> to true or false.
>
> Err, wait.  You're talking about an SQL function, heh heh.  So that
> won't work.  Maybe what you proposed is the best we can do, then,
> although I would suggest that you rename the include_self parameter to
> something like include_dot_dirs and return both "." and "..".
> Returning only "." seems like it will seem weird to people.

So... Attached is a set of patches dedicated to fixing this issue:
- 0001, add if_not_exists to pg_tablespace_location, returning NULL if
path does not exist
- 0002, same with pg_stat_file, returning NULL if file does not exist
- 0003, same with pg_read_*file. I added them to all the existing
functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
(thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed at
covering those things. pg_tablespace_location is platform-dependent,
so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

Attached is an updated test case triggering the issue
(rewind_test.bash), with the small patch attached that adds a pg_sleep
call in pg_rewind.c (20150623_pg_rewind_sleep.patch).

I imagine that this is a bug people are going to meet in the field
easily, particularly with temporary relation files or temporary XLOG
files.
Regards,
-- 
Michael
From 454fa9e67c67621b0832d7fbd90153328c99197b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 23 Jun 2015 11:45:49 +0900
Subject: [PATCH 1/6] Extend pg_tablespace_location with if_not_exists option

This is useful for code paths that prefer to receive an empty response
instead of an ERROR if the specified tablespace does not exist.
---
 doc/src/sgml/func.sgml|  8 --
 src/backend/utils/adt/misc.c  | 60 ++-
 src/include/catalog/pg_proc.h |  2 ++
 src/include/utils/builtins.h  |  1 +
 4 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 650051b..7929e7e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15844,9 +15844,13 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
get the set of database OIDs that have objects in the tablespace
   
   
-   pg_tablespace_location(tablespace_oid)
+   pg_tablespace_location(tablespace_oid [, if_not_exists boolean])
text
-   get the path in the file system that this tablespace is located in
+   
+Get the path in the file system that this tablespace is located in.
+If if_not_exists is true, NULL
+is returned if tablespace does not exist instead of an error.
+   
   
   
pg_typeof(any)
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index c0495d9..034728d 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -336,17 +336,24 @@ pg_tablespace_databases(PG_FUNCTION_ARGS)
 	SRF_RETURN_DONE(funcctx);
 }
 
-
 /*
- * pg_tablespace_location - get location for a tablespace
+ * Wrapper function for pg_tablespace_location and
+ * pg_tablespace_location_extended.
  */
-Datum
-pg_tablespace_location(PG_FUNCTION_ARGS)
+static Datum
+tablespace_location_wrapper(FunctionCallInfo fcinfo)
 {
 	Oid			tablespaceOid = PG_GETARG_OID(0);
 	char		sourcepath[MAXPGPATH];
 	char		targetpath[MAXPGPATH];
 	int			rllen;
+	bool		if_not_exists = false;
+
+	/*
+	 * Check for IF NOT EXISTS option.
+	 */
+	if (PG_NARGS() == 2 && !PG_ARGISNULL(1))
+		if_not_exists = PG_GETARG_BOOL(1);
 
 	/*
 	 * It's useful to apply this function to pg_class.reltablespace, wherein
@@ -373,15 +3

[HACKERS] A couple of newlines missing in pg_rewind log entries

2015-06-22 Thread Michael Paquier
Hi all,

Some grepping shows that a couple of newlines are missing in
pg_rewind, leading to unreadable log entries:
libpq_fetch.c:pg_log(PG_DEBUG, "getting file chunks");
logging.c:pg_log(PG_PROGRESS, "%*s/%s kB (%d%%) copied",
filemap.c:pg_fatal("could not stat file \"%s\": %s",

Attached is a patch to fix them.
Regards,
-- 
Michael
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 3821e9c..b877d55 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -256,7 +256,7 @@ process_target_file(const char *path, file_type_t type, size_t oldsize,
 	if (lstat(localpath, &statbuf) < 0)
 	{
 		if (errno != ENOENT)
-			pg_fatal("could not stat file \"%s\": %s",
+			pg_fatal("could not stat file \"%s\": %s\n",
 	 localpath, strerror(errno));
 
 		exists = false;
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 8565bb9..393baa6 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -218,7 +218,7 @@ receiveFileChunks(const char *sql)
 	if (PQsendQueryParams(conn, sql, 0, NULL, NULL, NULL, NULL, 1) != 1)
 		pg_fatal("could not send query: %s\n", PQerrorMessage(conn));
 
-	pg_log(PG_DEBUG, "getting file chunks");
+	pg_log(PG_DEBUG, "getting file chunks\n");
 
 	if (PQsetSingleRowMode(conn) != 1)
 		pg_fatal("could not set libpq connection to single row mode\n");
diff --git a/src/bin/pg_rewind/logging.c b/src/bin/pg_rewind/logging.c
index 0e05f96..5763460 100644
--- a/src/bin/pg_rewind/logging.c
+++ b/src/bin/pg_rewind/logging.c
@@ -137,7 +137,7 @@ progress_report(bool force)
 	snprintf(fetch_size_str, sizeof(fetch_size_str), INT64_FORMAT,
 			 fetch_size / 1024);
 
-	pg_log(PG_PROGRESS, "%*s/%s kB (%d%%) copied",
+	pg_log(PG_PROGRESS, "%*s/%s kB (%d%%) copied\n",
 		   (int) strlen(fetch_size_str), fetch_done_str, fetch_size_str,
 		   percent);
 	printf("\r");



Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Amit Kapila
On Mon, Jun 22, 2015 at 1:41 PM, Fabien COELHO  wrote:
>
>
> 
>
>> It'd be interesting to see numbers for tiny, without the overly small
>> checkpoint timeout value. 30s is below the OS's writeback time.
>
>
> Here are some tests with longer timeout:
>
> tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
>  max_wal_size=1GB warmup=600 time=4000
>
>   flsh |  full speed tps  | percent of late tx, 4 clients, for tps:
>   /srt |  1 client  |  4 clients  |  100 |  200 |  400 |  800 | 1200 | 1600
>   N/N  | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
>   N/Y  | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
>   Y/N  | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
>   Y/Y  | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13
>
> There seems to be a small 1-2% performance benefit with 4 clients, this
> is reversed for 1 client, there are significantly and consistently less
> late transactions when options are activated, the performance is more
> stable (standard deviation reduced by 10-18%).
>
> The db is about 200 MB ~ 25000 pages, at 2500+ tps it is written 40 times
> over in 5 minutes, so the checkpoint basically writes everything in 220
> seconds, 0.9 MB/s. Given the preload phase the buffers may be more or less
> in order in memory, so may be written out in order anyway.
>
>
> medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
>   max_wal_size=4GB warmup=1200 time=7500
>
>   flsh |  full speed tps   | percent of late tx, 4 clients
>   /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
>    N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
>    N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
>    Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
>    Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |
>
> The effect of sorting is very positive (+150% to 270% tps). On this run,
> flushing has a positive (+20% with 1 client) or negative (-8% with 4
> clients) effect on throughput, and late transactions are reduced by 92-95%
> when both options are activated.
>

Why is there a dip in performance with multiple clients? Can it be
due to the fact that we started doing more work while holding the bufhdr
lock in the code below?

BufferSync()
{
..
for (buf_id = 0; buf_id < NBuffers; buf_id++)
  {
  volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
@@ -1621,32 +1719,185 @@ BufferSync(int flags)

  if ((bufHdr->flags & mask) == mask)
  {
+ Oid spc;
+ TableSpaceCountEntry * entry;
+ bool found;
+
  bufHdr->flags |= BM_CHECKPOINT_NEEDED;
+ CheckpointBufferIds[num_to_write] = buf_id;
  num_to_write++;
+
+ /* keep track of per tablespace buffers */
+ spc = bufHdr->tag.rnode.spcNode;
+ entry = (TableSpaceCountEntry *)
+ hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);
+
+ if (found) entry->count++;
+ else entry->count = 1;
  }
..
}


-
BufferSync()
{
..
- buf_id = StrategySyncStart(NULL, NULL);
- num_to_scan = NBuffers;
+ active_spaces = nb_spaces;
+ space = 0;
  num_written = 0;
- while (num_to_scan-- > 0)
+
+ while (active_spaces != 0)
..
}

The changed code doesn't seem to give any consideration to the
clock-sweep point, which might not be helpful for cases where the checkpoint
could have flushed soon-to-be-recycled buffers.  I think flushing the
sorted buffers w.r.t. tablespaces is a good idea, but by not giving any
preference to the clock-sweep point it seems to me that we would lose out in
some cases with this new change.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H

2015-06-22 Thread Tomas Vondra

Hi,

On 06/22/2015 07:47 AM, Jeff Janes wrote:

On Sat, Jun 20, 2015 at 8:28 AM, Tomas Vondra wrote:

Hi Tomas,


I've lobotomized the sampling a bit to really produce a random set
of blocks first, and that produces way better estimates:

statistics target    estimate             random
---------------------------------------------------------
100                  429491 (1x)          334430766 (14x)
1000                 4240418 (1000x)      439010499 (10x)

Also, the number of sampled blocks is not that different. With
target 100, the current sampler reads ~2900 blocks, while a
completely random sampler uses 3000 blocks. So, where's the benefit?


I don't know what you did.  The block sampling is already random; unless
there is some overlooked bug, it can't be made more random.  Could you
post the patch?


Attached is an early experimental version of the sampling with density 
correction. It breaks the SYSTEM sampling for TABLESAMPLE, because it 
changes the signature of one of the BlockSampler methods, but that can be 
fixed later. I also suspect it fails with some small tables that would 
get sampled completely with the current patch.


The density-correction idea is quite simple, and the sampling works like 
this:


  (1) randomly sample the blocks

  Generate random sequence of block numbers, and track how many
  times a particular block was sampled - so if we got block number
  K twice, we know we should sample 2 tuples from that block.

  (2) sample the tuples

  For each of the blocks, sample as many tuples as needed. Also
  note the number of tuples on the block.

  (3) determine maximum density

  Find what the maximum number of tuples per block is.

  (4) resample the tuples

  For each block, resample the tuples using ratio vs. maximum
  density. So if a block has 50% density, each sampled tuple will
  be evicted with ~50% probability.

  (5) replace the evicted tuples (not implemented)

  In step (4) some of the tuples will get evicted from the sample,
  so we should replace them with other tuples. Not yet done, so the
  sample may get smaller than requested.
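A toy standalone C sketch of steps (1)-(4), using a made-up table layout
rather than the analyze.c data structures, to show the density correction in
isolation:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NBLOCKS     1000
#define SAMPLESIZE  300

int
main(void)
{
    int     tuples_per_block[NBLOCKS];  /* pretend table with varying density */
    int     sampled_block[SAMPLESIZE];
    int     max_density = 0, kept = 0, i;

    srand((unsigned) time(NULL));
    for (i = 0; i < NBLOCKS; i++)
        tuples_per_block[i] = 10 + rand() % 90;     /* 10..99 tuples per block */

    /* (1)+(2): sample blocks uniformly at random, one tuple per hit,
     * and note the tuple count of each sampled block */
    for (i = 0; i < SAMPLESIZE; i++)
    {
        sampled_block[i] = rand() % NBLOCKS;
        if (tuples_per_block[sampled_block[i]] > max_density)
            max_density = tuples_per_block[sampled_block[i]];   /* (3) */
    }

    /* (4): resample, keeping each tuple with probability density/max_density */
    for (i = 0; i < SAMPLESIZE; i++)
    {
        double keep = (double) tuples_per_block[sampled_block[i]] / max_density;

        if ((double) rand() / RAND_MAX < keep)
            kept++;
    }

    /* (5) would refill the evicted slots; here we just report the shrinkage */
    printf("kept %d of %d sampled tuples\n", kept, SAMPLESIZE);
    return 0;
}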

And it seems to be working quite fine. I've done various tests with 
tables specifically crafted to use different tuple widths for different 
values, like for example this:


create table test as
select
(case when i < 50 then 1 else 2 end) AS id,
(case when i < 50 then 1 else null end)::bigint as val1,
(case when i < 50 then 1 else null end)::bigint as val2,
(case when i < 50 then 1 else null end)::bigint as val3
from generate_series(1,100) s(i);

which creates a table with 50% id=1 tuples and 50% id=2 tuples, with
the id=1 tuples being much wider (as the id=2 tuples have NULLs at the end).
And it works fine - the numbers are pretty much the same as the current
estimates.


It also significantly improves the ndistinct estimates - for example on 
the TPC-H data set, I do get ndistinct=-1 for the l_orderkey column even 
with statistics target=100, which is way better than the 1x 
under-estimate produced by the current algorithm.


I do take this as a demonstration that our ndistinct estimator is not 
such crap as some claim it to be, but that we're merely feeding it 
skewed samples. I still think the adaptive estimator is a neat idea, but 
it can't be fixed while keeping the same skewed sampling. That has to be 
addressed first ...


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 861048f..8669bd8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -97,6 +97,8 @@ static void update_attstats(Oid relid, bool inh,
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 
+static int perform_density_correction(int samplerows, HeapTuple *rows,
+	  BlockSampler bs);
 
 /*
  *	analyze_rel() -- analyze one relation
@@ -974,18 +976,15 @@ acquire_sample_rows(Relation onerel, int elevel,
 	HeapTuple *rows, int targrows,
 	double *totalrows, double *totaldeadrows)
 {
-	int			numrows = 0;	/* # rows now in reservoir */
-	double		samplerows = 0; /* total # rows collected */
+	int			samplerows = 0; /* total # rows collected */
 	double		liverows = 0;	/* # live rows seen */
 	double		deadrows = 0;	/* # dead rows seen */
-	double		rowstoskip = -1;	/* -1 means not set yet */
 	BlockNumber totalblocks;
 	TransactionId OldestXmin;
 	BlockSamplerData bs;
-	ReservoirStateData rstate;
 
 	Assert(targrows > 0);
-
+// pg_usleep(15*1000*1000);
 	totalblocks = RelationGetNumberOfBlocks(onerel);
 
 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
@@ -993,20 +992,28 @@ acquire_s

Re: [HACKERS] upper planner path-ification

2015-06-22 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Thursday, May 14, 2015 10:39 AM
> To: pgsql-hackers@postgresql.org; Tom Lane
> Subject: [HACKERS] upper planner path-ification
> 
> Hi,
> 
> I've been pulling over Tom's occasional remarks about redoing
> grouping_planner - and maybe further layers of the planner - to work
> with Paths instead of Plans.  I've had difficulty locating all of the
> relevant threads.  Here's everything I've found so far, which I'm
> pretty sure is not everything:
> 
> http://www.postgresql.org/message-id/17400.1311716...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/2614.1375730...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/22721.1385048...@sss.pgh.pa.us
> http://www.postgresql.org/message-id/BANLkTinDjjFHNOzESG2J2U4GOkqLu69Zqg@mai
> l.gmail.com
> http://www.postgresql.org/message-id/8479.1418420...@sss.pgh.pa.us
> 
> I think there are two separate problems here.  First, there's the
> problem that grouping_planner() is complicated.  It's doing cost
> comparisons, but all in ad-hoc fashion rather than using a consistent
> mechanic the way add_path() does.  Generally, we can plan an aggregate
> using either (1) a hashed aggregate, (2) a sorted aggregate, or (3)
> for min or max, an index scan that just grabs the highest or lowest
> value in lieu of a full table scan.  Instead of generating a plan for
> each of these possibilities, we'd like to generate paths for each one,
> and then pick one to turn into a plan.  AIUI, the hope is that this
> would simplify the cost calculations, and also make it easier to
> inject other paths, such as a path where an FDW performs the
> aggregation step on the remote server.
> 
> Second, there's the problem that we might like to order aggregates
> with respect to joins.  If we have something like SELECT DISTINCT ON
> (foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then
> (a) if foo.x has many duplicates, it will be better to DISTINCT-ify
> foo and then join to bar afterwards but (b) if foo.x = bar.x is highly
> selective, it will be better to join to bar first and then
> DISTINCT-ify the result.  Currently, aggregation is always last;
> that's not great.  Hitoshi Harada's proposed strategy of essentially
> figuring out where the aggregation steps can go and then re-planning
> for each one is also not great, because each join problem will be a
> subtree of the one we use for the aggregate-last strategy, and thus
> we're wasting effort by planning the same subtrees multiple times.
> Instead, we might imagine letting grouping planner fish out the best
> paths for the joinrels that represent possible aggregation points,
> generate aggregation paths for each of those, and then work out what
> additional rels need to be joined afterwards.  That sounds hard, but
> not as hard as doing something sensible with what we have today.
>
Once we support adding an aggregation path during path consideration,
we need to pay attention to how the final target-list morphs according
to the intermediate path combination tentatively chosen.
For example, if partial aggregation makes sense from a cost perspective -
like SUM(NRows) of a partial COUNT(*) AS NRows instead of COUNT(*) on a
billion rows - the planner also has to adjust the final target-list
according to the interim paths. In this case, the final output shall be
SUM() instead of COUNT().

I think this planner enhancement needs a mechanism to inform the planner
of the expected final output for each Path. It means an individual Path
node can have a replacement rule: which target-entry should have which
expression instead, once this path is applied.
In the above case, PartialAggPath will have an indication that SUM(nrows)
shall be applied at the location where "COUNT(*)" is originally placed.

In the case of a relation join that involves paths with these replacement
rules, it may be a bit complicated if an expression takes arguments coming
from both relations, like CORR(inner_rel.X, outer_rel.Y); I'm not sure
whether we can have a more broken-down partial aggregate there. However,
the relations on the inner/outer side are responsible for the columns on
the inner/outer side. As long as all the arguments of an aggregate
function come from the same side, it should not block aggregate push-down,
provided the estimated cost makes sense.

That probably becomes a problem when we allow adding an aggregate path as
part of a partial plan tree.

Thanks,


> I'm inclined to think that it would be useful to solve the first
> problem even if we didn't solve the second one right away (but that
> might be wrong).  As a preparatory step, I'm thinking it would be
> sensible to split grouping_planner() into an outer function that would
> handle the addition of Limit and LockRows nodes and maybe planning of
> set operations, and an inner function that would handle GROUP BY,
> DISTINCT, and possibly window function planning.
> 
> Thoughts?
>
--
NEC Business Creation Division 

Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-22 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jun 23, 2015 at 10:07 AM, Robert Haas wrote:
>> On Mon, Jun 22, 2015 at 8:19 PM, Jim Nasby wrote:
>>> Anything ever happen with this? I agree that LOG is too high for reporting
>>> most (if not all) of these things.

>> I think we should consider having a flag for this behavior rather than
>> changing the behavior across the board.
>> But then again, maybe we should just change it.
>> 
>> What do others think?

> A GUC just for that looks like overkill to me; this log is useful
> when debugging. And one could always have one's bgworker call elog
> itself at startup and before exiting to provide more or less similar
> information.

I agree that we don't need YAGUC here, particularly not one that applies
indiscriminately to all bgworkers.  I'd vote for just decreasing the log
level.  The current coding is appropriate for a facility that's basically
experimental; but as it moves towards being something that would be used
routinely in production, the argument for being noisy in the log gets
weaker and weaker.

regards, tom lane




Re: [HACKERS] Further issues with jsonb semantics, documentation

2015-06-22 Thread Peter Geoghegan
On Wed, Jun 10, 2015 at 11:48 AM, Andrew Dunstan  wrote:
> Please submit a patch to adjust the treatment of negative integers in the
> old functions to be consistent with their treatment in the new functions.
> i.e. in the range [-n,-1] they should refer to the corresponding element
> counting from the right.

This patch is attached, along with a separate patch which adds a
release note compatibility item.

See commit message for full details.

Thanks
-- 
Peter Geoghegan
From 03372a3611be4a8acbf79024aa3626f5ee4ac369 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Mon, 22 Jun 2015 16:00:54 -0700
Subject: [PATCH 2/2] Release note compatibility item

Note that json and jsonb extraction operators no longer consider a
negative subscript to be invalid.
---
 doc/src/sgml/release-9.5.sgml | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/doc/src/sgml/release-9.5.sgml b/doc/src/sgml/release-9.5.sgml
index 4e2ea45..cbe20ca 100644
--- a/doc/src/sgml/release-9.5.sgml
+++ b/doc/src/sgml/release-9.5.sgml
@@ -125,6 +125,17 @@
  
 
 
+
+ 
+  Allow json and jsonb extraction operators to
+  accept negative subscripts, which count from the end of JSON
+  arrays.  Historically, these operators yielded NULL
+  in the event of a negative subscript, because negative
+  subscripts were considered invalid.  (Peter Geoghegan, Andrew
+  Dunstan)
+ 
+
+

 
   
-- 
1.9.1

From f65b2af71b103b2b5802ac20771e59a285bb4a7f Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sat, 6 Jun 2015 20:07:46 -0700
Subject: [PATCH 1/2] Support JSON negative array subscripts everywhere

Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting.  Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator).  This inconsistency seemed likely to be
counter-intuitive to users.  To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators.  This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.

For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count.  From there on, it's as
if a positive-wise value was originally provided.
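(In isolation, the conversion described here amounts to something like the
following standalone sketch, not the actual jsonfuncs.c code:

#include <stdio.h>

/* Standalone sketch: map a possibly-negative subscript to its positive-wise
 * equivalent, given the element count from the preparatory parse. */
static int
normalize_subscript(int idx, int nelems)
{
    if (idx < 0)
        idx += nelems;          /* -1 means the last element */
    if (idx < 0 || idx >= nelems)
        return -1;              /* still out of range: caller yields NULL */
    return idx;
}

int
main(void)
{
    printf("%d\n", normalize_subscript(-1, 5)); /* 4: last element  */
    printf("%d\n", normalize_subscript(-7, 5)); /* -1: out of range */
    return 0;
}
)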

Note that there is still a minor inconsistency here across jsonb
deletion operators.  Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized as an integer
literal) is being used on an object, which is wrong-headed.  The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements.  However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).

In passing, make SetArrayPath() stricter about not accepting cases where
there is trailing non-numeric garbage bytes rather than a clean NUL
byte.  This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator).  Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.
---
 doc/src/sgml/func.sgml|  16 --
 src/backend/utils/adt/json.c  |  39 +
 src/backend/utils/adt/jsonfuncs.c | 100 +-
 src/include/utils/jsonapi.h   |   7 +++
 src/test/regress/expected/json.out|  14 +
 src/test/regress/expected/json_1.out  |  14 +
 src/test/regress/expected/jsonb.out   |  30 ++
 src/test/regress/expected/jsonb_1.out |  30 ++
 src/test/regress/sql/json.sql |   5 ++
 src/test/regress/sql/jsonb.sql|   5 ++
 10 files changed, 231 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 650051b..aa3e989 100644
--- a/

Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-22 Thread Michael Paquier
On Tue, Jun 23, 2015 at 10:07 AM, Robert Haas wrote:
> On Mon, Jun 22, 2015 at 8:19 PM, Jim Nasby wrote:
>> Anything ever happen with this? I agree that LOG is too high for reporting
>> most (if not all) of these things.
>
> I think we should consider having a flag for this behavior rather than
> changing the behavior across the board.
> But then again, maybe we should just change it.
>
> What do others think?

A GUC just for that looks like overkill to me; this log is useful
when debugging. And one could always have one's bgworker call elog
itself at startup and before exiting to provide more or less similar
information.
-- 
Michael




Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-22 Thread Robert Haas
On Mon, Jun 22, 2015 at 8:19 PM, Jim Nasby  wrote:
> On 6/14/15 12:25 AM, Pavel Stehule wrote:
>> I am working on a scheduler extension for 9.5. It uses bgworkers
>> intensively for any task. This is the reason why I need to decrease the
>> log level - and I am thinking that parallel computing needs it too, due
>> to the high number of created and finished workers.
>>
>> It should be fixed in 9.5 - because it is a limiting factor for bgworker
>> usage in 9.5.
>
>
> Anything ever happen with this? I agree that LOG is too high for reporting
> most (if not all) of these things.

I think we should consider having a flag for this behavior rather than
changing the behavior across the board.

But then again, maybe we should just change it.

What do others think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_stat_*_columns?

2015-06-22 Thread Robert Haas
On Sun, Jun 21, 2015 at 12:52 PM, Andres Freund  wrote:
> On 2015-06-21 12:40:50 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>> > We could also just mmap() the stats file into memory in various
>> > processes. With a bit care it should be quite possible to only mmap a
>> > subsets of the file at once, taking care of the address space issues.
>>
>> I think we should go into this with the mindset of re-using the DSM
>> infrastructure, rather than inventing a new mechanism of uncertain
>> portability.
>
> Maybe. I'm rather doubtful that it's a good idea to make a choice
> that'll basically force all the stats to always be in memory though.
>
> mmap()ing a file is one of the mechanisms for dsm, so it'd not be
> totally unproven.

But it hasn't been made to work on Windows, and is probably pretty
lightly tested elsewhere.  Besides, memory mapping a disk file has no
real advantages over a DSM that doesn't get written to disk.  I/O is a
serious problem where the stats file is concerned, and more to the
point, the process that reads the file goes around and puts it back
into memory anyway.

> In a totally different, crazy way, we could just use the existing buffer
> manager we have and simply put the stats file in
> shared_buffers. Inventing a per-database relfilenode that doesn't
> conflict doesn't seem impossible. With some care it shouldn't be hard to
> make that stats file readable from all sessions, even if they're not
> connected to the database (e.g. autovacuum launcher).

Interesting idea.  We could consider the set of stats files a database
unto itself and reserve a low-numbered OID for it.  The obvious thing
to do is use the database's OID as the relfilenode, but then how do
you replace the stats file?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_stat_*_columns?

2015-06-22 Thread Robert Haas
On Sun, Jun 21, 2015 at 11:43 AM, Magnus Hagander  wrote:
> On Sat, Jun 20, 2015 at 11:55 PM, Robert Haas  wrote:
>> On Sat, Jun 20, 2015 at 7:01 PM, Tom Lane  wrote:
>> >> But if the structure
>> >> got too big to map (on a 32-bit system), then you'd be sort of hosed,
>> >> because there's no way to attach just part of it.  That might not be
>> >> worth worrying about, but it depends on how big it's likely to get - a
>> >> 32-bit system is very likely to choke on a 1GB mapping, and maybe even
>> >> on a much smaller one.
>> >
>> > Yeah, I'm quite worried about assuming that we can map a data structure
>> > that might be of very significant size into shared memory on 32-bit
>> > machines.  The address space just isn't there.
>>
>> Considering the advantages of avoiding message queues, I think we
>> should think a little bit harder about whether we can't find some way
>> to skin this cat.  As I think about this a little more, I'm not sure
>> there's really a problem with one stats DSM per database.  Sure, the
>> system might have 100,000 databases in some crazy pathological case,
>> but the maximum number of those that can be in use is bounded by
>> max_connections, which means the maximum number of stats file DSMs we
>> could ever need at one time is also bounded by max_connections.  There
>> are a few corner cases to think about, like if the user writes a
>> client that connects to all 100,000 databases in very quick
>> succession, we've got to jettison the old DSMs fast enough to make
>> room for the new DSMs before we run out of slots, but that doesn't
>> seem like a particularly tough nut to crack.  If the stats collector
>> ensures that it never attaches to more than MaxBackends stats DSMs at
>> a time, and each backend ensures that it never attaches to more than
>> one stats DSM at a time, then 2 * MaxBackends stats DSMs is always
>> enough.  And that's just a matter of bumping
>> PG_DYNSHMEM_SLOTS_PER_BACKEND from 2 to 4.
>>
>> In more realistic cases, it will probably be normal for many or all
>> backends to be connected to the same database, and the number of stats
>> DSMs required will be far smaller.
>
> What about a combination along the lines of something like this: the stats
> collector keeps the statistics in local memory as before. But when a backend
> needs to get a snapshot of its data, it uses a shared memory queue to request
> it. What the stats collector does in this case is allocate a new DSM, copy
> the data into that DSM, and hand the DSM over to the backend. At this point
> the stats collector can forget about it, and it's up to the backend to get
> rid of it when it's done with it.

Well, there seems to be little point in having the stats collector
forget about a DSM that it could equally well have shared with the
next guy who wants a stats snapshot for the same database.  That case
is surely *plenty* common enough to be worth optimizing for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash index creation warning

2015-06-22 Thread Michael Paquier
On Tue, Jun 23, 2015 at 9:06 AM, Jim Nasby  wrote:
> On 6/12/15 5:00 PM, Thom Brown wrote:
>>
>> On 18 October 2014 at 15:36, Bruce Momjian  wrote:
>>>
>>> On Fri, Oct 17, 2014 at 02:36:55PM -0400, Bruce Momjian wrote:

 On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote:
>
> David G Johnston  writes:
>>
>> The question is whether we explain the implications of not being
>> WAL-logged
>> in an error message or simply state the fact and let the documentation
>> explain the hazards - basically just output:
>> "hash indexes are not WAL-logged and their use is discouraged"
>
>
> +1.  The warning message is not the place to be trying to explain all
> the
> details.


 OK, updated patch attached.
>>>
>>>
>>> Patch applied.
>>
>>
>> I only just noticed this item when I read the release notes.  Should
>> we bother warning when used on an unlogged table?
>
>
> Not really; but I think the bigger question at this point is if we want to
> change it this late in the game.

Changing it even during beta looks acceptable to me. I think that it
is mainly a matter of having a patch (here is one), and someone to push
it, as everybody here seems to agree that for UNLOGGED tables this
warning makes little sense.
-- 
Michael
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7340a1f..49ad9d6 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -490,7 +490,8 @@ DefineIndex(Oid relationId,
 	accessMethodId = HeapTupleGetOid(tuple);
 	accessMethodForm = (Form_pg_am) GETSTRUCT(tuple);
 
-	if (strcmp(accessMethodName, "hash") == 0)
+	if (strcmp(accessMethodName, "hash") == 0 &&
+		rel->rd_rel->relpersistence != RELPERSISTENCE_UNLOGGED)
 		ereport(WARNING,
 (errmsg("hash indexes are not WAL-logged and their use is discouraged")));
 
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 5c2e67d..b72e65d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2342,6 +2342,9 @@ CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
 WARNING:  hash indexes are not WAL-logged and their use is discouraged
 CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
 WARNING:  hash indexes are not WAL-logged and their use is discouraged
+CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
+CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id int4_ops);
+DROP TABLE unlogged_hash_table;
 -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
 --
 -- Test functional index
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 67dd2f0..ff86953 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -684,6 +684,10 @@ CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
 
 CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
 
+CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
+CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id int4_ops);
+DROP TABLE unlogged_hash_table;
+
 -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
 
 



Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-22 Thread Jim Nasby

On 6/14/15 12:25 AM, Pavel Stehule wrote:

Hi

I am working on a scheduler extension for 9.5. It uses bgworkers
intensively for any task. This is the reason why I need to decrease the
log level - and I am thinking that parallel computing needs it too, due to
the high number of created and finished workers.

It should be fixed in 9.5 - because it is a limiting factor for bgworker
usage in 9.5.


Anything ever happen with this? I agree that LOG is too high for
reporting most (if not all) of these things.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Hash index creation warning

2015-06-22 Thread Jim Nasby

On 6/12/15 5:00 PM, Thom Brown wrote:

On 18 October 2014 at 15:36, Bruce Momjian  wrote:

On Fri, Oct 17, 2014 at 02:36:55PM -0400, Bruce Momjian wrote:

On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote:

David G Johnston  writes:

The question is whether we explain the implications of not being WAL-logged
in an error message or simply state the fact and let the documentation
explain the hazards - basically just output:
"hash indexes are not WAL-logged and their use is discouraged"


+1.  The warning message is not the place to be trying to explain all the
details.


OK, updated patch attached.


Patch applied.


I only just noticed this item when I read the release notes.  Should
we bother warning when used on an unlogged table?


Not really; but I think the bigger question at this point is if we want 
to change it this late in the game.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] proposal: row_to_array function

2015-06-22 Thread Jim Nasby

On 6/22/15 2:46 AM, Pavel Stehule wrote:


FOREACH key, val IN RECORD myrow
LOOP
  IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN
    val := val + 1; -- these variables can be mutable
    -- or maybe in future
    myrow[key] := val + 1;
  END IF;
END LOOP;

What is important - "val" is an automatic variable, and it can have a
different type in each step.

It is a little bit strange, but impossible to solve, so we cannot
support row[var] as a right value (without immutable casting). But we can
do it with a left value.


Actually, you can (theoretically) solve it for the right value as well 
if val is an actual type and you have operators on that type that 
know to search for a specific operator given the actual types that are 
involved. So if val is int4, val + 1 becomes int4 + int4.


The problem I've run into with this is that by the time you've added enough 
casts to make this workable, you've probably created a situation where 
val + something is going to recurse back to itself. I've partially 
solved this in [1], and intend to finish it by calling back in via SPI 
to do the final resolution, the same way the RI triggers do.


What would be a lot better is if we had better control over function and 
operator resolution.


[1] 
https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-22 Thread Jim Nasby

On 6/21/15 9:45 PM, Craig Ringer wrote:

And, I also understood your concern about "CREATE INDEX", but we have no way
to get progress information for "CREATE INDEX". At present, I think it may be
good to refer to the same time as the estimated time to execute "COPY TO".

You could probably get a handwave-quality estimate by looking at the
average column widths for the cols included in the index plus the
number of tuples in the table. It'd be useless for expression indexes,
partial indexes, etc, but you can't have everything...


Jan Urbański demonstrated[1] getting progress stats for long running 
queries[2] at pgCon 2013. Perhaps some of that code would be useful here 
(or better yet perhaps we could include at least the measuring portion 
of his stuff in core... ;)


[1] https://www.pgcon.org/2013/schedule/events/576.en.html
[2] https://github.com/wulczer/pg-progress
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread Jim Nasby

On 6/22/15 12:37 PM, Robert Haas wrote:

It's
not my goal here to create some kind of a performance counter system,
even though that would be valuable and could possibly be based on the
same infrastructure, but rather just to create a very simple system
that lets people know, without any developer tools, what is causing a
backend that has accepted a query and not yet returned a result to be
off-CPU rather than on-CPU.


Ilya Kosmodemiansky presented such a system at pgCon[1], and hopes to 
submit an initial patch in the coming weeks. The general idea was to do 
something similar to what you're describing (though, I believe even more 
granular) and have a bgworker accumulating that information.


[1] http://www.pgcon.org/2015/schedule/events/809.en.html
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-06-22 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jun 22, 2015 at 2:55 PM, Tom Lane  wrote:
>> If I recall that code correctly, the assumption was that if the third
>> argument is zero then memcmp() must not fetch any bytes (not should not,
>> but MUST not) and therefore it doesn't matter if we pass a NULL.  Are
>> you seeing any observable problem here, and if so what is it?

> I dunno, this seems like playing with fire to me.  A null-test would
> be pretty cheap insurance.

A null test would be a pretty cheap way of masking a bug in that logic,
if we ever introduced one; to wit, that it would cause a call with
argtypes==NULL to match anything.

Possibly saner is

if (nargs == 0 ||
memcmp(argtypes, best_candidate->args, nargs * sizeof(Oid)) == 0)
break;

I remain unconvinced that this is necessary, though.  It looks a *whole*
lot like the guards we have against old Solaris' bsearch-of-zero-entries
bug.  I maintain that what glibc has done is exactly to introduce a bug
for the zero-entries case, and that Piotr ought to complain to them
about it.  At the very least, if you commit this please annotate it
as working around a memcmp bug.

regards, tom lane




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread Tom Lane
Merlin Moncure  writes:
> On Mon, Jun 22, 2015 at 12:37 PM, Robert Haas  wrote:
>> ...  The basic idea is that pg_stat_activity.waiting would be
>> replaced by a new column pg_stat_activity.wait_event, which would
>> display the reason why that backend is waiting.

> Instead of changing the column, can't we add a new one?  Adjusting
> columns in PSA requires the innumerable queries written against it to
> be adjusted along with all the wiki instructions to dev ops for
> emergency stuck query detection etc etc.

+1.  Removing the boolean column seems like it will arbitrarily break
a whole lot of client-side code, for not-very-adequate reasons.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes

2015-06-22 Thread Rajeev rastogi
On 11 June 2015 09:45, Bruce Momjian Wrote:

> 
> I have committed the first draft of the 9.5 release notes.  You can
> view the output here:
> 
>   http://momjian.us/pgsql_docs/release-9-5.html
> 
> and it will eventually appear here:
> 
>   http://www.postgresql.org/docs/devel/static/release.html
> 
> I am ready to make suggested adjustments, though I am traveling for
> conferences for the next ten days so there might a delay in my replies.

The following performance improvement is missing from the "General Performance" category:

Reduce btree scan overhead for < and > strategies

For <, <=, > and >= strategies, mark the first scan key
as already matched if scanning in an appropriate direction.
If index tuple contains no nulls we can skip the first
re-check for each tuple.

Author: Kumar Rajeev Rastogi
Committer: Simon Riggs

Thanks and Regards,
Kumar Rajeev Rastogi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-06-22 Thread Robert Haas
On Mon, Jun 22, 2015 at 2:55 PM, Tom Lane  wrote:
> Piotr Stefaniak  writes:
>> There are two places in parse_func.c where memcmp() conditionally gets a
>> NULL as its first argument, which invokes undefined behavior. I guess
>> gcc -O2 will make some assumptions based on memcpy's __nonnull attribute.
>
> If I recall that code correctly, the assumption was that if the third
> argument is zero then memcmp() must not fetch any bytes (not should not,
> but MUST not) and therefore it doesn't matter if we pass a NULL.  Are
> you seeing any observable problem here, and if so what is it?

I dunno, this seems like playing with fire to me.  A null-test would
be pretty cheap insurance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread Robert Haas
On Mon, Jun 22, 2015 at 4:40 PM, Merlin Moncure  wrote:
> Instead of changing the column, can't we add a new one?  Adjusting
> columns in PSA requires the innumerable queries written against it to
> be adjusted along with all the wiki instructions to dev ops for
> emergency stuck query detection etc etc.   I would also prefer to
> query 'waiting' in some cases, especially when in emergency
> situations; it's faster to type.

If people feel strongly about backward compatibility, yes, we can do
that.  However, if waiting continues to mean "on a heavyweight lock"
for backward compatibility, then you could sometimes have waiting =
false but wait_state non-null.  That seems confusing enough to be a
bad plan, at least to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread Merlin Moncure
On Mon, Jun 22, 2015 at 12:37 PM, Robert Haas  wrote:
> When a PostgreSQL system wedges, or when it becomes dreadfully slow
> for some reason, I often find myself relying on tools like strace,
> gdb, or perf to figure out what is happening.  This doesn't tend to
> instill customers with confidence; they would like (quite
> understandably) a process that doesn't require installing developer
> tools on their production systems, and doesn't require a developer to
> interpret the results, and perhaps even something that they could
> connect up to PEM or Nagios or whatever alerting system they are
> using.
>
> There are obviously many ways that we might think about improving
> things here, but what I'd like to do is try to put some better
> information in pg_stat_activity, so that when a process is not
> running, users can get some better information about *why* it's not
> running.  The basic idea is that pg_stat_activity.waiting would be
> replaced by a new column pg_stat_activity.wait_event, which would
> display the reason why that backend is waiting.  This wouldn't be a
> free-form text field, because that would be too expensive to populate.
> Instead it would contain a "reason code" which would be chosen from a
> list of reason codes and translated to text for display.

Instead of changing the column, can't we add a new one?  Adjusting
columns in PSA requires the innumerable queries written against it to
be adjusted along with all the wiki instructions to dev ops for
emergency stuck query detection etc etc.   I would also prefer to
query 'waiting' in some cases, especially when in emergency
situations; it's faster to type.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread David G. Johnston
On Mon, Jun 22, 2015 at 4:09 PM, Robert Haas  wrote:

> On Mon, Jun 22, 2015 at 1:59 PM, David G. Johnston
>  wrote:
> > In addition to the codes themselves I think it would aid less-experienced
> > operators if we would provide a meta-data categorization of the codes.
> > Something like, I/O Sub-System; Storage Maintenance; Concurrency, etc..
> >
> > There could be a section in the documentation with these topics as
> section
> > headings and a listing and explanation of each of the possible code would
> > then be described within.
> >
> > The meta-information is already embedded within the code/descriptions but
> > explicitly pulling them out would be, IMO, more user-friendly and likely
> > also aid in triage and speed-of-recognition when reading the
> corresponding
> > code/description.
>
> I was thinking that the codes would probably be fairly straightforward
> renderings of the underlying C identifiers, e.g.:
>
> Lock (Relation)
> Lock (Relation Extension)
> Lock (Page)
> ...
> ...
> LWLock (ShmemIndexLock)
> LWLock (OidGenLock)
> LWLock (XidGenLock)
> LWLock (ProcArrayLock)
> ...
> ...
> Spin Lock Delay
> Buffer Cleanup Lock
>
> We'd then have to figure out how to document that stuff.
>
>
Just tossing stuff at the wall...

CREATE TABLE pg_stat_wait_event_info (
    wait_event integer PRIMARY KEY,
    category text,           -- possibly a FK to a description-holding table too
    event_type text,         -- Lock, LWLock, etc., or something higher-level; was
                             -- pondering whether ltree could be brought into core and used here...
    event_documentation text -- asciidoc or something similar
);

Add psql commands:

\dproc [ proc_id ]  -- runs pg_stat_activity more-or-less
\dproc+ proc_id     -- shows the event information w/ description; and ideally
                       info from pg_locks among other possibilities

That said, the same documentation should be made available online as well -
but having this allows tools to make use of the info to put the documentation
closer to the user.

David J.


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread Robert Haas
On Mon, Jun 22, 2015 at 1:59 PM, David G. Johnston
 wrote:
> In addition to the codes themselves I think it would aid less-experienced
> operators if we would provide a meta-data categorization of the codes.
> Something like, I/O Sub-System; Storage Maintenance; Concurrency, etc..
>
> There could be a section in the documentation with these topics as section
> headings and a listing and explanation of each of the possible code would
> then be described within.
>
> The meta-information is already embedded within the code/descriptions but
> explicitly pulling them out would be, IMO, more user-friendly and likely
> also aid in triage and speed-of-recognition when reading the corresponding
> code/description.

I was thinking that the codes would probably be fairly straightforward
renderings of the underlying C identifiers, e.g.:

Lock (Relation)
Lock (Relation Extension)
Lock (Page)
...
...
LWLock (ShmemIndexLock)
LWLock (OidGenLock)
LWLock (XidGenLock)
LWLock (ProcArrayLock)
...
...
Spin Lock Delay
Buffer Cleanup Lock

We'd then have to figure out how to document that stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to get rid of PQnoPasswordSupplied?

2015-06-22 Thread Jim Nasby

On 6/22/15 9:00 AM, Tom Lane wrote:

Jim Nasby  writes:

On 6/19/15 10:35 AM, Tom Lane wrote:

On the other hand, you could argue that improving the string is going
to break clients that do the right thing (even if klugily) in order
to help clients that are doing the wrong thing (ie, failing without
offering the opportunity to enter a password).  Ideally no client app
would ever show this message to users and so its readability would not
matter.



Could we return a HINT? Or is that part of the same string?


Unfortunately no, there's no out-of-band additions possible here.

It strikes me that my argument above is too myopic, because I was only
thinking about cases where the client can plausibly do an interactive
prompt for password.  If it cannot (eg, psql --no-password, or perhaps
the process has no controlling terminal) then what you're going to see
is whatever message libpq returns.  So if people feel that this message
is not clear enough, maybe it's time to break compatibility and change it.

I do not follow Craig's argument that this is somehow connected to the
wire protocol version.  It's not; it's part of the libpq-to-client API.
If there ever is a protocol version 4, it will almost certainly not
trigger significant changes in that API --- there might be additions,
but not incompatible changes.  So if we think we can't change that
message now, then face it, we never will.


Yeah, looking at Craig's extensive review, it seems most of those places 
wouldn't actually prompt for a password, so I think it's best that we 
just change this.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-06-22 Thread Tom Lane
Piotr Stefaniak  writes:
> On 06/22/2015 08:55 PM, Tom Lane wrote:
>> If I recall that code correctly, the assumption was that if the third
>> argument is zero then memcmp() must not fetch any bytes (not should not,
>> but MUST not) and therefore it doesn't matter if we pass a NULL.

> It's not about fetching any bytes, it's about passing an invalid pointer 
> (a null pointer in this case) which gives a compiler the opportunity to 
> apply an optimization. For example, glibc has memcpy marked as 
> __nonnull(1,2).

If they do, that's a glibc bug, per the above observation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-06-22 Thread Tom Lane
Piotr Stefaniak  writes:
> There are two places in parse_func.c where memcmp() conditionally gets a 
> NULL as its first argument, which invokes undefined behavior. I guess 
> gcc -O2 will make some assumptions based on memcpy's __nonnull attribute.

If I recall that code correctly, the assumption was that if the third
argument is zero then memcmp() must not fetch any bytes (not should not,
but MUST not) and therefore it doesn't matter if we pass a NULL.  Are
you seeing any observable problem here, and if so what is it?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further issues with jsonb semantics, documentation

2015-06-22 Thread Jim Nasby

On 6/5/15 3:51 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

On 6/5/15 2:08 PM, Petr Jelinek wrote:

That's a good point, and it won't get any better if/when we add the json
point support in 9.6 since the syntax would be something like select
jsonb '{"a":"1", "b":"2", "c": {"a": "2"}}' - '/c/a'; and we will again
silently do nothing. That's going to cause bugs in applications using this.


Yeah, this is a miniature version of the pain I've felt with variant: trying
to get sane casting for a data type that encompasses other types in current
Postgres is essentially impossible.


I'm not sure this is the same problem.  But anyway I think requiring
explicit casts in this stuff is a good thing -- relying on implicit
cast to text, when most useful behavior uses other types, seems bad.


I'm not sure I'm following, at least for jsonb. If there's only jsonb - 
json_pointer operator, why shouldn't we be able to resolve it? I suspect 
part of the answer to that problem is that we need to make the 
resolution of unknown smarter, or perhaps somehow configurable.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

2015-06-22 Thread Robert Haas
On Fri, Jun 19, 2015 at 12:10 PM, Marti Raudsepp  wrote:
> One of my databases failed to upgrade successfully and produced this error
> in the copying phase:
>
> error while copying relation "pg_catalog.pg_largeobject"
> ("/srv/ssd/PG_9.3_201306121/1/12023" to "/PG_9.4_201409291/1/12130"): No
> such file or directory
>
> Turns out this happens when either the "postgres" or "template1" databases
> have been moved to a non-default tablespace. pg_dumpall does not dump
> attributes (such as tablespace) for these databases; pg_upgrade queries the
> new cluster about the tablespace for these relations and builds a broken
> destination path for the copy/link operation.
>
> The least bad solution seems to be generating ALTER DATABASE SET TABLESPACE
> commands for these from pg_dumpall. Previously a --globals-only dump didn't
> generate psql \connect commands, but you can't run SET TABLESPACE from
> within the same database you're connected to. So to move "postgres", it
> needs to connect to "template1" and vice versa. That seems fine for the
> purpose of pg_upgrade which can assume a freshly created cluster with both
> databases intact.
>
> I have implemented a proof of concept patch for this. Currently I'm only
> tackling the binary upgrade failure and not general pg_dumpall.

I haven't looked at the patch, but this seems like a good thing to
fix.  Hopefully Bruce will take a look soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --create-slot-if-not-exists

2015-06-22 Thread Robert Haas
On Fri, Jun 19, 2015 at 12:00 PM, Cédric Villemain
 wrote:
> +1 for it in 9.5

If we can do it soon, sure.  But not in September.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for TABLESAMPLE

2015-06-22 Thread Robert Haas
On Fri, Jun 19, 2015 at 4:01 PM, Brendan Jurd  wrote:
> On Fri, 19 Jun 2015 at 21:05 Petr Jelinek  wrote:
>>
>> On 2015-06-19 09:08, Brendan Jurd wrote:
>> > I
>> > think it would be convenient and user-friendly to complete the opening
>> > bracket -- it would make it perfectly clear that an argument is required
>> > for the syntax to be valid.
>> >
>>
>> Agreed, here is updated version which does that.
>
>
> Thanks Petr, I have tested your v2 and it works well.  This is ready for
> committer.

Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread David G. Johnston
On Mon, Jun 22, 2015 at 1:37 PM, Robert Haas  wrote:

>  and doesn't require a developer to
> interpret the results,

[...]


> We could
> also invent codes for things like "I'm doing a pg_usleep because I've
> exceeded max_spins_per_delay" and "I'm waiting for a cleanup lock on a
> buffer" and maybe a few others.
>
>
In addition to the codes themselves I think it would aid less-experienced
operators if we would provide a meta-data categorization of the codes.
Something like, I/O Sub-System; Storage Maintenance; Concurrency, etc..

There could be a section in the documentation with these topics as section
headings and a listing and explanation of each of the possible code would
then be described within.

The meta-information is already embedded within the code/descriptions but
explicitly pulling them out would be, IMO, more user-friendly and likely
also aid in triage and speed-of-recognition when reading the corresponding
code/description.

David J.


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread Joshua D. Drake


On 06/22/2015 10:37 AM, Robert Haas wrote:


 I'm less sure about this next part, but I think we
might also want to report ourselves as waiting when we are doing an OS
read or an OS write, because it's pretty common for people to think
that a PostgreSQL bug is to blame when in fact it's the operating
system that isn't servicing our I/O requests very quickly.  We could
also invent codes for things like "I'm doing a pg_usleep because I've
exceeded max_spins_per_delay" and "I'm waiting for a cleanup lock on a
buffer" and maybe a few others.


This would be a great improvement. Many, many times the problem really 
has nothing to do with PostgreSQL. It is a relation falling out of 
cache, swapping, a process waiting on IO to be allocated to it. If it is 
possible to have a view within PostgreSQL that allows us to see that, it 
would be absolutely awesome.


It would be great if we could somehow monitor what the postgresql 
processes are doing within PostgreSQL. Imagine if we had pgsar ...


Sincerely,

jD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-06-22 Thread Robert Haas
When a PostgreSQL system wedges, or when it becomes dreadfully slow
for some reason, I often find myself relying on tools like strace,
gdb, or perf to figure out what is happening.  This doesn't tend to
instill customers with confidence; they would like (quite
understandably) a process that doesn't require installing developer
tools on their production systems, and doesn't require a developer to
interpret the results, and perhaps even something that they could
connect up to PEM or Nagios or whatever alerting system they are
using.

There are obviously many ways that we might think about improving
things here, but what I'd like to do is try to put some better
information in pg_stat_activity, so that when a process is not
running, users can get some better information about *why* it's not
running.  The basic idea is that pg_stat_activity.waiting would be
replaced by a new column pg_stat_activity.wait_event, which would
display the reason why that backend is waiting.  This wouldn't be a
free-form text field, because that would be too expensive to populate.
Instead it would contain a "reason code" which would be chosen from a
list of reason codes and translated to text for display.  Internally,
pgstat_report_waiting() would be changed to take an integer argument
rather than a Boolean (possibly uint8 would be enough, certainly
uint16 would be), and called from more places.  It would continue to
use an ordinary store into shared memory, with no atomic ops or
locking.

Currently, the only time we report a process as waiting is when it is
waiting for a heavyweight lock.  I'd like to make that somewhat more
fine-grained, by reporting the type of heavyweight lock it's awaiting
(relation, relation extension, transaction, etc.).  Also, I'd like to
report when we're waiting for a lwlock, and report either the specific
fixed lwlock for which we are waiting, or else the type of lock (lock
manager lock, buffer content lock, etc.) for locks of which there is
more than one.  I'm less sure about this next part, but I think we
might also want to report ourselves as waiting when we are doing an OS
read or an OS write, because it's pretty common for people to think
that a PostgreSQL bug is to blame when in fact it's the operating
system that isn't servicing our I/O requests very quickly.  We could
also invent codes for things like "I'm doing a pg_usleep because I've
exceeded max_spins_per_delay" and "I'm waiting for a cleanup lock on a
buffer" and maybe a few others.

I realize that in many cases these states will be quite transient and
you won't see them in pg_stat_activity for very long before they
vanish; whether you can catch them at all is quite uncertain.  It's
not my goal here to create some kind of a performance counter system,
even though that would be valuable and could possibly be based on the
same infrastructure, but rather just to create a very simple system
that lets people know, without any developer tools, what is causing a
backend that has accepted a query and not yet returned a result to be
off-CPU rather than on-CPU.  In the cases where there are many
backends, you may be able to see non-NULL results often enough to get
a sense of where the problem is; or in the case where there's one
backend that is persistently stuck, you will hopefully be able to tell
where it's stuck.

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-vacuum is not running in 9.1.12

2015-06-22 Thread Prakash Itnal
Hi Tom/Alvaro,

Kindly let us know whether the correction provided in the previous mail is
fine or not! The current code anyway handles scenario-1, whereas it is still
vulnerable to scenario-2.

From previous mail:
*Scenario-1:* current_time (2015) -> changed_to_past (1995) ->
stays-here-for-half-day -> corrected to current_time (2015)
*Scenario-2:* current_time (2015) -> changed_to_future (2020) ->
stays-here-for-half-day -> corrected to current_time (2015)

We are waiting for your response.
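
For clarity, a self-contained sketch of the kind of guard being discussed for the remaining-sleep computation (the names are illustrative; this is not the actual unix_latch.c code):

#include <stdbool.h>

long
remaining_timeout_ms(long start_ms, long cur_ms, long timeout_ms,
                     bool *timed_out)
{
    long    elapsed_ms = cur_ms - start_ms;

    if (elapsed_ms < 0)
        elapsed_ms = 0;     /* clock went backwards: do not extend the sleep */

    *timed_out = (elapsed_ms >= timeout_ms);
    return *timed_out ? 0 : timeout_ms - elapsed_ms;
}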

On Sun, Jun 21, 2015 at 2:56 PM, Prakash Itnal  wrote:

> Hi,
>
> To my understanding it will probably not open doors for worse situations!
> Please correct me if my understanding below is wrong.
>
> The latch will wake up under below three situations:
> a) Socket error (=> result is set to negative number)
> b) timeout (=> result is set to TIMEOUT)
> c) some event arrived on socket (=> result is set to non-zero value, if
> caller registers for arrived events otherwise no value is set)
>
> Given the above conditions, the result can be zero only if there is an
> unregistered event which breaks the latch (*). In such a case, the current
> implementation evaluates the remaining sleep time. This calculation makes
> the situation worse if time goes backwards.
>
> The time difference between cur_time (current time) and start_time (time
> when latch started) should always be a positive integer because cur_time is
> always greater than start_time under all normal conditions.
>
> delta_timeout = cur_time - start_time;
>
> The difference can be negative only if time shifts to past. So it is
> possible to detect if time shifted to past. When it is possible to detect
> can it be possible to correct? I think we can correct and prevent long
> sleeps due to time shifts.
>
> Currently I treat it as TIMEOUT, though conceptually it is not. The ideal
> solution would be to leave this decision to the caller of WaitLatch(). With
> my little knowledge of postgres code, I think TIMEOUT would be fine!
>
>
> (*) The above description is true only for timed wait. If latch is started
> with blocking wait (no timeout) then above logic is not applicable.
>
> On Sat, Jun 20, 2015 at 10:01 PM, Tom Lane  wrote:
>
>> Prakash Itnal  writes:
>> > Sorry for the late response. The current patch only fixes the scenario-1
>> > listed below. It will not address the scenario-2. Also we need a fix in
>> > unix_latch.c where the remaining sleep time is evaluated, if latch is
>> woken
>> > by other events (or result=0). Here to it is possible the latch might
>> go in
>> > long sleep if time shifts to past time.
>>
>> Forcing WL_TIMEOUT if the clock goes backwards seems like quite a bad
>> idea to me.  That seems like a great way to make a bad situation worse,
>> ie it induces failures where there were none before.
>>
>> regards, tom lane
>>
>
>
>
> --
> Cheers,
> Prakash
>



-- 
Cheers,
Prakash


Re: [HACKERS] pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H

2015-06-22 Thread Tomas Vondra



On 06/22/2015 07:21 AM, Jeff Janes wrote:

On Sat, Jun 20, 2015 at 9:55 AM, Tomas Vondra  wrote:

Hi,

On 06/20/2015 03:01 AM, Jeff Janes wrote:



  I don't think we need to really assume the density to be constant,
  maybe we can verify that while collecting the sample? I mean, we're
  already reading the pages, so we can track the density, and either
  do the correction or not.


Maybe.  I don't know how that would work.


I was thinking about something like

   (1) compute average tuples per page

   (2) generate random sample of 'samplesize' blocks

   (3) for each block, see what is the actual number of tuples on the
   page and perform correction (e.g. if there's only 1/2 the average
   tuples, toss a coin and decide whether to really sample the block)

   (4) repeat until you have sample of the requested size


Are you producing a block sample or a row sample?  If a block is
sampled, then what do you feed it to?


I'm not sure what the question is. The ultimate goal is to get a random
row sample, but we need to choose which blocks to sample first. So you
can just generate K random numbers (where K is the requested sample
size). If you get the block number once, sample one row; if you get the
block twice, sample two rows, etc.


Of course, this is based on the assumption of uniform tuple density, 
which allows you to choose block first and then independently a row from 
that block. Hence the correction idea I outlined in the previous message.



Are you just picking one row out of each block that survives step 3?
If so, that would be similar to my idea of counting a given value
only once per block it was in, except I was thinking of applying that
only to n_distinct, not to the entire stats collection process.


No, I'm not doing that, and I think it's a bad idea in general. The 
problem with the current sampling is that it does not produce a random 
sample of rows, but something skewed, which causes serious trouble in 
the estimators processing the sample. I think this 'count only once' 
would result in similar issues (e.g. for low-cardinality columns).




My answer was to take out the block sampling entirely and read the
whole table. That is what probably won't work for you. (Come to think
of it, I was hesitant to deploy custom code to production, so I never
actually deployed that. Instead I cranked up the stats target and let
the chips fall where they may)


Yeah, reading the whole table is not going to fly I guess. But the 
estimates would be very accurate! ;-)




I think you want to go back to the table to find new blocks to
replace ones which were included in the original block sample but
ended up having no tuples in the end tuple sample, either because
they were low density blocks, or just through the luck of the draw.
But I don't see how that fixes the problem, unless you also prevent a
block from contributing more than 1 tuple. And at that point, I worry
about the uneven density again. If a block has 2 rows selected by
pure random chance, I think it would be fine to keep only one (or, go
find another random block to pull one row from as a replacement). But
if it has 2 rows selected because it has more rows than average, then
we would have to pull the replacement from a random block *of similar
density* to avoid swapping one bias for another one.


Hmmm, maybe limiting the sample to just 1 tuple is a good idea, but I 
have to think about that a bit more.


The basic idea behind density correction is that when you get a block
with density significantly different from the average (let's say more
than 10%?), you can either treat it as a partial block (density below
average), or a collection of multiple blocks (density above average),
and see if it really should be sampled.


Handling the 'partial block' seems rather simple - if the block has half
the average density, do a random coin toss and either keep or
discard the tuple.
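
As a tiny standalone sketch of that acceptance test (assuming the block's tuple count and the average are already known; this is not code from analyze.c):

#include <stdbool.h>
#include <stdlib.h>

/* Keep a tuple sampled from a block with probability ntuples/avg, so that
 * under-dense blocks do not contribute more than their share. */
bool
keep_sampled_tuple(double block_ntuples, double avg_ntuples)
{
    double  p = block_ntuples / avg_ntuples;

    if (p >= 1.0)
        return true;    /* over-dense blocks handled separately */

    return ((double) rand() / ((double) RAND_MAX + 1.0)) < p;
}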


With high-density blocks it's probably more complicated; I think it
really means treating them as multiple 'virtual' blocks, and only keeping
samples from the first one.


And of course, this needs to be repeated if some of the rows get evicted 
because of the correction mechanism. Which will lead to slightly larger 
samples, but I don't see that as a major problem (and the difference is 
not that big, at least not in the case discussed in this thread previously).


--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PGXS "check" target forcing an install ?

2015-06-22 Thread Sandro Santilli
I've noted that upgrading from PostgreSQL 9.3 to 9.5 I'm suddenly
unable to specify a "check" rule in the Makefile that includes the
PGXS one. The error is:

 $ make check
 rm -rf ''/tmp_install
 make -C '/home/postgresql-9.5/lib/pgxs/src/makefiles/../..' 
DESTDIR=''/tmp_install install
 make[1]: Entering directory `/home/postgresql-9.5/lib/pgxs'
 make[1]: *** No rule to make target `install'.  Stop.
 make[1]: Leaving directory `/home/postgresql-9.5/lib/pgxs'
 make: *** [temp-install] Error 2

I tracked the dangerous -rf down to Makefile.global, and its empty
string is due to abs_top_builddir not being defined in my own Makefile.

But besides that, which I can probably fix, it doesn't sound correct
that a "check" rule insists on finding an "install" rule. I'm also
surprised that there's no warning coming out from the "make" invocation
given that I'm defining a "check" rule myself (after inclusion).

Minimal Makefile reproducing the error:

  PGXS := /home/postgresql-9.3/lib/pgxs/src/makefiles/pgxs.mk # succeeds
  PGXS := /home/postgresql-9.5/lib/pgxs/src/makefiles/pgxs.mk # fails
  include $(PGXS)
  check:
echo "Checking"

To verify, just run "make check"

--strk; 

  ()   Free GIS & Flash consultant/developer
  /\   http://strk.keybit.net/services.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to get rid of PQnoPasswordSupplied?

2015-06-22 Thread Tom Lane
Jim Nasby  writes:
> On 6/19/15 10:35 AM, Tom Lane wrote:
>> On the other hand, you could argue that improving the string is going
>> to break clients that do the right thing (even if klugily) in order
>> to help clients that are doing the wrong thing (ie, failing without
>> offering the opportunity to enter a password).  Ideally no client app
>> would ever show this message to users and so its readability would not
>> matter.

> Could we return a HINT? Or is that part of the same string?

Unfortunately no, there's no out-of-band additions possible here.

It strikes me that my argument above is too myopic, because I was only
thinking about cases where the client can plausibly do an interactive
prompt for password.  If it cannot (eg, psql --no-password, or perhaps
the process has no controlling terminal) then what you're going to see
is whatever message libpq returns.  So if people feel that this message
is not clear enough, maybe it's time to break compatibility and change it.

I do not follow Craig's argument that this is somehow connected to the
wire protocol version.  It's not; it's part of the libpq-to-client API.
If there ever is a protocol version 4, it will almost certainly not
trigger significant changes in that API --- there might be additions,
but not incompatible changes.  So if we think we can't change that
message now, then face it, we never will.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension support for postgres_fdw

2015-06-22 Thread Tom Lane
Jim Nasby  writes:
> On 6/20/15 12:19 PM, Tom Lane wrote:
>> Note that no matter what the details are, something like this is putting
>> the onus on the DBA to mark as transmittable only functions that actually
>> are safe to transmit, ie they exist *and have identical semantics* on the
>> remote.  I think that's fine as long as it's clearly documented.

> That seems like potentially a lot of extra work. We have the actual 
> function body/definition for all but C functions, perhaps we could 
> automatically map calls when the definitions are identical.

Huh?  No, we don't have that, and even if we did,

(1) 95% of the functions of interest *are* C functions.

(2) It's probably unsafe ever to transmit non-C functions; there are too
many ways in which PL-language functions might be environment sensitive.
To take just one example, postgres_fdw runs the remote session with a
restricted search_path, which may not be what user-defined functions are
expecting.

(3) In general, determining the behavior of a function is equivalent to
solving the halting problem.  SQL-language functions with sufficiently
simple bodies might be an exception --- but those probably got inlined
into the query before the FDW ever saw them, so it's irrelevant whether
we could deduce that they're safe to transmit.

It's possible that at some point we'd let the DBA override (2) and mark PL
functions as transmittable anyway; but for certain, this will be on an
"if you break it you get to keep both pieces" basis.  We'd be nuts to do
it automatically.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user space function "is_power_user"

2015-06-22 Thread Pavel Stehule
2015-06-22 11:20 GMT+02:00 Heikki Linnakangas :

> On 06/22/2015 09:51 AM, Pavel Stehule wrote:
>
>> Hi
>>
>> I often need a function for identification if current user is database
>> owner or is superuser.
>>
>> It can be pretty simply implemented in C level.
>>
>> Do you think it should be available in core?
>>
>
> current_setting('is_superuser');
>

there is a second constraint:

OR "is database owner"

Pavel
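
A minimal C-level sketch of such a helper, combining the superuser and database-owner checks (illustration only, not a worked-out proposal):

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "utils/acl.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(is_power_user);

Datum
is_power_user(PG_FUNCTION_ARGS)
{
    /* true if the current user is a superuser or owns the current database */
    PG_RETURN_BOOL(superuser() ||
                   pg_database_ownercheck(MyDatabaseId, GetUserId()));
}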


>
> - Heikki
>
>


Re: [HACKERS] user space function "is_power_user"

2015-06-22 Thread Heikki Linnakangas

On 06/22/2015 09:51 AM, Pavel Stehule wrote:

Hi

I often need a function for identification if current user is database
owner or is superuser.

It can be pretty simply implemented in C level.

Do you think it should be available in core?


current_setting('is_superuser');

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Fabien COELHO





It'd be interesting to see numbers for tiny, without the overly small
checkpoint timeout value. 30s is below the OS's writeback time.


Here are some tests with longer timeout:

tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
 max_wal_size=1GB warmup=600 time=4000

  flsh |  full speed tps  | percent of late tx, 4 clients, for tps:
  /srt |  1 client  |  4 clients  |  100 |  200 |  400 |  800 | 1200 | 1600
  N/N  | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
  N/Y  | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
  Y/N  | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
  Y/Y  | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13

There seems to be a small 1-2% performance benefit with 4 clients (this is
reversed for 1 client), there are significantly and consistently fewer late
transactions when the options are activated, and the performance is more
stable (standard deviation reduced by 10-18%).

The db is about 200 MB ~ 25000 pages, at 2500+ tps it is written 40 times 
over in 5 minutes, so the checkpoint basically writes everything in 220 
seconds, 0.9 MB/s. Given the preload phase the buffers may be more or less 
in order in memory, so may be written out in order anyway.



medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
  max_wal_size=4GB warmup=1200 time=7500

  flsh |  full speed tps   | percent of late tx, 4 clients
  /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
   N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
   N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
   Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
   Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |

The effect of sorting is very positive (+150% to +270% tps). On this run,
flushing has a positive (+20% with 1 client) or negative (-8% with 4
clients) effect on throughput, and late transactions are reduced by 92-95%
when both options are activated.


At 550 tps checkpoints are xlog-triggered and write about 1/3 of the
database (17 buffers to write every 220-260 seconds, 4 MB/s).


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: row_to_array function

2015-06-22 Thread Pavel Stehule
Hi

2015-06-22 5:18 GMT+02:00 Craig Ringer :

> On 2 April 2015 at 01:59, Merlin Moncure  wrote:
> > On Sun, Mar 29, 2015 at 1:27 PM, Tom Lane  wrote:
> >> Pavel Stehule  writes:
> >>> here is rebased patch.
> >>> It contains both patches - row_to_array function and foreach array
> support.
> >>
> >> While I don't have a problem with hstore_to_array, I don't think that
> >> row_to_array is a very good idea; it's basically encouraging people to
> >> throw away SQL datatypes altogether and imagine that everything is text.
> >> They've already bought into that concept if they are using hstore or
> >> json, so smashing elements of those containers to text is not a problem.
> >> But that doesn't make this version a good thing.
> >>
> >> (In any case, those who insist can get there through row_to_json, no?)
> >
> > You have a point.  What does attached do that to_json does not do
> > besides completely discard type information?  Our json api is pretty
> > rich and getting richer.  For better or ill, we dumped all json
> > support into the already stupendously bloated public namespace and so
> > it's always available.
>
>
> I can see plenty of utility for a function like Pavel speaks of, but
> I'd personally rather see it as a function that returns table (colname
> name, coltype regtype, coltypmod integer, coltextvalue text,
> colordinal integer) so it can carry more complete information and
> there's no need to worry about foreach(array). The main use of a
> function that includes text representations of the values would IMO be
> using it from plain SQL, rather than PL/PgSQL, when faced with
> anonymous records.
>
> I'd find it more useful to have lvalue-expressions for dynamic access
> to record fields and a function to get record metadata - field names,
> types and typmods. Some kind of "pg_get_record_info(record) returns
> table(fieldname text, fieldtype regtype, fieldtypmod integer)" and a
> PL/PgSQL lvalue-expression for record field access like
> "RECORD_FIELD(therecord, fieldname)". I would _certainly_ want to be
> able to get the type metadata without the values.
>
> That way you could interact natively with the fields in their true
> types, without forcing conversion into and out of 'text', which is a
> known performance pain-point with PL/PgSQL. (PL/PgSQL doesn't have a
> VARIANT type or support for using 'anyelement', which would be the
> other way to solve the type flattening problem IMO).
>
> Think:
>
> DECLARE
> myrow record;
> fi record;
> BEGIN
> EXECUTE user_supplied_dynamic_query INTO myrow;
> FOR fi IN
> SELECT fieldname, fieldtype, fieldtypmod
> FROM pg_get_record_info(myrow)
> LOOP
> IF fi.fieldtype = 'int4'::regtype THEN
> RECORD_FIELD(myrow, fi.fieldname) := RECORD_FIELD(myrow,
> fi.fieldname) + 1;
> END IF;
> END LOOP;
> END;
>

I think this is a separate task, and one that cannot be solved simply
either. I wrote a set of functions for working with records (
https://github.com/okbob/pltoolbox/blob/master/record.c), but they don't
solve the basic issues:

1. speed - FOR IN SELECT FROM is more expensive than just unpacking a row or
record
2. an unclean game of creating more code paths for every special type.

I have a little bit different idea. FOR IN RECORD could change the type of an
automatic variable in each iteration. Internally we can have multiple code
paths, so your code can be rewritten to

FOREACH key, val IN RECORD myrow
LOOP
  IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN
val := val + 1; -- these variables can be mutable
-- or maybe in future
   myrow[key] := val + 1;
  END IF;
END LOOP;

What is important: "val" is an automatic variable, and it can have a different
type in each step.

It is a little bit strange, but otherwise impossible to solve: we cannot
support row[var] as a right-hand value (without an immutable cast), but we can
do it as a left-hand value.




>
> OK, so it's a stupid example - increment all int4 fields by one. It
> conveys the rough idea though - native use of the field types.
>
> Note that RECORD_FIELD is distinct from the existing support for
>
> EXECUTE format('SELECT $1.%I', fieldname) USING therecord;
>
> in that that approach doesn't work for all ways that a record can be
> produced, it's slow, it doesn't have a good way to enumerate field
> names, and there's no equivalent to write to the field. Current
> approaches for that are ghastly:
> http://stackoverflow.com/q/7711432/398670 .
>
>
>
>
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] checkpointer continuous flushing

2015-06-22 Thread Fabien COELHO


Hello Jim,


The small problem I see is that for a very large setting there could be
several seconds or even minutes of sorting, which may or may not be
desirable, so having some control on that seems a good idea.


ISTM a more elegant way to handle that would be to start off with a very 
small number of buffers and sort larger and larger lists while the OS is busy 
writing/syncing.


You really have to have done a significant part/most/all of sorting before 
starting to write.
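
The point of the sort is just to present the kernel with writes in file order; one plausible comparator looks like this (the struct fields are illustrative, not the patch's actual layout):

typedef struct BufSortItem
{
    unsigned int tablespace;
    unsigned int relfilenode;
    unsigned int forknum;
    unsigned int blocknum;
    int          buf_id;
} BufSortItem;

/* order buffers by physical location: tablespace, relation, fork, block */
int
buf_cmp(const void *pa, const void *pb)
{
    const BufSortItem *a = pa;
    const BufSortItem *b = pb;

    if (a->tablespace != b->tablespace)
        return a->tablespace < b->tablespace ? -1 : 1;
    if (a->relfilenode != b->relfilenode)
        return a->relfilenode < b->relfilenode ? -1 : 1;
    if (a->forknum != b->forknum)
        return a->forknum < b->forknum ? -1 : 1;
    if (a->blocknum != b->blocknum)
        return a->blocknum < b->blocknum ? -1 : 1;
    return 0;
}

/* then: qsort(items, nitems, sizeof(BufSortItem), buf_cmp); */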



Another argument is that Tom said he wanted that:-)


Did he elaborate why? I don't see him on this thread (though I don't have all 
of it).


http://www.postgresql.org/message-id/6599.1409421...@sss.pgh.pa.us

But it has more to do with memory management.


In practice the value can be set at a high value so that it is nearly
always sorted in one go. Maybe value "0" could be made special and used
to trigger this behavior systematically, and be the default.


It'd be nice if it was just self-tuning, with no GUC.


Hmmm. It can easily be turned into a boolean, but otherwise I have no
clue about how to decide whether to sort and/or flush.


It looks like it'd be much better to get this committed without more than we 
have now than to do without it though...


Yep, I think the figures are definitely encouraging.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers