Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2014-12-24 Thread Michael Paquier
On Sun, Dec 14, 2014 at 5:45 AM, Andreas Karlsson andr...@proxel.se wrote:
 get_rule_expr() relies heavily on the catcache which to me does not look
 like it could easily be (and probably not even should be) made to use the
 current snapshot. Refactoring ruleutils.c to rely less on the catcache seems
 like a reasonable thing to do if we want to reduce weirdness of how it
 ignores MVCC but it is quite a bit of work and I fear it could give us
 performance regressions.
 Do you have any ideas for how to fix get_rule_expr()?
Agreed. There are 20 calls to SearchSysCache in ruleutils.c; let's not
focus on that for now, though, and get things right for this patch.

 Is this patch worthwhile even without reducing the lock levels of the drop 
 commands?
Yes. IMV, it is better to do this work slowly and incrementally.

Here are some comments about this patch:
1) There is no documentation. Could you update mvcc.sgml for SHARE ROW
EXCLUSIVE?
2) Some isolation tests would be welcome; the point of the feature is
to check that SELECT and SELECT FOR SHARE/UPDATE are still allowed while
running one of the commands mentioned above.
3) This patch breaks the isolation test alter-table-1.
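For reference, a minimal isolation spec along the lines of point 2 could look
like the sketch below (the table names, step names, and constraint are invented
for illustration; they are not an actual proposed test):

```
setup
{
  CREATE TABLE pk (i int PRIMARY KEY);
  CREATE TABLE fk (pk_id int);
}

teardown
{
  DROP TABLE fk;
  DROP TABLE pk;
}

session "s1"
step "s1_addfk"	{ ALTER TABLE fk ADD CONSTRAINT fk_c FOREIGN KEY (pk_id) REFERENCES pk (i); }

session "s2"
step "s2_sel"	{ SELECT * FROM pk FOR SHARE; }

permutation "s1_addfk" "s2_sel"
```

With the reduced lock level, the SELECT ... FOR SHARE step should complete
without blocking behind the ALTER TABLE.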
Regards,
--
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-24 Thread Andres Freund
On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
 pgbench -S -T10 -c 4 -j 4
 master:
 tps = 9556.356145 (excluding connections establishing)
 tps = 9897.324917 (excluding connections establishing)
 tps = 9287.286907 (excluding connections establishing)
 tps = 10210.130833 (excluding connections establishing)
 
 XXH32:
 tps = 32462.754437 (excluding connections establishing)
 tps = 33232.144511 (excluding connections establishing)
 tps = 33082.436760 (excluding connections establishing)
 tps = 33597.904532 (excluding connections establishing)

FWIW, I don't believe these results for one second. It's quite plausible
that there's a noticeable performance benefit, but a factor of three is
completely unrealistic... Could you please recheck?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-12-24 Thread Dilip kumar
On 19 December 2014 16:41, Amit Kapila Wrote,

One idea is to send all the stages and corresponding Analyze commands
to server in one go which means something like
BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0;
  Analyze t1; COMMIT;
BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay;
  Analyze t1; COMMIT;
BEGIN; RESET default_statistics_target;
 Analyze t1; COMMIT;

Case 1: For a complete database:

In the base code, all tables are first processed in stage 1, then in stage 2, and 
so on, so at any given time every table has been analyzed at least up to a certain 
stage.

But if we process all the stages for one table first, and only then take the next 
table through stage 1, it may happen that some tables have all stages processed 
while others are still waiting for even the first stage; this defeats the purpose 
of analyze-in-stages.

Case 2: For independently listed tables, e.g. -t "t1" -t "t2"

In the base code we also currently process all the stages for the first table, 
then do the same for the next table, and so on.

I think that if the user gives multiple tables together, the intent is probably 
to analyze those tables together stage by stage, but our code analyzes table1 
through all stages before even considering the next table.

So for tables also it should be like
Stage1:
T1
T2
..
Stage2:
T1
T2
…

Thoughts?

Now, still parallel operations in other backends could lead to
page misses, but I think the impact will be minimized.

Regards,
Dilip

Re: [HACKERS] replicating DROP commands across servers

2014-12-24 Thread Andres Freund
On 2014-12-24 21:54:20 +1300, David Rowley wrote:
 On 24 December 2014 at 07:41, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:
 
  I have pushed patches 0001 through 0004, with some revisions.  Only the
  final part is missing.
 
 
 Hi Alvaro,
 
 Would you be able to commit the attached? It just fixes a new compiler
 warning that I'm seeing on MSVC.
 
 src\backend\parser\parse_type.c(795): warning C4715: 'typeStringToTypeName'
 : not all control paths return a value [D:\Postgres\a\postgres.vcxproj]

Pushed.

I really wonder if we can't make msvc reliably recognize this kind of
scenario - especially this case is pretty trivial?

Which of:
#if defined(HAVE__BUILTIN_UNREACHABLE) && !defined(USE_ASSERT_CHECKING)
#define pg_unreachable() __builtin_unreachable()
#elif defined(_MSC_VER) && !defined(USE_ASSERT_CHECKING)
#define pg_unreachable() __assume(0)
#else
#define pg_unreachable() abort()
#endif

does your build end up using? Does changing things around make it
recognize this and similar cases?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-24 Thread Fujii Masao
On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Dec 18, 2014 at 5:27 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks!
 Thanks for your input.

 +else
 +memcpy(compression_scratch, page, page_len);

 I don't think the block image needs to be copied to scratch buffer here.
 We can try to compress the page directly.
 Check.

 +#include "utils/pg_lzcompress.h"
  #include "utils/memutils.h"

 pg_lzcompress.h should be after memutils.h.
 Oops.

 +/* Scratch buffer used to store block image to-be-compressed */
 +static char compression_scratch[PGLZ_MAX_BLCKSZ];

 Isn't it better to allocate the memory for compression_scratch in
 InitXLogInsert()
 like hdr_scratch?
 Because the OS would not touch it if wal_compression is never used,
 but now that you mention it, it may be better to get that in the
 context of xlog_insert..

 +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header));

 Why don't we allocate the buffer for uncompressed page only once and
 keep reusing it like XLogReaderState->readBuf? The size of uncompressed
 page is at most BLCKSZ, so we can allocate the memory for it even before
 knowing the real size of each block image.
 OK, this would save some cycles. I was trying to make process allocate
 a minimum of memory only when necessary.

 -printf(" (FPW); hole: offset: %u, length: %u\n",
 -   record->blocks[block_id].hole_offset,
 -   record->blocks[block_id].hole_length);
 +if (record->blocks[block_id].is_compressed)
 +printf(" (FPW); hole offset: %u, compressed length %u\n",
 +   record->blocks[block_id].hole_offset,
 +   record->blocks[block_id].bkp_len);
 +else
 +printf(" (FPW); hole offset: %u, length: %u\n",
 +   record->blocks[block_id].hole_offset,
 +   record->blocks[block_id].bkp_len);

 We need to consider what info about FPW we want pg_xlogdump to report.
 I'd like to calculate how much bytes FPW was compressed, from the report
 of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW
 and that of compressed one in the report.
 OK, so let's add a parameter in the decoder for the uncompressed
 length. Sounds fine?

 In pg_config.h, the comment of BLCKSZ needs to be updated? Because
 the maximum size of BLCKSZ can be affected by not only itemid but also
 XLogRecordBlockImageHeader.
 Check.

  bool		has_image;
 +	bool		is_compressed;

 Doesn't ResetDecoder need to reset is_compressed?
 Check.

 +#wal_compression = off			# enable compression of full-page writes
 Currently wal_compression compresses only FPW, so isn't it better to place
 it after full_page_writes in postgresql.conf.sample?
 Check.

 +	uint16		extra_data;	/* used to store offset of bytes in hole, with
 +					 * last free bit used to check if block is
 +					 * compressed */
 At least to me, defining something like the following seems more easy to
 read.
 uint16		hole_offset:15,
 			is_compressed:1;
 Check++.

 Updated patches addressing all those things are attached.

Thanks for updating the patch!

Firstly I'm thinking to commit the
0001-Move-pg_lzcompress.c-to-src-common.patch.

pg_lzcompress.h still exists in include/utils, but it should be moved to
include/common?

Do we really need PGLZ_Status? I'm not sure whether your categorization of
the result status of compress/decompress functions is right or not. For example,
pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, which seems
logically invalid... Maybe this needs to be revisited when we introduce other
compression algorithms and create the wrapper function for those compression
and decompression functions. Anyway making pg_lzdecompress return
the boolean value seems enough.

I updated 0001-Move-pg_lzcompress.c-to-src-common.patch accordingly.
Barring objections, I will push the attached patch firstly.

Regards,

-- 
Fujii Masao
From 92a7c2ac8a6a8ec6ae84c6713f8ad30b7958de94 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Tue, 25 Nov 2014 14:05:59 +0900
Subject: [PATCH] Move pg_lzcompress.c to src/common

Exposing the compression and decompression APIs of pglz makes it possible
for extensions and contrib modules to use them. pglz_decompress contained a call
to elog to emit an error message in case of corrupted data. This function
is changed to return a status code so that its callers can raise an error
instead. The compression function is changed similarly to keep the whole set
consistent.
---
 src/backend/access/heap/tuptoaster.c  |   11 +-
 src/backend/utils/adt/Makefile|4 +-
 src/backend/utils/adt/pg_lzcompress.c |  779 
 src/common/Makefile   |3 +-
 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-24 Thread Michael Paquier
On Wed, Dec 24, 2014 at 8:44 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Firstly I'm thinking to commit the
 0001-Move-pg_lzcompress.c-to-src-common.patch.

 pg_lzcompress.h still exists in include/utils, but it should be moved to
 include/common?
You are right. This is a remnant of first version of this patch where
pglz was added in port/ and not common/.

 Do we really need PGLZ_Status? I'm not sure whether your categorization of
 the result status of compress/decompress functions is right or not. For 
 example,
 pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, but which seems
 invalid logically... Maybe this needs to be revisited when we introduce other
 compression algorithms and create the wrapper function for those compression
 and decompression functions. Anyway making pg_lzdecompress return
 the boolean value seems enough.
Returning only a boolean is fine for me (that's what my first patch
did), especially if we add at some point hooks for compression and
decompression calls.
Regards,
-- 
Michael




Re: [HACKERS] speedup tidbitmap patch: cache page

2014-12-24 Thread David Rowley
On 24 December 2014 at 00:24, Teodor Sigaev teo...@sigaev.ru wrote:

 I've also attached some benchmark results using your original table from

 up-thread. It seems that the caching of whether the page was seen as lossy is not
 much
 of a help in this test case. Did you find another one where you saw some
 better
 gains?


 All I found is about 0.5%...  v3 contains your comments but it
 doesn't use
 lossy_page cache.


Ok, I've performed some more benchmarks (attached) using your original
table. I'm thinking the v2.3 version is not worth the extra complexity. It
seems the extra caching in v2.3, going by my benchmarks, is more likely to
add overhead than save from hash lookups.

With the query used in my tests using v2.3 I didn't even see a speed up
with 5 million records and 64KB of work_mem.

So I think v3 is the one to go with, and I can't see any problems with it,
so I'm marking it as ready for committer.

Nice work

Kind Regards

David Rowley


tidbench2.xlsx
Description: MS-Excel 2007 spreadsheet



[HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-24 Thread Michael Paquier
Hi all,

Commit 2c03216d has introduced RestoreBlockImage to restore a page
from a given decoding state. ISTM that this is a backend-only
operation but it has been added in xlogreader.c which could be used as
well by frontend utilities like pg_xlogdump.
Wouldn't it be better to declare it as a static routine in
xlogutils.c? If we keep it in xlogreader.c, I think that we should at
least wrap it with ifndef FRONTEND.
Thoughts?
-- 
Michael




Re: [HACKERS] Commit timestamp abbreviations

2014-12-24 Thread Alvaro Herrera
Bruce Momjian wrote:
 I noticed this when looking at the allocated shared memory structures in
 head:
 
   shared memory alignment 64-byte of CommitTs Ctl:  0
   shared memory alignment 64-byte of CommitTs shared:  0
 
 I thought we got rid of the idea that 'Ts' means timestamp.  Was this
 part forgotten?

Do you have a specific reference?  That's not the concern I remember,
and I sure don't want to re-read that whole thread again.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-24 Thread Peter Eisentraut
On 12/22/14 5:40 PM, Oskari Saarenmaa wrote:
 I think we should just use the UStar tar format
 (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and
 allow long file names; all actively used tar implementations should be
 able to handle them.  I'll try to write a patch for that soonish.

UStar doesn't handle long link targets, only long file names (and then
only up to 255 characters, which doesn't seem satisfactory).

AFAICT, to allow long link targets, the available solutions are either
pax extended headers or GNU-specific long-link extra headers.

When I create a symlink with a long target and call tar on it, GNU tar
by default creates the GNU long-link header and BSD tar by default
creates a pax header.  But they are both able to extract either one.

As a demo for how this might look, attached is a wildly incomplete patch
to produce GNU long-link headers.
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbcecbb..f0cffcf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1233,13 +1233,15 @@ static void
 _tarWriteHeader(const char *filename, const char *linktarget,
 struct stat * statbuf)
 {
-	char		h[512];
+	char		*h;
 
-	tarCreateHeader(h, filename, linktarget, statbuf->st_size,
+	h = tarCreateHeader(filename, linktarget, statbuf->st_size,
 	statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
 	statbuf->st_mtime);
 
 	pq_putmessage('d', h, 512);
+
+	free(h);
 }
 
 /*
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index e7c2939..490dd98 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -911,10 +911,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 
 			if (basetablespace  writerecoveryconf)
 			{
-char		header[512];
+char		*header;
 int			padding;
 
-tarCreateHeader(header, "recovery.conf", NULL,
+header = tarCreateHeader("recovery.conf", NULL,
 recoveryconfcontents->len,
 0600, 04000, 02000,
 time(NULL));
@@ -925,6 +925,8 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 WRITE_TAR_DATA(recoveryconfcontents->data, recoveryconfcontents->len);
 if (padding)
 	WRITE_TAR_DATA(zerobuf, padding);
+
+free(header);
 			}
 
 			/* 2 * 512 bytes empty data at end of file */
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index f48d369..60ae013 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -1300,11 +1300,13 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
 static void
 _tarWriteHeader(TAR_MEMBER *th)
 {
-	char		h[512];
+	char		*h;
 
-	tarCreateHeader(h, th->targetFile, NULL, th->fileLen, 0600, 04000, 02000, time(NULL));
+	h = tarCreateHeader(th->targetFile, NULL, th->fileLen, 0600, 04000, 02000, time(NULL));
 
 	/* Now write the completed header. */
 	if (fwrite(h, 1, 512, th->tarFH) != 512)
 		WRITE_ERROR_EXIT;
+
+	free(h);
 }
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index 2dd6f6b..c3e0eb5 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -11,5 +11,5 @@
  *
  *-
  */
-extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+extern char *tarCreateHeader(const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
 extern int	tarChecksum(char *header);
diff --git a/src/port/tar.c b/src/port/tar.c
index 8ef4f9c..d8183d7 100644
--- a/src/port/tar.c
+++ b/src/port/tar.c
@@ -49,10 +49,26 @@ tarChecksum(char *header)
  * must always have space for 512 characters, which is a requirement by
  * the tar format.
  */
-void
-tarCreateHeader(char *h, const char *filename, const char *linktarget,
+char *
+tarCreateHeader(const char *filename, const char *linktarget,
 size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime)
 {
+	char *h;
+	char *ext_header = NULL;
+	char *ext_content = NULL;
+
+	if (linktarget && strlen(linktarget) > 99)
+	{
+		ext_header = calloc(512, 1);
+		print_val(&ext_header[124], 512, 8, 11);
+		sprintf(&ext_header[156], "K");
+		sprintf(&ext_header[148], "%06o ", tarChecksum(ext_header));
+
+		ext_content = calloc(512, 1);
+		strcpy(ext_content, linktarget);
+	}
+	h = malloc(512);
+
 	/*
 	 * Note: most of the fields in a tar header are not supposed to be
 	 * null-terminated.  We use sprintf, which will write a null after the
@@ -141,4 +157,18 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget,
 	 * 6 digits, a space, and a null, which is legal per POSIX.
 	 */
 	sprintf(&h[148], "%06o ", tarChecksum(h));
+
+	if (ext_header || ext_content)
+	{
+		char *ret = malloc(3 * 512);
+		memcpy(ret, ext_header, 512);
+		memcpy(ret + 512, ext_content, 512);
+		memcpy(ret + 1024, h, 512);
+		free(ext_header);
+		

Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-24 Thread Peter Eisentraut
On 12/22/14 10:00 PM, Amit Kapila wrote:
 There is already a patch [1] in this CF which will handle both cases, so
 I am
 not sure if it is a very good idea to go with a new tar format to handle this
 issue.   

I think it would still make sense to have proper symlinks in the
basebackup if possible, for clarity.





Re: [HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-24 Thread Fujii Masao
On Wed, Dec 24, 2014 at 9:42 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi all,

 Commit 2c03216d has introduced RestoreBlockImage to restore a page
 from a given decoding state. ISTM that this is a backend-only
 operation but it has been added in xlogreader.c which could be used as
 well by frontend utilities like pg_xlogdump.
 Wouldn't it be better to declare it as a static routine in
 xlogutils.c? If we keep it in xlogreader.c, I think that we should at
 least wrap it with ifndef FRONTEND.
 Thoughts?

If we do this, pg_lzcompress.c doesn't need to be moved to common for
FPW compression patch which we're talking about in other thread. Right?

DecodeXLogRecord() also seems to be backend-only, so should we treat it
in the same way as you proposed? Or does pg_rewind use it?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-24 Thread Michael Paquier
On Wed, Dec 24, 2014 at 10:16 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Dec 24, 2014 at 9:42 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Wouldn't it be better to declare it as a static routine in
 xlogutils.c? If we keep it in xlogreader.c, I think that we should at
 least wrap it with ifndef FRONTEND.

 If we do this, pg_lzcompress.c doesn't need to be moved to common for
 FPW compression patch which we're talking about in other thread. Right?
Yes. This refactoring came to my mind while re-thinking about the WAL
compression. This would also make more straight-forward the
implementation of hooks for compression and decompression.

 DecodeXLogRecord() also seems to be backend-only, so should we treat it
 in the same way as you proposed? Or does pg_rewind use it?
DecodeXLogRecord is used by XLogReadRecord, the latter being called by
pg_xlogdump and also pg_rewind, so it is not backend-only. IMO, only
exposing to the frontends the pointer to the beginning of the block
image in the decoder with its extra data, like hole length and hole
offset (or block length in record with WAL compression, be it
 compressed or uncompressed), is just fine. It looks weird to keep
in the xlog reader facility something that performs other operations
than reading a WAL record and organizing it in a readable state for
caller.
-- 
Michael




Re: [HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-24 Thread Fujii Masao
On Wed, Dec 24, 2014 at 10:41 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Dec 24, 2014 at 10:16 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Dec 24, 2014 at 9:42 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Wouldn't it be better to declare it as a static routine in
 xlogutils.c? If we keep it in xlogreader.c, I think that we should at
 least wrap it with ifndef FRONTEND.

 If we do this, pg_lzcompress.c doesn't need to be moved to common for
 FPW compression patch which we're talking about in other thread. Right?
 Yes. This refactoring came to my mind while re-thinking about the WAL
 compression. This would also make more straight-forward the
 implementation of hooks for compression and decompression.

Fair enough. Anyway, I'll wait to apply the patch that moves pg_lzcompress.c
until we've reached a consensus about this.

 DecodeXLogRecord() also seems to be backend-only, so should we treat it
 in the same way as you proposed? Or does pg_rewind use it?
 DecodeXLogRecord is used by XLogReadRecord, the latter being called by
 pg_xlogdump and also pg_rewind, so it is not backend-only.

Yeah, you're right.

Regards,

-- 
Fujii Masao




Re: [HACKERS] replicating DROP commands across servers

2014-12-24 Thread David Rowley
On 25 December 2014 at 00:34, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-12-24 21:54:20 +1300, David Rowley wrote:
  On 24 December 2014 at 07:41, Alvaro Herrera alvhe...@2ndquadrant.com
  wrote:
 
   I have pushed patches 0001 through 0004, with some revisions.  Only the
   final part is missing.
  
  
  Hi Alvaro,
 
  Would you be able to commit the attached? It just fixes a new compiler
  warning that I'm seeing on MSVC.
 
  src\backend\parser\parse_type.c(795): warning C4715:
 'typeStringToTypeName'
  : not all control paths return a value [D:\Postgres\a\postgres.vcxproj]

 Pushed.

 Thanks


 I really wonder if we can't make msvc reliably recognize this kind of
 scenario - especially this case is pretty trivial?

 Which of:
 #if defined(HAVE__BUILTIN_UNREACHABLE) && !defined(USE_ASSERT_CHECKING)
 #define pg_unreachable() __builtin_unreachable()
 #elif defined(_MSC_VER) && !defined(USE_ASSERT_CHECKING)
 #define pg_unreachable() __assume(0)
 #else
 #define pg_unreachable() abort()
 #endif


I don't think the problem is here. The problem is the elevel being assigned
to a variable in the elog macro to prevent the multiple-evaluation problem;
since it then does int elevel_ = elevel; ... if (elevel_ >= ERROR), that is
not a constant, or at least the Microsoft compiler is not smart enough to see
that it is.

The attached patch removes the warning, but likely can't be used in case
someone somewhere is doing elog(var++, "my error");

Compiling with the attached shaves almost 1% off the size of postgres.exe:

before; 5,882,368 bytes
after: 5,830,656 bytes

I've been trawling around to try to see if anything
like __builtin_constant_p() exists for MSVC, but so far I've not found
anything useful.

Kind Regards

David Rowley
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 87438b8..8be68dd 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -121,10 +121,9 @@
 #else  /* 
!HAVE__BUILTIN_CONSTANT_P */
 #define ereport_domain(elevel, domain, rest)   \
do { \
-   const int elevel_ = (elevel); \
-   if (errstart(elevel_, __FILE__, __LINE__, PG_FUNCNAME_MACRO, 
domain)) \
+   if (errstart(elevel, __FILE__, __LINE__, PG_FUNCNAME_MACRO, 
domain)) \
errfinish rest; \
-   if (elevel_ >= ERROR) \
+   if (elevel >= ERROR) \
pg_unreachable(); \
} while(0)
 #endif   /* HAVE__BUILTIN_CONSTANT_P */
@@ -258,12 +257,10 @@ extern int	getinternalerrposition(void);
 #else  /* 
!HAVE__BUILTIN_CONSTANT_P */
 #define elog(elevel, ...)  \
do { \
-   int elevel_; \
-   elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
-   elevel_ = (elevel); \
-   elog_finish(elevel_, __VA_ARGS__); \
-   if (elevel_ >= ERROR) \
-   pg_unreachable(); \
+   elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
+   elog_finish(elevel, __VA_ARGS__); \
+   if (elevel >= ERROR) \
+   pg_unreachable(); \
} while(0)
 #endif   /* HAVE__BUILTIN_CONSTANT_P */
 #else  /* !HAVE__VA_ARGS */



Re: [HACKERS] Commit timestamp abbreviations

2014-12-24 Thread Bruce Momjian
On Tue, Dec 23, 2014 at 06:00:21PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  I noticed this when looking at the allocated shared memory structures in
  head:
  
  shared memory alignment 64-byte of CommitTs Ctl:  0
  shared memory alignment 64-byte of CommitTs shared:  0
  
  I thought we got rid of the idea that 'Ts' means timestamp.  Was this
  part forgotten?
 
 Do you have a specific reference?  That's not the concern I remember,
 and I sure don't want to re-read that whole thread again.

I remember the issue of using _ts and 'ts' inconsistently, and I thought
we were going to spell out timestamp in more places, but maybe I am
remembering incorrectly.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2014-12-24 Thread Bruce Momjian
On Wed, Dec 24, 2014 at 10:30:19AM +0100, Andres Freund wrote:
 On 2014-12-23 22:51:22 -0500, Bruce Momjian wrote:
  Many of these are 64-byte aligned, including Buffer Descriptors.
 
 In that case you need to change max_connections, some settings will lead
 to unaligned BufferDescriptors.

Well, isn't my second patch that misaligns the buffers sufficient for
testing?

  I tested pgbench with these commands:
  
  $ pgbench -i -s 95 pgbench
  $ pgbench -S -c 95 -j 95 -t 10 pgbench
  
  on a 16-core Xeon server and got 84k tps.  I then applied another patch,
  attached, which causes all the structures to be non-64-byte aligned, but
  got the same tps number.
 
 'Xeon' itself doesn't say much. It's been applied to wildly different
 CPUs over the years. I guess that was a single socket server? You're
 much more likely to see significant problems on a multi node NUMA
 servers where the penalties for cache misses/false sharing are a
 magnitude or three higher.

Sorry, the server has 2 x Intel Xeon E5620 2.4GHz Quad-Core Processors;
the full details are here:

http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012

  Can someone test these patches on an AMD CPU and see if you see a
  difference?  Thanks.
 
 I don't think you'll see a bigger difference there.

Uh, I thought AMD showed a huge difference for misalignment:


http://www.postgresql.org/message-id/20140202151319.gd32...@awork2.anarazel.de

and that email is from you.

I ended up running pgbench using 16-scale and got 90k tps:

pgbench -S -c 16 -j 16 -t 10 pgbench

but again could not see any difference between aligned and misaligned.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] replicating DROP commands across servers

2014-12-24 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I really wonder if we can't make msvc reliably recognize this kind of
 scenario - especially this case is pretty trivial?

Even if MSVC did understand pg_unreachable(), there would be other
compilers that didn't, so we'd need to worry about suppressing such
warnings anyhow.  Personally I'm just as happy that we have instances
of this case in the buildfarm where we can check it easily.

regards, tom lane




Re: [HACKERS] replicating DROP commands across servers

2014-12-24 Thread Tom Lane
David Rowley dgrow...@gmail.com writes:
 On 25 December 2014 at 00:34, Andres Freund and...@2ndquadrant.com wrote:
 I really wonder if we can't make msvc reliably recognize this kind of
 scenario - especially this case is pretty trivial?

 The attached patch removes the warning, but likely can't be used in case
 someone somewhere is doing elog(var++, my error);

Yeah, we're *not* doing that.  There are definitely places where
ereport/elog are used with nonconstant elevel.

It's curious though that MSVC fails to notice that the variable never
changes.  I wonder whether we could get away with changing the elog
macro to do
  const int elevel_ = (elevel);
as ereport does, and whether it would help if so.

regards, tom lane




Re: [HACKERS] Commit timestamp abbreviations

2014-12-24 Thread Petr Jelinek

On 24/12/14 15:15, Bruce Momjian wrote:

On Tue, Dec 23, 2014 at 06:00:21PM -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:

I noticed this when looking at the allocated shared memory structures in
head:

shared memory alignment 64-byte of CommitTs Ctl:  0
shared memory alignment 64-byte of CommitTs shared:  0

I thought we got rid of the idea that 'Ts' means timestamp.  Was this
part forgotten?


Do you have a specific reference?  That's not the concern I remember,
and I sure don't want to re-read that whole thread again.


I remember the issue of using _ts and 'ts' inconsistently, and I thought
we were going to spell out timestamp in more places, but maybe I am
remembering incorrectly.



The change was from committs to commit_ts + CommitTs depending on place.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2014-12-24 Thread Andres Freund
On 2014-12-24 10:00:05 -0500, Bruce Momjian wrote:
 On Wed, Dec 24, 2014 at 10:30:19AM +0100, Andres Freund wrote:
  On 2014-12-23 22:51:22 -0500, Bruce Momjian wrote:
   Many of these are 64-byte aligned, including Buffer Descriptors.
  
  In that case you need to change max_connections, some settings will lead
  to unaligned BufferDescriptors.
 
 Well, isn't my second patch that misaligns the buffers sufficient for
 testing?

I hadn't looked at it. Note that you're quite likely to overlap the
allocated region into the next one with the trivial method you're using.

I just verified that I can still reproduce the problem:

# aligned case (max_connections=401)
afreund@axle:~$ pgbench -P 1 -h /tmp/ -p5440 postgres -n -M prepared -c 96 -j 
96 -T 100 -S
progress: 1.0 s, 405170.2 tps, lat 0.195 ms stddev 0.928
progress: 2.0 s, 467011.1 tps, lat 0.204 ms stddev 0.140
progress: 3.0 s, 462832.1 tps, lat 0.205 ms stddev 0.154
progress: 4.0 s, 471035.5 tps, lat 0.202 ms stddev 0.154
progress: 5.0 s, 500329.0 tps, lat 0.190 ms stddev 0.132

BufferDescriptors is at 0x7f63610a6960 (which is 32byte aligned)

# unaligned case (max_connections=400)
afreund@axle:~$ pgbench -P 1 -h /tmp/ -p5440 postgres -n -M prepared -c 96 -j 
96 -T 100 -S
progress: 1.0 s, 202271.1 tps, lat 0.448 ms stddev 1.232
progress: 2.0 s, 223823.4 tps, lat 0.427 ms stddev 3.007
progress: 3.0 s, 227584.5 tps, lat 0.414 ms stddev 4.760
progress: 4.0 s, 221095.6 tps, lat 0.410 ms stddev 4.390
progress: 5.0 s, 217430.6 tps, lat 0.454 ms stddev 7.913
progress: 6.0 s, 210275.9 tps, lat 0.411 ms stddev 0.606
BufferDescriptors is at 0x7f1718aeb980 (which is 64byte aligned)

This is on a quad E5-4620 with 256GB RAM on a scale 100 pgbench
instance.

   Can someone test these patches on an AMD CPU and see if you see a
   difference?  Thanks.
  
  I don't think you'll see a bigger difference there.
 
 Uh, I thought AMD showed a huge difference for misalignment:
 
   
 http://www.postgresql.org/message-id/20140202151319.gd32...@awork2.anarazel.de

Ugh, yes. Forgot that... There was another patch that wasn't showing big
differences on AMD, and I mixed it up.

 I ended up running pgbench using 16-scale and got 90k tps:
 
   pgbench -S -c 16 -j 16 -t 10 pgbench
 
 but again could not see any difference between aligned and misaligned.

At the very least you should use -M prepared, otherwise you'll be
bottlenecked by the parser.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-24 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
 pgbench -S -T10 -c 4 -j 4
 master:
 tps = 9556.356145 (excluding connections establishing)
 tps = 9897.324917 (excluding connections establishing)
 tps = 9287.286907 (excluding connections establishing)
 tps = 10210.130833 (excluding connections establishing)
 
 XXH32:
 tps = 32462.754437 (excluding connections establishing)
 tps = 33232.144511 (excluding connections establishing)
 tps = 33082.436760 (excluding connections establishing)
 tps = 33597.904532 (excluding connections establishing)

 FWIW, I don't believe these results for one second. It's quite plausible
 that there's a noticeable performance benefit, but a factor of three is
 completely unrealistic... Could you please recheck?

A possible theory is that the hash change moved some locks into
different partitions causing a large reduction in contention,
but even then 3X seems unlikely.  And of course if that was
the mechanism, the result is still pure luck; other examples
might get worse by the same amount.

regards, tom lane




Re: [HACKERS] Additional role attributes superuser review

2014-12-24 Thread Adam Brightwell
All,

I want to revive this thread and continue to move these new role attributes
forward.

In summary, the ultimate goal is to include new role attributes for common
operations which currently require superuser privileges.

Initially proposed were the following attributes:

* BACKUP - allows role to perform backup operations
* LOGROTATE - allows role to rotate log files
* MONITOR - allows role to view pg_stat_* details
* PROCSIGNAL - allows role to signal backend processes

It seems that PROCSIGNAL and MONITOR were generally well received and
probably don't warrant much more discussion at this point.

However, based on previous discussions, there seemed to be some uncertainty
on how to handle BACKUP and LOGROTATE.

Concerns:

* LOGROTATE - only associated with one function/operation.
* BACKUP - perceived to be too broad a permission, as it would provide
the ability to run pg_start/stop_backup and the xlog related functions.
The general sentiment is that these should be handled as separate
privileges.
* BACKUP - preferred usage is with pg_dump, giving a user the ability to
run pg_dump on the whole database without being superuser.

Previous Recommendations:

* LOGROTATE - Use OPERATOR - concern was expressed that this might be too
general of an attribute for this purpose.  Also, concern for privilege
'upgrades' as it includes more capabilities in later releases.
* LOGROTATE - Use LOG_OPERATOR - generally accepted, but concern was raised
for using extraneous descriptors such as '_OPERATOR' and '_ADMIN', etc.
* BACKUP - Use WAL_CONTROL for pg_start/stop_backup - no major
disagreement, though same concern regarding extraneous descriptors.
* BACKUP - Use XLOG_OPERATOR for xlog operations - no major disagreement,
though same concern regarding extraneous descriptors.
* BACKUP - Use BACKUP for granting non-superuser ability to run pg_dump on
whole database.

Given the above and previous discussions:

I'd like to propose the following new role attributes:

BACKUP - allows role to perform pg_dump* backups of whole database.
WAL - allows role to execute pg_start_backup/pg_stop_backup functions.
XLOG - allows role to execute xlog operations.
LOG - allows role to rotate log files - remains broad enough to consider
future log related operations.
MONITOR - allows role to view pg_stat_* details.
PROCSIGNAL - allows role to signal backend processes.

If these seem reasonable, then I'll begin updating the initial/current
patch submitted.  But in either case, feedback and suggestions are
certainly welcome and appreciated.

Thanks,
Adam

--
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


[HACKERS] nls and server log

2014-12-24 Thread Euler Taveira
Hi,

Currently the same message goes to the server log and the client app. This
sometimes bothers me, since I have to analyze server logs and discover
that lc_messages is set to pt_BR and, worse, that stup^H^H^H
applications parse some error messages in Portuguese. My solution has
been a modified version of pgBadger (formerly pgfouine) -- which has
its own problems: (i) translations are not as stable as the English
messages, (ii) translations are not always available, which means there
is a mix of translated and untranslated messages, and (iii) it is minor
version dependent. I'm tired of fighting those problems and started to
research whether there is a good solution for the backend.

I'm thinking of carrying both translated and untranslated messages if we
are asked to. We store the untranslated messages if a new GUC (say
server_lc_messages) is set. The cost will be copying five new variables
(message, detail, detail_log, hint, and context) in the ErrorData struct,
which will be used only if server_lc_messages is set. A possible
optimization is not to use the new variables if lc_messages and
server_lc_messages match. My use case is a server log in English, but
I'm perfectly fine allowing a server log in Spanish and client
messages in French. Is that an acceptable plan? Ideas?


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-24 Thread Jim Nasby

On 12/23/14, 8:49 PM, Fabrízio de Royes Mello wrote:

Em terça-feira, 23 de dezembro de 2014, Jim Nasby jim.na...@bluetreble.com 
mailto:jim.na...@bluetreble.com escreveu:

On 12/23/14, 8:54 AM, Fabrízio de Royes Mello wrote:

   Right now a lot of people just work around this with things like DO 
blocks, but as mentioned elsewhere in the thread that fails for commands that 
can't be in a transaction.
  

I use dblink to solve it. :-)


So... how about instead of solving this only for vacuum we create something 
generic? :) Possibly using Robert's background worker work?


  Interesting idea.

But what about the idea of improving the --table option of the clients 
vacuumdb and clusterdb?


Seems reasonable.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-24 Thread Alex Shulgin
Alex Shulgin a...@commandprompt.com writes:

 Petr Jelinek p...@2ndquadrant.com writes:

 I had a quick look, the patch does not apply cleanly anymore but it's
 just release notes so nothing too bad.

 Yes, there were some ongoing changes that touched some parts of this and
 I must have missed the latest one (or maybe I was waiting for it to
 settle down).

The rebased version is attached.

 - the StandbyModeRequested and StandbyMode should be lowercased like
 the rest of the GUCs

 Yes, except that standby_mode is linked to StandbyModeRequested, not the
 other one.  I can try see if there's a sane way out of this.

As I see it, renaming these will only add noise to this patch, and there
is also ArchiveRecoveryRequested / InArchiveRecovery.  This is going to
be tricky and I'm not sure it's really worth the effort.

 - The whole CopyErrorData and memory context logic is not needed in
 check_recovery_target_time() as the FlushErrorState() is not called
 there

 You must be right.  I recall having some trouble with strings being
 free'd prematurely, but if it's ERROR, then there should be no need for
 that.  I'll check again.

Hrm, after going through this again I'm pretty sure that was correct:
the only way to obtain the current error message is to use
CopyErrorData(), but that requires you to switch back to non-error
memory context (via Assert).

The FlushErrorState() call is not there because it's handled by the hook
caller (or it can exit via ereport() with elevel==ERROR).

 - The new code in StartupXLOG() like
 +if (recovery_target_xid_string != NULL)
 +InitRecoveryTarget(RECOVERY_TARGET_XID);
 +
 +if (recovery_target_time_string != NULL)
 +InitRecoveryTarget(RECOVERY_TARGET_TIME);
 +
 +if (recovery_target_name != NULL)
 +InitRecoveryTarget(RECOVERY_TARGET_NAME);
 +
 +if (recovery_target_string != NULL)
 +InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE);

 would probably be better in separate function that you then call
 StartupXLOG() as StartupXLOG() is crazy long already so I think it's
 preferable to not make it worse.

 We can move it at top of CheckStartingAsStandby() obviously.

Moved.

--
Alex



recovery_guc_v5.6.patch.gz
Description: application/gzip



Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-24 Thread Jim Nasby

On 12/24/14, 10:58 AM, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:

pgbench -S -T10 -c 4 -j 4
master:
tps = 9556.356145 (excluding connections establishing)
tps = 9897.324917 (excluding connections establishing)
tps = 9287.286907 (excluding connections establishing)
tps = 10210.130833 (excluding connections establishing)

XXH32:
tps = 32462.754437 (excluding connections establishing)
tps = 33232.144511 (excluding connections establishing)
tps = 33082.436760 (excluding connections establishing)
tps = 33597.904532 (excluding connections establishing)



FWIW, I don't believe these results for one second. It's quite plausible
that there's a noticeable performance benefit, but a factor of three is
completely unrealistic... Could you please recheck?


A possible theory is that the hash change moved some locks into
different partitions causing a large reduction in contention,
but even then 3X seems unlikely.  And of course if that was
the mechanism, the result is still pure luck; other examples
might get worse by the same amount.


I must have screwed something up, because if anything I see a small loss for 
XXH now (but my laptop isn't consistent enough to be sure).

This surprises me given that SMHasher shows XXH to be 50% faster than Spooky 
for 20 byte keys.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] replicating DROP commands across servers

2014-12-24 Thread David Rowley
On 25 December 2014 at 04:47, Tom Lane t...@sss.pgh.pa.us wrote:

 David Rowley dgrow...@gmail.com writes:
  On 25 December 2014 at 00:34, Andres Freund and...@2ndquadrant.com
 wrote:
  I really wonder if we can't make msvc reliably recognize this kind of
  scenario - especially this case is pretty trivial?

  The attached patch removes the warning, but likely can't be used in case
  someone somewhere is doing elog(var++, "my error");

 Yeah, we're *not* doing that.  There are definitely places where
 ereport/elog are used with nonconstant elevel.


Agreed. The patch was intended as a demo of where the problem is. Although
I don't see why non-const elevel matters. Non-stable expressions are what
actually matter.


 It's curious though that MSVC fails to notice that the variable never
 changes.  I wonder whether we could get away with changing the elog
 macro to do
   const int elevel_ = (elevel);
 as ereport does, and whether it would help if so.


Unlikely, as the one that was just fixed above is an ereport.

I'll dig around a little more and see if there's some way to get MSVC to
optimise this somehow. The 1% reduction in the postgres.exe seems worth a
little bit of investigation time.

Regards

David Rowley


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-24 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 12/24/14, 10:58 AM, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 FWIW, I don't believe these results for one second. It's quite plausible
 that there's a noticeable performance benefit, but a factor of three is
 completely unrealistic... Could you please recheck?

 A possible theory is that the hash change moved some locks into
 different partitions causing a large reduction in contention,
 but even then 3X seems unlikely.  And of course if that was
 the mechanism, the result is still pure luck; other examples
 might get worse by the same amount.

 I must have screwed something up, because if anything I see a small loss for 
 XXH now (but my laptop isn't consistent enough to be sure).

 This surprises me given that SMHasher shows XXH to be 50% faster than
 Spooky for 20 byte keys.

The speed of the hash calculation itself is unlikely to move the needle as
much as 1% either direction, because I've seldom seen any profile in which
hash_any accounted for more than a percent or so of total runtime.  What
is far more likely to be causing visible performance changes is downstream
effects, ie changes in the distribution of entries across hash buckets,
causing changes in bucket list search times and/or changes in contention
(for shared memory tables that are locked on a hash-partition basis).
But even then it's hard to credit 3X.

regards, tom lane




Re: [HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-24 Thread Michael Paquier
On Wed, Dec 24, 2014 at 10:51 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Fair enough. Anyway, I will wait to apply the patch which moves pg_lzcompress.c
 until we have reached a consensus about this.
Just to be clear (after sleeping on it), we still need pglz stuff in
src/common to offer to the frontends the possibility to uncompress
block data. My point is simply that we should only provide in the xlog
reader facility enough data to do operations on them, but not directly
APIs to operate them. So ISTM that you could still push the patch to
have pglz in common library to clear the way, and let's use this
thread to discuss if we want the API to rebuild blocks in the reader
facility or not.
-- 
Michael




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2014-12-24 Thread Kouhei Kaigai
  On Mon, Dec 15, 2014 at 4:55 PM, Kouhei Kaigai kai...@ak.jp.nec.com
 wrote:
   I'm not certain whether we should have this functionality in contrib
   from the perspective of workload that can help, but its major worth
   is for an example of custom-scan interface.
  worker_spi is now in src/test/modules. We may add it there as well, no?
 
 Hmm, it makes sense for me. Does it also make sense to add a test-case to
 the core regression test cases?
 
The attached patch adds ctidscan module at test/module instead of contrib.
Basic portion is not changed from the previous post, but file locations
and test cases in regression test are changed.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


pgsql-v9.5-test-ctidscan.v2.patch
Description: pgsql-v9.5-test-ctidscan.v2.patch



Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-12-24 Thread David Rowley
On 30 November 2014 at 15:02, Noah Misch n...@leadboat.com wrote:

 On Sun, Sep 21, 2014 at 02:31:15AM -0400, Noah Misch wrote:
  It then dawned on me that every Windows build of PostgreSQL already has
 a way
  to limit connections to a particular OS user.  SSPI authentication is
  essentially the Windows equivalent of peer authentication.  A brief trial
  thereof looked promising.  Regression runs will need a pg_ident.conf
 listing
  each role used in the regression tests.  That's not ideal, but the
 buildfarm
  will quickly reveal any omissions.  Unless someone sees a problem here,
 I will
  look at fleshing this out into a complete patch.  I bet it will even
 turn out
  to be back-patchable.

 That worked out nicely.  pg_regress --temp-install rewrites pg_ident.conf
 and pg_hba.conf such that the current OS user may authenticate as the
 bootstrap superuser and as any user named in --create-role.  Suites not
 using
 --temp-install (pg_upgrade, TAP) call pg_regress --config-auth=DATADIR to
 pick up those same configuration changes.  My hope is that out-of-tree test
 harnesses wanting this hardening can do likewise.  On non-Windows systems,
 pg_regress --config-auth does nothing.



f6dc6dd seems to have broken vcregress check for me:

D:\Postgres\a\src\tools\msvc> vcregress check
== removing existing temp installation==
== creating temporary installation==
== initializing database system   ==
== starting postmaster==

pg_regress: postmaster did not respond within 60 seconds
Examine D:/Postgres/a/src/test/regress/log/postmaster.log for the reason

The postmaster.log reads:

LOG:  database system was shut down at 2014-12-25 15:26:33 NZDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
FATAL:  no pg_hba.conf entry for host "::1", user "David", database
"postgres"
...
FATAL:  no pg_hba.conf entry for host "::1", user "David", database
"postgres"


Having a look at the pg_hba.conf that's been generated by pg_regress, it
looks like it only adds a line for IPv4 addresses.

I'll admit that I don't have a great understanding of what the SSPI stuff
is about, but at least the attached patch seems to fix the problem for me.

Regards

David Rowley
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index cb092f9..e2adaca 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1085,6 +1085,8 @@ config_sspi_auth(const char *pgdata)
 	CW(fputs("# Configuration written by config_sspi_auth()\n", hba) >= 0);
 	CW(fputs("host all all 127.0.0.1/32  sspi include_realm=1 map=regress\n",
 			 hba) >= 0);
+	CW(fputs("host all all ::1/128  sspi include_realm=1 map=regress\n",
+			 hba) >= 0);
 	CW(fclose(hba) == 0);
 
snprintf(fname, sizeof(fname), %s/pg_ident.conf, pgdata);



Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-12-24 Thread Michael Paquier
On Thu, Dec 25, 2014 at 11:55 AM, David Rowley dgrowle...@gmail.com wrote:
 f6dc6dd seems to have broken vcregress check for me:
 Having a look at the pg_hba.conf that's been generated by pg_regress, it
 looks like it only adds a line for IPv4 addresses.
Indeed. I can see this problem as well on my win7 box, and your patch fixes it.
-- 
Michael




Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-12-24 Thread Noah Misch
On Thu, Dec 25, 2014 at 03:55:02PM +1300, David Rowley wrote:
 f6dc6dd seems to have broken vcregress check for me:

 FATAL:  no pg_hba.conf entry for host "::1", user "David", database
 "postgres"
 ...
 FATAL:  no pg_hba.conf entry for host "::1", user "David", database
 "postgres"

Thanks.  I bet this is the reason buildfarm members hamerkop, jacana and
bowerbird have not been reporting in.

 @@ -1085,6 +1085,8 @@ config_sspi_auth(const char *pgdata)
 	CW(fputs("# Configuration written by config_sspi_auth()\n", hba) >= 0);
 	CW(fputs("host all all 127.0.0.1/32  sspi include_realm=1 map=regress\n",
 			 hba) >= 0);
 +	CW(fputs("host all all ::1/128  sspi include_realm=1 map=regress\n",
 +			 hba) >= 0);

This needs to be conditional on whether the platform supports IPv6, like we do
in setup_config().  The attached patch works on these configurations:

64-bit Windows Server 2003, 32-bit VS2010
64-bit Windows Server 2003, MinGW (always 32-bit)
64-bit Windows Server 2008, 64-bit VS2012
64-bit Windows Server 2008, 64-bit MinGW-w64

If the patch looks reasonable, I will commit it.
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index cb092f9..e8c644b 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1035,6 +1035,7 @@ config_sspi_auth(const char *pgdata)
   *domainname;
const char *username;
char   *errstr;
+	bool		have_ipv6;
 	char		fname[MAXPGPATH];
 	int			res;
FILE   *hba,
@@ -1054,6 +1055,28 @@ config_sspi_auth(const char *pgdata)
exit(2);
}
 
+	/*
+	 * Like initdb.c:setup_config(), determine whether the platform
+	 * recognizes ::1 (IPv6 loopback) as a numeric host address string.
+	 */
+   {
+   struct addrinfo *gai_result;
+   struct addrinfo hints;
+   WSADATA wsaData;
+
+   hints.ai_flags = AI_NUMERICHOST;
+   hints.ai_family = AF_UNSPEC;
+   hints.ai_socktype = 0;
+   hints.ai_protocol = 0;
+   hints.ai_addrlen = 0;
+   hints.ai_canonname = NULL;
+   hints.ai_addr = NULL;
+   hints.ai_next = NULL;
+
+		have_ipv6 = (WSAStartup(MAKEWORD(2, 2), &wsaData) == 0 &&
+					 getaddrinfo("::1", NULL, &hints, &gai_result) == 0);
+   }
+
/* Check a Write outcome and report any error. */
 #define CW(cond)   \
do { \
@@ -1085,6 +1108,9 @@ config_sspi_auth(const char *pgdata)
 	CW(fputs("# Configuration written by config_sspi_auth()\n", hba) >= 0);
 	CW(fputs("host all all 127.0.0.1/32  sspi include_realm=1 map=regress\n",
 			 hba) >= 0);
+	if (have_ipv6)
+		CW(fputs("host all all ::1/128  sspi include_realm=1 map=regress\n",
+				 hba) >= 0);
CW(fclose(hba) == 0);
 
snprintf(fname, sizeof(fname), %s/pg_ident.conf, pgdata);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 004942c..4506739 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -345,6 +345,7 @@ sub mkvcbuild
 	$pgregress_ecpg->AddIncludeDir('src\test\regress');
 	$pgregress_ecpg->AddDefine('HOST_TUPLE=i686-pc-win32vc');
 	$pgregress_ecpg->AddDefine('FRONTEND');
+	$pgregress_ecpg->AddLibrary('ws2_32.lib');
 	$pgregress_ecpg->AddDirResourceFile('src\interfaces\ecpg\test');
 	$pgregress_ecpg->AddReference($libpgcommon, $libpgport);
 
@@ -372,6 +373,7 @@ sub mkvcbuild
 	$pgregress_isolation->AddIncludeDir('src\test\regress');
 	$pgregress_isolation->AddDefine('HOST_TUPLE=i686-pc-win32vc');
 	$pgregress_isolation->AddDefine('FRONTEND');
+	$pgregress_isolation->AddLibrary('ws2_32.lib');
 	$pgregress_isolation->AddDirResourceFile('src\test\isolation');
 	$pgregress_isolation->AddReference($libpgcommon, $libpgport);
 
@@ -605,6 +607,8 @@ sub mkvcbuild
 	$pgregress->AddFile('src\test\regress\pg_regress_main.c');
 	$pgregress->AddIncludeDir('src\port');
 	$pgregress->AddDefine('HOST_TUPLE=i686-pc-win32vc');
+	$pgregress->AddDefine('FRONTEND');
+	$pgregress->AddLibrary('ws2_32.lib');
 	$pgregress->AddDirResourceFile('src\test\regress');
 	$pgregress->AddReference($libpgcommon, $libpgport);
 



Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-24 Thread Abhijit Menon-Sen
Hi.

Here's a proposed patch to use CPUID at startup to determine if the
SSE4.2 CRC instructions are available, to use them instead of the
slice-by-8 implementation (posted earlier).

A few notes:

1. GCC has included cpuid.h since 4.3.0, so I figured it was safe to
   use. It can be replaced with some inline assembly otherwise.

2. I've also used the crc32b/crc32q instructions directly rather than
   using .bytes to encode the instructions; binutils versions since
   2007 or so have supported them.

3. I've included the MSVC implementation mostly as an example of how to
   extend this to different compilers/platforms. It's written according
   to the documentation for MSVC intrinsics, but I have not tested it.
   Suggestions/improvements are welcome.

-- Abhijit
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 73c30c5..ae34876 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -37,6 +37,7 @@
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/ps_status.h"
+#include "utils/pg_crc.h"
 
 
 const char *progname;
@@ -76,6 +77,12 @@ main(int argc, char *argv[])
 	argv = save_ps_display_args(argc, argv);
 
 	/*
+	 * Select the fastest available CRC32 implementation for the
+	 * platform.
+	 */
+	pg_init_comp_crc32c();
+
+	/*
 	 * If supported on the current platform, set up a handler to be called if
 	 * the backend/postmaster crashes with a fatal signal or exception.
 	 */

diff --git a/src/include/utils/pg_crc.h b/src/include/utils/pg_crc.h
index 55934e5..c59c05b 100644
--- a/src/include/utils/pg_crc.h
+++ b/src/include/utils/pg_crc.h
@@ -41,7 +41,8 @@
 
 typedef uint32 pg_crc32;
 
-extern pg_crc32 pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len);
+extern void pg_init_comp_crc32c(void);
+extern pg_crc32 (*pg_comp_crc32c)(pg_crc32 crc, const void *data, size_t len);
 
 /*
  * CRC calculation using the CRC-32C (Castagnoli) polynomial.

diff --git a/src/port/pg_crc.c b/src/port/pg_crc.c
index 2f9857b..6be17b0 100644
--- a/src/port/pg_crc.c
+++ b/src/port/pg_crc.c
@@ -21,6 +21,13 @@
 #include "utils/pg_crc.h"
 #include "utils/pg_crc_tables.h"
 
+#if defined(HAVE_CPUID_H)
+#include <cpuid.h>
+#elif defined(_MSC_VER)
+#include <intrin.h>
+#include <nmmintrin.h>
+#endif
+
 static inline uint32 bswap32(uint32 x)
 {
 #if defined(__GNUC__) || defined(__clang__)
@@ -39,8 +46,8 @@ static inline uint32 bswap32(uint32 x)
 #define cpu_to_le32(x) x
 #endif
 
-pg_crc32
-pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
+static pg_crc32
+pg_comp_crc32c_sb8(pg_crc32 crc, const void *data, size_t len)
 {
 	const unsigned char *p = data;
 	const uint32 *p8;
@@ -61,7 +68,6 @@ pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
 	 */
 
 	p8 = (const uint32 *) p;
-
 	while (len >= 8)
 	{
 		uint32 a = *p8++ ^ cpu_to_le32(crc);
@@ -101,8 +107,102 @@ pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
 	 */
 
 	p = (const unsigned char *) p8;
-	while (len-- > 0)
+	while (len > 0)
+	{
 		crc = pg_crc32c_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
+		len--;
+	}
+
+	return crc;
+}
 
+static pg_crc32
+pg_asm_crc32b(pg_crc32 crc, unsigned char data)
+{
+#ifdef __GNUC__
+	__asm__ ("crc32b %[data], %[crc]\n" : [crc] "+r" (crc) : [data] "rm" (data));
 	return crc;
+#elif defined(_MSC_VER)
+	return _mm_crc32_u8(crc, data);
+#else
+#error Don't know how to generate crc32b instruction
+#endif
 }
+
+static pg_crc32
+pg_asm_crc32q(uint64 crc, unsigned long long data)
+{
+#ifdef __GNUC__
+	__asm__ ("crc32q %[data], %[crc]\n" : [crc] "+r" (crc) : [data] "rm" (data));
+	return crc;
+#elif defined(_MSC_VER)
+	return _mm_crc32_u64(crc, data);
+#else
+#error Don't know how to generate crc32q instruction
+#endif
+}
+
+static pg_crc32
+pg_comp_crc32c_sse(pg_crc32 crc, const void *data, size_t len)
+{
+	const unsigned char *p = data;
+	const uint64 *p8;
+
+	/*
+	 * Handle initial bytes one at a time if necessary to ensure that
+	 * the loop below starts with a pointer aligned to four bytes.
+	 */
+
+	while (len > 0 && ((uintptr_t) p & 3))
+	{
+		crc = pg_asm_crc32b(crc, *p++);
+		len--;
+	}
+
+	/*
+	 * Process eight bytes of data at a time.
+	 */
+
+	p8 = (const uint64 *) p;
+	while (len >= 8)
+	{
+		crc = pg_asm_crc32q(crc, *p8++);
+		len -= 8;
+	}
+
+	/*
+	 * Handle any remaining bytes one at a time.
+	 */
+
+	p = (const unsigned char *) p8;
+	while (len > 0)
+	{
+		crc = pg_asm_crc32b(crc, *p++);
+		len--;
+	}
+
+	return crc;
+}
+
+/*
+ * If (we can tell that) the CPU supports SSE4.2 instructions, we can
+ * use the CRC instruction, otherwise we fall back to slice-by-8 in
+ * software.
+ */
+
+void
+pg_init_comp_crc32c(void)
+{
+	unsigned int exx[4] = { 0, 0, 0, 0 };
+
+#if defined(__GNUC__) && defined(HAVE_CPUID_H)
+	__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(_MSC_VER)
+	__cpuid(exx, 1);
+#endif
+
+	if (exx[2] & (1 << 20))
+		pg_comp_crc32c = pg_comp_crc32c_sse;
+}
+
+pg_crc32 (*pg_comp_crc32c)(pg_crc32 crc, const void *data, size_t len) = pg_comp_crc32c_sb8;

diff --git a/configure