Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-03-29 Thread Kyotaro HORIGUCHI
No one should care about this, but just to make sure...

At Tue, 29 Mar 2016 20:12:03 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20160329.201203.78219296.horiguchi.kyot...@lab.ntt.co.jp>
> By the way, memory blocks that readline sees are freed by it, but
> blocks allocated by the additional pstrdup seem to be abandoned
> without being freed. Actually, the address of newly allocated
> blocks seems to increase little by little over time *even without
> this patch*..

psql doesn't leak memory. The increment was the result of new
history entries. src/common/exec.c does the following:

./common/exec.c:586:putenv(strdup(env_path));
./common/exec.c:597:putenv(strdup(env_path));

valgrind complains that the memory blocks allocated there are
definitely lost, but the code seems to be called only once in the
lifetime of a process.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics v14

2016-03-29 Thread Tatsuo Ishii
>>         with statistics    without statistics
>> case1   0.98               0.01
>> case2   98/0               1/0
> 
> The case2 shows that functional dependencies assume that the
> conditions used in queries won't be incompatible - that's something
> this type of statistics can't fix.

It would be nice if that were mentioned in the manual to avoid users'
confusion.

>> case3   1.05               0.01
>> case4   1/0                103/0
>> case5   18.50              18.33
>> case6   23/0               123/0
> 
> The last two lines (case5 + case6) seem a bit suspicious. I believe
> those are for the histogram data, and I do get these numbers:
> 
> case5   0.93 (5517 / 5949)    42.0 (249943 / 5949)
> case6   100/0                 100/0
> 
> Perhaps you've been using the version before the bugfix, with ANALYZE
> on the wrong table?

You are right. I accidentally ran ANALYZE on t2, not t3. Now I get
these numbers:

case5   1.23 (7367 / 5968)    41.7 (249118 / 5981)
case6   117/0                 162092/0

>> 2) The following comments of mine are not addressed in the v18 patch.
>>
>>> - There are no docs for pg_mv_statistic (they should be added to "49.
>>>   System Catalogs").
>>>
>>> - The term "multivariate statistics" or something like that should
>>>   appear in the index.
>>>
>>> - There should be some explanation of how to deal with multivariate
>>>   statistics in the "14.1 Using Explain" and "14.2 Statistics Used by
>>>   the Planner" sections.
> 
> Yes, those are valid omissions. I plan to address them, and I'd also
> consider adding a section to 65.1 (How the Planner Uses Statistics),
> explaining more thoroughly how the planner uses multivariate stats.

Great.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-03-29 Thread Amit Kapila
On Tue, Mar 29, 2016 at 8:31 PM, David Steele  wrote:

> Hi Amit,
>
> On 3/22/16 3:37 AM, Michael Paquier wrote:
>
> Hope this brings some light in.
>>
>
> Do you know when you'll have time to respond to Michael's last email? I've
> marked this patch "waiting on author" in the meantime.
>
>
I think this patch needs to be looked at by a committer now.  I did a
review and added some code to this patch quite a while back; see the
e-mail [1].  The patch is the same as it was in October 2015.  I think
Michael and I are in agreement that this patch solves the reported
problem.  There is one similar problem [2] reported by Heikki which I
think can be fixed separately.

I think the main reason for moving this thread from bugs to hackers was
to gain some more traction, and I see that it has achieved that purpose
to some extent, but I find that we are more or less in the same
situation as we were back in October.

Let me know if you think anything more from my side can help in moving
the patch forward.


[1] - http://www.postgresql.org/message-id/cab7npqtnrw8lr1hd7zlnojjc1bp1evw_emadgorv+s-sbcr...@mail.gmail.com
[2] - http://www.postgresql.org/message-id/566ef84f.1030...@iki.fi


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-03-29 Thread Pavel Stehule
2016-03-30 5:43 GMT+02:00 Kyotaro HORIGUCHI :

> Hello,
>
> At Tue, 29 Mar 2016 13:12:06 +0200, Pavel Stehule 
> wrote in 

Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Fabrízio de Royes Mello
On Tue, Mar 29, 2016 at 5:58 PM, Petr Jelinek  wrote:
>
> On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek wrote:
>>  >
>>  > On 29/03/16 19:46, Fabrízio de Royes Mello wrote:
>>  >>
>>  >>
>>  >>  >
>>  >>  > Hmm I am unable to reproduce this. What OS? Any special configure
>>  >> flags you use?
>>  >>  >
>>  >>
>>  >> In my environment the error remains with your last patches.
>>  >>
>>  >> I didn't use anything special.
>>  >>
>>  >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>>  >> --enable-coverage --enable-tap-tests --enable-depend
>>  >> make -s -j8 install
>>  >> make check-world
>>  >>
>>  >>
>>  >> My environment:
>>  >>
>>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>>  >> $ uname -a
>>  >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>  >>
>>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>>  >> $ gcc --version
>>  >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>>  >> Copyright (C) 2013 Free Software Foundation, Inc.
>>  >> This is free software; see the source for copying conditions.  There is
>>  >> NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>>  >> PURPOSE.
>>  >>
>>  >
>>  > Hmm nothing special indeed, still can't reproduce, I did one blind
>> try for a fix. Can you test with attached?
>>  >
>>
>> Same error... I'm leaving on a trip now, but when I arrive I'll try
>> to figure out what's happening by debugging the code.
>>
>
> Okay, thanks.
>

Hi,

After some debugging, it seems that "seqstate->xid" in
"wait_for_sequence" isn't properly initialized, or is initialized with
the wrong value (1258291200).

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello


Re: [HACKERS] [PATCH v1] GSSAPI encryption support

2016-03-29 Thread Robbie Harwood
Hello friends,

A new version of my GSSAPI encryption patchset is available, both in
this email and on my github:
https://github.com/frozencemetery/postgres/tree/feature/gssencrypt9

This version is intended to address David's review suggestions:

- The only code change (not counting SGML and comments) is that I've
  resolved the pre-existing "XXX: Should we loop and read all messages?"
  comment in the affirmative by... looping and reading all messages in
  pg_GSS_error.  Though commented on as part of the first patch, this is
  bundled with the rest of the auth cleanup since the first patch is
  code motion only.

- Several changes to comments, documentation rewordings, and whitespace
  additions.  I look forward to finding out what needs even more of the
  same treatment.  Of all the changesets I've posted thus far, this
  might be the one for which it makes the most sense to see what changed
  by diffing from the previous changeset.

Thanks!
From 3b62e99de16f2c4600d0bb02f3626e5157ecdc6c Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Fri, 26 Feb 2016 16:07:05 -0500
Subject: [PATCH 1/3] Move common GSSAPI code into its own files

On both the frontend and backend, prepare for GSSAPI encryption support
by moving common code for error handling into a common file.  Other than
build-system changes, no code changes occur in this patch.
---
 configure   |  2 +
 configure.in|  1 +
 src/Makefile.global.in  |  1 +
 src/backend/libpq/Makefile  |  4 ++
 src/backend/libpq/auth.c| 63 +---
 src/backend/libpq/be-gssapi-common.c| 74 +
 src/include/libpq/be-gssapi-common.h| 26 
 src/interfaces/libpq/Makefile   |  4 ++
 src/interfaces/libpq/fe-auth.c  | 48 +
 src/interfaces/libpq/fe-gssapi-common.c | 63 
 src/interfaces/libpq/fe-gssapi-common.h | 21 ++
 11 files changed, 198 insertions(+), 109 deletions(-)
 create mode 100644 src/backend/libpq/be-gssapi-common.c
 create mode 100644 src/include/libpq/be-gssapi-common.h
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.c
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.h

diff --git a/configure b/configure
index b3f3abe..a5bd629 100755
--- a/configure
+++ b/configure
@@ -713,6 +713,7 @@ with_systemd
 with_selinux
 with_openssl
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5491,6 +5492,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
diff --git a/configure.in b/configure.in
index 0bd90d7..4fd8f05 100644
--- a/configure.in
+++ b/configure.in
@@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e94d6a5..3dbc5c2 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -183,6 +183,7 @@ with_perl	= @with_perl@
 with_python	= @with_python@
 with_tcl	= @with_tcl@
 with_openssl	= @with_openssl@
+with_gssapi	= @with_gssapi@
 with_selinux	= @with_selinux@
 with_systemd	= @with_systemd@
 with_libxml	= @with_libxml@
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 09410c4..a8cc9a0 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -21,4 +21,8 @@ ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
 endif
 
+ifeq ($(with_gssapi),yes)
+OBJS += be-gssapi-common.o
+endif
+
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 57c2f48..73d493e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -136,11 +136,7 @@ bool		pg_krb_caseins_users;
  *
  */
 #ifdef ENABLE_GSS
-#if defined(HAVE_GSSAPI_H)
-#include 
-#else
-#include 
-#endif
+#include "libpq/be-gssapi-common.h"
 
 static int	pg_GSS_recvauth(Port *port);
 #endif   /* ENABLE_GSS */
@@ -715,63 +711,6 @@ recv_and_check_password_packet(Port *port, char **logdetail)
  */
 #ifdef ENABLE_GSS
 
-#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
-/*
- * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
- * that contain the OIDs required. Redefine here, values copied
- * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
- */
-static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
-{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
-static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME = &GSS_C_NT_USER_NAME_desc;
-#endif
-
-
-static void
-pg_GSS_error(int severity, char *errmsg, OM_uint32 maj_stat, OM_uint32 min_stat)
-{
-	gss_buffer_desc gmsg;
-	OM_uint32	lmin_s,
-				msg_ctx;
-	char		msg_major[128],
-				msg_minor[128];
-
-	/* Fetch major status 

Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-29 Thread Kyotaro HORIGUCHI
Thank you for polishing this.

At Tue, 29 Mar 2016 13:31:19 -0500, Kevin Grittner  wrote in 

> I tried to whip this into shape, but there were a few areas I
> didn't feel I had the necessary understanding to feel comfortable
> taking on the committer role for it.  I've cleaned it up the best I
> could, fixing whitespace and typos, eliminating an unnecessary
> addition of an include, improving C comments (at least to my eye),
> and adding an Assert where it seemed like a good idea to me.

Especially for this one,

===
@@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
continue;   /* don't repeat work if already proven OK */
 
have_partial = true;
+   break;
}
if (!have_partial)
return;
===

The initialization has been moved to set_rel_size, so the break
can be restored.


> My own tests and those of others show performance improvements (up
> to 10x for some tests) and no significant performance degradation,
> even with attempts to create "worst case" scenarios.
> 
> The only changes to the regression tests are to change an "out"
> file to eliminate re-checking the index predicate against the heap
> on every matching row, which seems like a good thing.
> 
> I'm taking my name off as committer and marking it "Ready for
> Committer".  If someone else wants to comment on the issues where
> Tom and Kyotaro-san still seem unsatisfied to the point where I
> can get my head around it, I could maybe take it back on as
> committer -- if anyone feels that could be a net win.

FWIW, as mentioned upthread, I added the condition that declines to
strip index predicates from the base restrictinfo without understanding
the reason for doing so.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-03-29 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 29 Mar 2016 13:12:06 +0200, Pavel Stehule  
wrote in 

Re: [HACKERS] Combining Aggregates

2016-03-29 Thread David Rowley
On 26 March 2016 at 15:07, David Rowley  wrote:

Many thanks Robert for committing the serialize states portion.

> 0005:
> Haribabu's patch; no change from last time.

Just in case you jump ahead. I just wanted to re-highlight something
Haribabu mentioned a while ago about combining floating point states
[1]. This discussion did not happen on this thread, so to make sure it
does not get lost...

As of today aggregate calculations for floating point types can vary
depending on the order in which values are aggregated.

For example:

create table f8 (num float8);
insert into f8 select x/100.0 from generate_series(1,10000) x(x);
select stddev(num order by num) from f8;
      stddev
------------------
 28.8689567990717
(1 row)

select stddev(num order by num desc) from f8;
      stddev
------------------
 28.8689567990716
(1 row)

select stddev(num order by random()) from f8;
      stddev
------------------
 28.8689567990715
(1 row)

And of course the execution plan can determine the order in which rows
are aggregated, even if the underlying data does not change.

Parallelising these aggregates increases the chances of seeing these
variations as the number of rows aggregated in each worker is going to
vary on each run, so the numerical anomalies will also vary between
each run.
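The variation ultimately comes from double-precision addition not being associative, independent of SQL. A standalone C sketch (not PostgreSQL code) of the same effect:

```c
/*
 * Summing the same three values in a different order can round
 * differently, because IEEE 754 double addition is not associative.
 * This is exactly why float8 aggregates depend on input order.
 */
static double
sum3(double a, double b, double c)
{
    return (a + b) + c;
}
```

Here sum3(0.1, 0.2, 0.3) yields 0.6000000000000001 while sum3(0.3, 0.2, 0.1) yields 0.6, mirroring the last-digit drift in the stddev results above.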

I wrote in [1]:
> We do also warn about this in the manual: "Inexact means that some
> values cannot be converted exactly to the internal format and are
> stored as approximations, so that storing and retrieving a value might
> show slight discrepancies. Managing these errors and how they
> propagate through calculations is the subject of an entire branch of
> mathematics and computer science and will not be discussed here,
> except for the following points:" [1]
> [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html

Does this text in the documentation stretch as far as covering the
variable results from parallel aggregation of floating point types? Or
should we be more careful and not parallelise these, similar to how we
didn't end up with inverse aggregate transition functions for floating
point types?

I'm personally undecided, and would like to hear what others think.

[1] http://www.postgresql.org/message-id/cakjs1f_hplfhkd2ylfrsrmumbzwqkgvjcwx21b_xg1a-0pz...@mail.gmail.com
-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-29 Thread Tom Lane
Peter Geoghegan  writes:
> Do you think it would be okay if the SQL query to detect potentially
> affected indexes only considered the leading attribute? Since that's
> the only attribute that could use abbreviated keys, it ought to be
> safe to not require users to REINDEX indexes that happen to have a
> second-or-subsequent text/varchar(n) attribute that doesn't use the C
> locale. Maybe it's not worth worrying about.

Sure, if that's the case it seems like a useful thing to know.  I'm
sure there are lots of examples of mixed int-and-text columns in
indexes, for example.

regards, tom lane




Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Dilip Kumar
On Wed, Mar 30, 2016 at 7:51 AM, Dilip Kumar  wrote:

> + if (lockWaiters)
> + /*
> +  * Here we use the same freespace value for all the blocks, which is
> +  * OK because they are all newly added blocks with the same amount of
> +  * free space.  Even if a block we just added to the FreeSpaceMap
> +  * above is already used by some backend so that its free space is no
> +  * longer the same, no harm is done, because the actual free space
> +  * will be recalculated by the user after getting the page.
> +  */
> + UpdateFreeSpaceMap(relation, firstBlock, blockNum, freespace);
>
>
> Does this look good ?


Done in a better way:

+ lockWaiters = RelationExtensionLockWaiterCount(relation);
+
+ if (lockWaiters == 0)
+ return;
+


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v21.patch
Description: Binary data



Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 6:02 PM, Tomas Vondra
 wrote:
> And why not? I mean, why should it be acceptable to slow down?

My point was that over 80% of execution time was spent in the
HashAggregate, which outputs tuples to the sort. That, and the huge
i5/Xeon inconsistency (in the extent to which this is regressed --
it's not at all, or it's regressed a lot) makes me suspicious that
there is something else going on. Possibly involving the scheduling of
I/O.

> That may be easily due to differences between the CPUs and configuration.
> For example the Xeon uses a way older CPU with different amounts of CPU
> cache, and it's also a multi-socket system. And so on.

We're talking about a huge relative difference with that HashAggregate
plan, though. I don't think that those relative differences are
explained by differing CPU characteristics. But I guess we'll find out
soon enough.

>> If there is ever a regression, it is only really sensible to talk
>> about it while looking at trace_sort output (and, I guess, the query
>> plan). I've asked Tomas for trace_sort output in all relevant cases.
>> There is no point in "flying blind" and speculating what the problem
>> was from a distance.
>
>
> The updated benchmarks are currently running. I'm out of office until
> Friday, and I'd like to process the results over the weekend. FWIW I'll have
> results for these cases:
>
> 1) unpatched (a414d96a)
> 2) patched, default settings
> 3) patched, replacement_sort_mem=64
>
> Also, I'll have trace_sort=on output for all the queries, so we can
> investigate further.

Thanks! That will tell us a lot more.

> Yeah. That was one of the goals of the benchmark, to come up with some
> tuning recommendations. On some systems significantly increasing memory GUCs
> may not be possible, though - say, on very small systems with very limited
> amounts of RAM.

Fortunately, such systems will probably mostly use external sorts for
CREATE INDEX cases, and there seems to be very little if any downside
there, at least according to your similarly varied tests of CREATE
INDEX.

>> I don't think they are representative. Greg Stark characterized the
>> regressions as being fairly limited, mostly at the very low end. And
>> that was *before* all the memory fragmentation stuff made that
>> better. I haven't done any analysis of how much better that made the
>> problem *across the board* yet, but for int4 cases I could make 1MB
>> work_mem queries faster with gigabytes of data on my laptop. I
>> believe I tested various datum sort cases there, like "select
>> count(distinct(foo)) from bar"; those are a very pure test of the
>> patch.
>>
>
> Well, I'd guess those conclusions may be a bit subjective.

I think that the conclusion that we should do something or not do
something based on this information is subjective. OTOH, whether and
to what extent these tests are representative of real user workloads
seems much less subjective. This is not a criticism of the test cases
you came up with, which rightly emphasized possibly regressed cases. I
think everyone already understood that the picture was very positive
at the high end, in memory rich environments.

-- 
Peter Geoghegan




Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 7:31 PM, Tom Lane  wrote:
> A corrupt index could easily fail to detect uniqueness violations (because
> searches fail to find entries they should find).  Not sure I believe that
> it would make false reports of a uniqueness conflict that's not really
> there.

Sure. But looking at how texteq() is implemented, it certainly seems
impossible that that could happen. Must have been a miscommunication
somewhere. I'll fix it.
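For reference, texteq() boils down to a length check plus a raw byte comparison, with no locale or strxfrm() involvement. A simplified standalone model (not the actual backend code, which works on varlena values):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of texteq(): two strings are equal only if their
 * lengths match and their bytes match.  Because no collation-dependent
 * transform is involved, a corrupt abbreviated-key *ordering* cannot
 * make two different strings compare as equal here.
 */
static bool
text_eq_model(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

This is why a false report of a uniqueness conflict seems impossible: the final equality check never consults the (potentially buggy) strxfrm() path.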

Do you think it would be okay if the SQL query to detect potentially
affected indexes only considered the leading attribute? Since that's
the only attribute that could use abbreviated keys, it ought to be
safe to not require users to REINDEX indexes that happen to have a
second-or-subsequent text/varchar(n) attribute that doesn't use the C
locale. Maybe it's not worth worrying about.

-- 
Peter Geoghegan




Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-29 Thread Tom Lane
Peter Geoghegan  writes:
> I made a pass over this, and changed some things. I noticed it said
> something about incorrect unique violations on affected systems. Is
> that really possible?

A corrupt index could easily fail to detect uniqueness violations (because
searches fail to find entries they should find).  Not sure I believe that
it would make false reports of a uniqueness conflict that's not really
there.

regards, tom lane




Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Dilip Kumar
On Wed, Mar 30, 2016 at 7:19 AM, Robert Haas  wrote:

Thanks for review and better comments..


> hio.c: In function ‘RelationGetBufferForTuple’:
> hio.c:231:20: error: ‘freespace’ may be used uninitialized in this
> function [-Werror=uninitialized]
> hio.c:185:7: note: ‘freespace’ was declared here
> hio.c:231:20: error: ‘blockNum’ may be used uninitialized in this
> function [-Werror=uninitialized]
> hio.c:181:14: note: ‘blockNum’ was declared here
>

I have fixed those in v20



>
> There's nothing whatsoever to prevent RelationExtensionLockWaiterCount
> from returning 0.
>
> It's also rather ugly that the call to UpdateFreeSpaceMap() assumes
> that the last value returned by PageGetHeapFreeSpace() is as good as
> any other, but maybe we can just install a comment explaining that
> point; there's not an obviously better approach that I can see.
>

Added comments..

+ if (lockWaiters)
+ /*
+  * Here we use the same freespace value for all the blocks, which is
+  * OK because they are all newly added blocks with the same amount of
+  * free space.  Even if a block we just added to the FreeSpaceMap
+  * above is already used by some backend so that its free space is no
+  * longer the same, no harm is done, because the actual free space
+  * will be recalculated by the user after getting the page.
+  */
+ UpdateFreeSpaceMap(relation, firstBlock, blockNum, freespace);


Does this look good ?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v20.patch
Description: Binary data



Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-29 Thread David Steele
On 3/29/16 2:09 PM, Magnus Hagander wrote:

> I had a chat with Heikki, and here's another suggestion:
> 
> 1. We don't touch the current exclusive backups at all, as previously
> discussed, other than deprecating their use. For backwards compat.
> 
> 2. For new backups, we return the contents of pg_control as a bytea from
> pg_stop_backup(). We tell backup programs they are supposed to write
> this out as pg_control.backup, *not* as pg_control.
> 
> 3a. On recovery, if it's an exclusive backup, we do as we did before.
> 
> 3b. on recovery, in non-exclusive backups (determined from
> backup_label), we check that pg_control.backup exists *and* that
> pg_control does *not* exist. That guards us reasonably against backup
> programs that do the wrong thing, and we know we get the correct version
> of pg_control.
> 
> 4. (we can still add the stop location to the backup_label file in case
> backup programs find it useful, but we don't use it in recovery)
> 
> Thoughts about this approach?

This certainly looks like it would work, but it raises the barrier for
implementing backups by quite a lot.  It's fine for backrest or barman,
but it won't be pleasant for anyone who has home-grown scripts.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 5:31 PM, Josh berkus  wrote:
> For Thursday's release, I've added a wiki page to give users more
> information about the strxfrm() issue, especially since we're going to
> ask them to do a bunch of REINDEXing.

I made a pass over this, and changed some things. I noticed it said
something about incorrect unique violations on affected systems. Is
that really possible? I can't imagine how it could be, although that
is left in place largely unchanged for now.

-- 
Peter Geoghegan




[HACKERS] Re: [COMMITTERS] pgsql: Allow to_timestamp(float8) to convert float infinity to timestam

2016-03-29 Thread Michael Paquier
On Wed, Mar 30, 2016 at 10:47 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Wed, Mar 30, 2016 at 10:13 AM, Tom Lane  wrote:
>>> I'm inclined to just drop the out-of-range test cases.  They're not that
>>> useful IMO, and alternate expected-files are a real PITA for maintenance.
>
>> Hm. Actually, they are quite useful to check error boundaries, so why
>> not just simplifying the error message to "timestamp out of range" and
>> remove the value from it?
>
> Meh.  I realize that there are a lot of places where we just say
> "timestamp out of range" rather than trying to give a specific value,
> but it's really contrary to our message style guidelines to not print
> the complained-of value.  I think we should leave the ereport calls as-is
> and remove the test cases; to do otherwise is putting the regression tests
> ahead of users.  The upper-boundary test is quite dubiously useful anyway,
> because it has to test a value that's very far away from where the
> boundary actually is in most builds.

OK, I won't fight more on that.
-- 
Michael




Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 1:29 PM, Dilip Kumar  wrote:
> Attaching new version v18
>
> - Some cleanup work on v17.
> - Improved  UpdateFreeSpaceMap function.
> - Performance and space utilization are same as V17

Looks better.  Here's a v19 that I hacked on a bit.

Unfortunately, one compiler I tried this with had a pretty legitimate complaint:

hio.c: In function ‘RelationGetBufferForTuple’:
hio.c:231:20: error: ‘freespace’ may be used uninitialized in this
function [-Werror=uninitialized]
hio.c:185:7: note: ‘freespace’ was declared here
hio.c:231:20: error: ‘blockNum’ may be used uninitialized in this
function [-Werror=uninitialized]
hio.c:181:14: note: ‘blockNum’ was declared here

There's nothing whatsoever to prevent RelationExtensionLockWaiterCount
from returning 0.

It's also rather ugly that the call to UpdateFreeSpaceMap() assumes
that the last value returned by PageGetHeapFreeSpace() is as good as
any other, but maybe we can just install a comment explaining that
point; there's not an obviously better approach that I can see.
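For reference, the sizing heuristic in the attached patch is `extraBlocks = Min(512, lockWaiters * 20)`. Extracted as a standalone sketch (the function name here is made up):

```c
/*
 * Pre-extension sizing heuristic from the patch: extend the relation
 * by 20 extra blocks per backend waiting on the relation extension
 * lock, capped at 512 blocks to bound wasted disk space.
 */
static int
extra_blocks_for(int lock_waiters)
{
    int     extra = lock_waiters * 20;

    return extra > 512 ? 512 : extra;
}
```

So contention ramps the pre-extension up quickly (26 or more waiters already hit the cap), while an uncontended insert extends by only a handful of blocks.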

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..aeed40b 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,69 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * Extend a relation by multiple blocks to avoid future contention on the
+ * relation extension lock.  Our goal is to pre-extend the relation by an
+ * amount which ramps up as the degree of contention ramps up, but limiting
+ * the result to some sane overall value.
+ */
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	BlockNumber	blockNum,
+				firstBlock;
+	int			extraBlocks = 0;
+	int			lockWaiters = 0;
+	Size		freespace;
+	Buffer		buffer;
+
+	/*
+	 * We use the length of the lock wait queue to judge how much to extend.
+	 * It might seem like multiplying the number of lock waiters by as much
+	 * as 20 is too aggressive, but benchmarking revealed that smaller numbers
+	 * were insufficient.  512 is just an arbitrary cap to prevent pathological
+	 * results (and excessive wasted disk space).
+	 */
+	lockWaiters = RelationExtensionLockWaiterCount(relation);
+	extraBlocks = Min(512, lockWaiters * 20);
+
+	firstBlock = InvalidBlockNumber;
+
+	while (extraBlocks-- >= 0)
+	{
+		/* Ouch - an unnecessary lseek() each time through the loop! */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		/* Extend by one page. */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+		freespace = PageGetHeapFreeSpace(page);
+		UnlockReleaseBuffer(buffer);
+
+		if (firstBlock == InvalidBlockNumber)
+			firstBlock = blockNum;
+
+		/*
+		 * Immediately update the bottom level of the FSM.  This has a good
+		 * chance of making this page visible to other concurrently inserting
+		 * backends, and we want that to happen without delay.
+		 */
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+
+	/*
+	 * Updating the upper levels of the free space map is too expensive
+	 * to do for every block, but it's worth doing once at the end to make
+	 * sure that subsequent insertion activity sees all of those nifty free
+	 * pages we just inserted.
+	 */
+	UpdateFreeSpaceMap(relation, firstBlock, blockNum, freespace);
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,8 +296,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
 otherBlock;
 	bool		needLock;
@@ -308,6 +371,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != InvalidBlockNumber)
 	{
 		/*
@@ -440,10 +504,46 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	 */
 	needLock = !RELATION_IS_LOCAL(relation);
 
+	/*
+	 * If we need the lock but are not able to acquire it immediately, we'll
+	 * consider extending the relation by multiple blocks at a time to manage
+	 * contention on the relation extension lock.  However, this only makes
+	 * sense if we're using the FSM; otherwise, there's no point.
+	 */
 	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
+	{
+		if (!use_fsm)
+			LockRelationForExtension(relation, ExclusiveLock);
+		else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
+		{
+			/* Couldn't get the lock immediately; wait for it. */
+			LockRelationForExtension(relation, ExclusiveLock);
+
+			/*
+			 * Check if some other backend has 

Re: [HACKERS] [COMMITTERS] pgsql: Allow to_timestamp(float8) to convert float infinity to timestam

2016-03-29 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Mar 30, 2016 at 10:13 AM, Tom Lane  wrote:
>> I'm inclined to just drop the out-of-range test cases.  They're not that
>> useful IMO, and alternate expected-files are a real PITA for maintenance.

> Hm. Actually, they are quite useful to check error boundaries, so why
> not just simplifying the error message to "timestamp out of range" and
> remove the value from it?

Meh.  I realize that there are a lot of places where we just say
"timestamp out of range" rather than trying to give a specific value,
but it's really contrary to our message style guidelines to not print
the complained-of value.  I think we should leave the ereport calls as-is
and remove the test cases; to do otherwise is putting the regression tests
ahead of users.  The upper-boundary test is quite dubiously useful anyway,
because it has to test a value that's very far away from where the
boundary actually is in most builds.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Dilip Kumar
On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund  wrote:

> My gut feeling is that we should do both 1) and 2).
>
> Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> Cross-compiling suggest that doing so "just works".  I.e. replace the
> #if defined(__ppc__) typedef from an int to a char.
>

I made that change, but after that it hangs; even initdb hangs.

int
s_lock(volatile slock_t *lock, const char *file, int line)
{
    SpinDelayStatus delayStatus;

    init_spin_delay(&delayStatus, (Pointer) lock, file, line);

    while (TAS_SPIN(lock))
    {
        make_spin_delay(&delayStatus);   /* <-- stuck here */
    }

I did not try to find the reason; I just built in debug mode and found
that it never comes out of this loop.

I did clean builds multiple times, but the problem persists.

Does it depend on some other variable defined under PPC in some other
place? I don't know.

/* PowerPC */
#if defined(__ppc__) || defined(__powerpc__) || defined(__ppc64__) || defined(__powerpc64__)
#define HAS_TEST_AND_SET

typedef unsigned int slock_t;  /* --> changed like this */

#define TAS(lock) tas(lock)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 9:29 PM, Andrew Dunstan  wrote:
> I am currently travelling, but my intention is to deal with the remaining
> patches when I'm back home this weekend, unless someone beats me to it.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 5:47 PM, Thomas Munro
 wrote:
> On Wed, Mar 30, 2016 at 6:04 AM, Robert Haas  wrote:
>> On Tue, Mar 29, 2016 at 3:17 AM, Michael Paquier
>>  wrote:
>>> OK, so I am switching this patch as "Ready for committer", for 0001.
>>> It is in better shape now.
>>
>> Well...  I have a few questions yet.
>>
>> The new argument to SyncRepWaitForLSN is called "bool commit", but
>> RecordTransactionAbortPrepared passes true.  Either it should be
>> passing false, or the parameter is misnamed or at the least in need of
>> a better comment.
>>
>> I don't understand why this patch is touching the abort paths at all.
>> XactLogAbortRecord sets XACT_COMPLETION_SYNC_APPLY_FEEDBACK, and
>> xact_redo_abort honors it.  But surely it makes no sense to wait for
>> an abort to become visible.
>
> You're right, that was totally unnecessary.  Here is a version that
> removes that (ie XactLogAbortRecord doesn't request apply feedback
> from the standby, xact_redo_abort doesn't send apply feedback to the
> primary and RecordTransactionAbortPrepared now passes false to
> SyncRepWaitForLSN so it doesn't wait for apply feedback from the
> standby).  Also I fixed a silly bug in SyncRepWaitForLSN when capping
> the mode.  I have also renamed  XACT_COMPLETION_SYNC_APPLY_FEEDBACK to
> the more general XACT_COMPLETION_APPLY_FEEDBACK, because the later
> 0004 patch will use it for a more general purpose than
> synchronous_commit.

OK, I committed this, with a few tweaks.  In particular, I added a
flag variable instead of relying on "latch set" == "need to send
reply"; the other changes were cosmetic.

I'm not sure how much more of this we can realistically get into 9.6;
the latter patches haven't had much review yet.  But I'll set this
back to Needs Review in the CommitFest and we'll see where we end up.
But even if we don't get anything more than this, it's still rather
nice: remote_apply turns out to be only slightly slower than remote
flush, and it's a guarantee that a lot of people are looking for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-29 Thread Andrew Dunstan



On 03/29/2016 08:48 PM, Michael Paquier wrote:

> On Wed, Mar 30, 2016 at 12:45 AM, Robert Haas wrote:
>> On Tue, Mar 29, 2016 at 11:31 AM, Petr Jelinek wrote:
>>> I have machine ready, waiting for animal name and secret.
>>
>> Great!
>
> Nice. Thanks.
>
>>> It will obviously
>>> fail until we push the 0002 and 0004 though.
>>
>> I think it would be a shame if we shipped 9.6 without MSVC 2015
>> support, but it'd be nice if Andrew or Magnus could do the actual
>> committing, 'cuz I really don't want to be responsible for breaking
>> the Windows build.
>
> I'm fine to back you up as needed. That's not a committer guarantee,
> still better than nothing I guess. And I make a specialty of looking
> at VS-related bugs lately :)



I am currently travelling, but my intention is to deal with the 
remaining patches when I'm back home this weekend, unless someone beats 
me to it.


cheers

andrew




[HACKERS] Re: [COMMITTERS] pgsql: Allow to_timestamp(float8) to convert float infinity to timestam

2016-03-29 Thread Michael Paquier
On Wed, Mar 30, 2016 at 10:13 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Wed, Mar 30, 2016 at 6:09 AM, Tom Lane  wrote:
>>> Allow to_timestamp(float8) to convert float infinity to timestamp infinity.
>
>> Some of the tests introduced are making MSVC unhappy, because they
>> depend on the three-digit behavior that Windows is using, leading to
>> those failures:
>
> Ah, I was wondering about that.  The patch as-submitted used "%lf" which
> seemed even less likely to be portable, but evidently %g isn't that much
> better.

Yep.

>> If the those tests are kept, an alternate output file is necessary (I
>> can send a patch if needed, I see the failure locally as well).
>
> I'm inclined to just drop the out-of-range test cases.  They're not that
> useful IMO, and alternate expected-files are a real PITA for maintenance.

Hm. Actually, they are quite useful to check error boundaries, so why
not just simplifying the error message to "timestamp out of range" and
remove the value from it?
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Allow to_timestamp(float8) to convert float infinity to timestam

2016-03-29 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Mar 30, 2016 at 6:09 AM, Tom Lane  wrote:
>> Allow to_timestamp(float8) to convert float infinity to timestamp infinity.

> Some of the tests introduced are making MSVC unhappy, because they
> depend on the three-digit behavior that Windows is using, leading to
> those failures:

Ah, I was wondering about that.  The patch as-submitted used "%lf" which
seemed even less likely to be portable, but evidently %g isn't that much
better.

> If the those tests are kept, an alternate output file is necessary (I
> can send a patch if needed, I see the failure locally as well).

I'm inclined to just drop the out-of-range test cases.  They're not that
useful IMO, and alternate expected-files are a real PITA for maintenance.

regards, tom lane




Re: [BUGS] Re: [HACKERS] BUG #13854: SSPI authentication failure: wrong realm name used

2016-03-29 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Anyway, as things stand, elog(ERROR) will abort the session safely but
>> you won't necessarily get the kind of logging you want, so expected
>> auth-failure cases should try to go the STATUS_ERROR route.

> In other words, the use of palloc() and friends (psprintf in the patch)
> should be acceptable here.

Sure, no problem with that.

regards, tom lane




Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Tomas Vondra

Hi,

On 03/29/2016 09:43 PM, Peter Geoghegan wrote:
> On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas wrote:
>> One test that kind of bothers me in particular is the "SELECT DISTINCT
>> a FROM numeric_test ORDER BY a" test on the high_cardinality_random
>> data set.  That's a wash at most work_mem values, but at 32MB it's
>> more than 3x slower.  That's very strange, and there are a number of
>> other results like that, where one particular work_mem value triggers
>> a large regression.  That's worrying.
>
> That case is totally invalid as a benchmark for this patch. Here is
> the query plan I get (doesn't matter if I run analyze) when I follow
> Tomas' high_cardinality_random 10M instructions (including setting
> work_mem to 32MB):
>
> postgres=# explain analyze select distinct a from numeric_test order by a;
>    QUERY PLAN
> ───
>   Sort  (cost=268895.39..270373.10 rows=591082 width=8) (actual
> time=3907.917..4086.174 rows=999879 loops=1)
> Sort Key: a
> Sort Method: external merge  Disk: 18536kB
> ->  HashAggregate  (cost=206320.50..212231.32 rows=591082 width=8)
> (actual time=3109.619..3387.599 rows=999879 loops=1)
>   Group Key: a
>   ->  Seq Scan on numeric_test  (cost=0.00..175844.40
> rows=12190440 width=8) (actual time=0.025..601.295 rows=1000
> loops=1)
>   Planning time: 0.088 ms
>   Execution time: 4120.656 ms
> (8 rows)
>
> Does that seem like a fair test of this patch?

And why not? I mean, why should it be acceptable to slow down?

> I must also point out an inexplicable differences between the i5 and
> Xeon in relation to this query. It took about took 10% less time on
> the patched Xeon 10M case, not ~200% more (line 53 of the summary page
> in each 10M case). So even if this case did exercise the patch well,
> it's far from clear that it has even been regressed at all. It's far
> easier to imagine that there was some problem with the i5 tests.

That may easily be due to differences between the CPUs and
configuration. For example, the Xeon uses a way older CPU with different
amounts of CPU cache, and it's also a multi-socket system. And so on.

> A complete do-over from Tomas would be best, here. He has already
> acknowledged that the i5 CREATE INDEX results were completely invalid.
> Pending a do-over from Tomas, I recommend ignoring the i5 tests
> completely. Also, I should once again point out that many of the
> work_mem cases actually had internal sorts at the high end, so once
> the code in the patches simply wasn't exercised at all at the high end
> (the 1024MB cases, where the numbers might be expected to get really
> good).
>
> If there is ever a regression, it is only really sensible to talk
> about it while looking at trace_sort output (and, I guess, the query
> plan). I've asked Tomas for trace_sort output in all relevant cases.
> There is no point in "flying blind" and speculating what the problem
> was from a distance.

The updated benchmarks are currently running. I'm out of the office
until Friday, and I'd like to process the results over the weekend.
FWIW I'll have results for these cases:

1) unpatched (a414d96a)
2) patched, default settings
3) patched, replacement_sort_mem=64

Also, I'll have trace_sort=on output for all the queries, so we can
investigate further.

>> Also, it's pretty clear that the patch has more large wins than it
>> does large losses, but it seems pretty easy to imagine people who
>> haven't tuned any GUCs writing in to say that 9.6 is way slower on
>> their workload, because those people are going to be at
>> work_mem=4MB, maintenance_work_mem=64MB. At those numbers, if
>> Tomas's data is representative, it's not hard to imagine that the
>> number of people who see a significant regression might be quite a
>> bit larger than the number who see a significant speedup.

Yeah. That was one of the goals of the benchmark: to come up with some
tuning recommendations. On some systems significantly increasing memory
GUCs may not be possible, though -- say, on very small systems with very
limited amounts of RAM.

> I don't think they are representative. Greg Stark characterized the
> regressions as being fairly limited, mostly at the very low end. And
> that was *before* all the memory fragmentation stuff made that
> better. I haven't done any analysis of how much better that made the
> problem *across the board* yet, but for int4 cases I could make 1MB
> work_mem queries faster with gigabytes of data on my laptop. I
> believe I tested various datum sort cases there, like "select
> count(distinct(foo)) from bar"; those are a very pure test of the
> patch.

Well, I'd guess those conclusions may be a bit subjective.

regards


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 12:43 PM, Peter Geoghegan  wrote:
> A complete do-over from Tomas would be best, here. He has already
> acknowledged that the i5 CREATE INDEX results were completely invalid.

The following analysis is all based on Xeon numbers, which as I've
said we should focus on pending a do-over from Tomas. Especially
important here is the largest set -- the 10M numbers from
results-xeon-10m.ods.

I think that abbreviation distorts things here. We also see distortion
from "padding" cases.

Rather a lot of "padding" is used, FWIW. From Tomas' script:

INSERT INTO numeric_test_padding SELECT a, repeat(md5(a::text),10)
FROM data_float ORDER BY a;

This makes the tests have TOAST overhead.

Some important observations on results-xeon-10m:

* There are almost no regressions for types that don't use
abbreviation. There might be one exception when there is both padding
and presorted input -- the 32MB
high_cardinality_almost_asc/high_cardinality_sorted/unique_sorted
"SELECT * FROM int_test_padding ORDER BY a", which takes 26% - 35%
longer (those are all basically the same cases). But it's a big win in
the high_cardinality_random, unique_random, and even unique_almost_asc
categories, or when DESC order was requested in all categories (I note
that there is certainly an emphasis on pre-sorted cases in the choice
of categories). Other than that, no regressions from non-abbreviated
types.

* No CREATE INDEX case is ever appreciably regressed, even with
maintenance_work_mem at 8MB, 1/4 of its default value of 64MB. (Maybe
we lose 1% - 3% with the other (results-xeon-1m.ods) cases, where
maintenance_work_mem is close to or actually high enough to get an
internal sort). It's a bit odd that "CREATE INDEX x ON
text_test_padding (a)" is about a wash for
high_cardinality_almost_asc, but I think that's just because we're
super I/O bound for this presorted case, but cannot make up for it
with quicksort's "bubble sort best case" precheck for presortedness,
so replacement selection does better in a way that might even result
in a clean draw. CREATE INDEX looks very good in general. I think
abbreviation might abort in one or two cases for text, but the picture
for the patch is still solid.

* "Padding" can really distort low-end cases, that become more about
moving big tuples around than actual sorting. If you really want to
see how high_cardinality_almost_asc queries like "SELECT * FROM
text_test_padding ORDER BY a" are testing the wrong thing, consider
the best and worst case for the master branch with any amount of
work_mem. The 10 million tuple high_cardinality_almost_asc case takes
40.16 seconds, 39.95 seconds, 40.98 seconds, 41.28 seconds, and
42.1 seconds for respective work_mems of 8MB, 32MB, 128MB, 512MB, and
1024MB. This is a very narrow case because it totally deemphasizes
comparison cost and emphasizes moving tuples around, involves
abbreviation of text with a merge phase that cannot use abbreviation
that only the patch has due to RS best-case on master. The case is
seriously short changed by the memory batching refund thing in
practice. When is *high cardinality text* (not dates or something)
ever likely to be found in pre-sorted order for 10 million tuples in
the real world? Besides, we just stopped trusting strxfrm(), so the
case would probably be a wash now at worst.

* The more plausible padding + presorted + abbreviation case that is
sometimes regressed is "SELECT * FROM numeric_test_padding ORDER BY
a". But that's regressed a lot less than the aforementioned "SELECT *
FROM text_test_padding ORDER BY a" case, and only at the low end. It
is sometimes faster where the original case I mentioned is slower.

* Client overhead may distort things in the case of queries like
"SELECT * FROM foo ORDER BY bar". This could be worse for the patch,
which does relatively more computation during the final on-the-fly
merge phase (which is great when you can overlap that with I/O;
perhaps not when you get more icache misses with other computation).
Aside from just adding a lot of noise, this could unfairly make the
patch look a lot worse than master.

Now, I'm not saying all of this doesn't matter. But these are all
fairly narrow, pathological cases, often more about moving big tuples
around (in memory and on disk) than about sorting. These regressions
are well worth it. I don't think I can do any more than I already have
to fix these cases; it may be impossible. It's a very difficult thing
to come up with an algorithm that's unambiguously better in every
possible case. I bent over backwards to fix low-end regressions
already.

In memory rich environments with lots of I/O bandwidth, I've seen this
patch make CREATE INDEX ~2.5x faster for int4, on a logged table. More
importantly, the patch makes setting maintenance_work_mem easy. Users'
intuition for how sizing it ought to work now becomes more or less
correct: In general, for each individual utility command bound by
maintenance_work_mem, more 

Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-29 Thread Michael Paquier
On Wed, Mar 30, 2016 at 12:45 AM, Robert Haas  wrote:
> On Tue, Mar 29, 2016 at 11:31 AM, Petr Jelinek  wrote:
>> I have machine ready, waiting for animal name and secret.
>
> Great!

Nice. Thanks.

>> It will obviously
>> fail until we push the 0002 and 0004 though.
>
> I think it would be a shame if we shipped 9.6 without MSVC 2015
> support, but it'd be nice if Andrew or Magnus could do the actual
> committing, 'cuz I really don't want to be responsible for breaking
> the Windows build.

I'm fine to back you up as needed. That's not a committer guarantee,
still better than nothing I guess. And I make a specialty of looking
at VS-related bugs lately :)
-- 
Michael




Re: [HACKERS] extend pgbench expressions with functions

2016-03-29 Thread Michael Paquier
On Wed, Mar 30, 2016 at 1:23 AM, Fabien COELHO  wrote:
>
> Hello Robert,
>
>>> If we don't nuke it, it'll never die.
>>
>>
>> Hearing no objections, BOOM.
>
>
> FIZZ! :-)
>
> Thanks for the commits, and apology for the portability bugs.

Thanks for the additions, Fabien. This has been a long trip.
-- 
Michael




[HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-29 Thread Josh berkus
Hackers,

For Thursday's release, I've added a wiki page to give users more
information about the strxfrm() issue, especially since we're going to
ask them to do a bunch of REINDEXing.

Please help me improve this.  Particularly, I need help on the following:

* is my explanation of the issue correct?
* how does this affect non-Linux platforms?  Windows?
* can someone write a search query for columns with indexes in non-C
collation?

Thanks!

http://wiki.postgresql.org/wiki/Abbreviatedkeys_issue

(if you have editing rights, please just edit the wiki page instead of
commenting here)

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-03-29 Thread David Steele
On 3/29/16 8:21 PM, Michael Paquier wrote:
> On Wed, Mar 30, 2016 at 12:01 AM, David Steele  wrote:
>> Hi Amit,
>>
>> On 3/22/16 3:37 AM, Michael Paquier wrote:
>>
>>> Hope this brings some light in.
>>
>>
>> Do you know when you'll have time to respond to Michael's last email? I've
>> marked this patch "waiting on author" in the meantime.
> 
> Shouldn't it be "needs review" instead? I am marked as the author of this 
> patch.

Whoops!  I got the author and reviewer backwards.  Set back to "needs
review".

-- 
-David
da...@pgmasters.net




Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-03-29 Thread Michael Paquier
On Wed, Mar 30, 2016 at 12:01 AM, David Steele  wrote:
> Hi Amit,
>
> On 3/22/16 3:37 AM, Michael Paquier wrote:
>
>> Hope this brings some light in.
>
>
> Do you know when you'll have time to respond to Michael's last email? I've
> marked this patch "waiting on author" in the meantime.

Shouldn't it be "needs review" instead? I am marked as the author of this patch.
-- 
Michael




[HACKERS] Re: [COMMITTERS] pgsql: Allow to_timestamp(float8) to convert float infinity to timestam

2016-03-29 Thread Michael Paquier
On Wed, Mar 30, 2016 at 6:09 AM, Tom Lane  wrote:
> Allow to_timestamp(float8) to convert float infinity to timestamp infinity.
>
> With the original SQL-function implementation, such cases failed because
> we don't support infinite intervals.  Converting the function to C lets
> us bypass the interval representation, which should be a bit faster as
> well as more flexible.

+-- The upper boundary differs between integer and float timestamps,
so check the biggest one
+SELECT to_timestamp(185331707078400::float8);  -- error, out of range
+ERROR:  timestamp out of range: "1.85332e+14"
Some of the tests introduced are making MSVC unhappy, because they
depend on the three-digit behavior that Windows is using, leading to
those failures:

  -- The upper boundary differs between integer and float timestamps,
so check the biggest one
  SELECT to_timestamp(185331707078400::float8);  -- error, out of range
! ERROR:  timestamp out of range: "1.85332e+14"
  -- nonfinite values
  SELECT to_timestamp(' Infinity'::float);
   to_timestamp
--- 2338,2344 

  -- The upper boundary differs between integer and float timestamps,
so check the biggest one
  SELECT to_timestamp(185331707078400::float8);  -- error, out of range
! ERROR:  timestamp out of range: "1.85332e+014"

If those tests are kept, an alternate output file is necessary (I
can send a patch if needed, I see the failure locally as well).
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: Sync tzload() and tzparse() APIs with IANA release tzcode2016c.

2016-03-29 Thread Michael Paquier
On Tue, Mar 29, 2016 at 11:20 PM, Tom Lane  wrote:
> Christian Ullrich  writes:
>> * Tom Lane wrote:
>>> But then, should not this code make sure that errno *always* gets set?
>
>> A library function that does not fail does not touch errno.
>
> Right, I meant "always in the error path".
>
>>> I'd be inclined to think we should use _dosmaperr(), too, rather than
>>> hand-coding it.
>
>> Yes, of course. If only I had known about it ...
>> New patch attached.
>
> This looks good, will push shortly.

Thanks for digging more than I did regarding this issue, things are
fixed on my side, and the buildfarm looks happy. Seeing EEXIST being
returned for a non-existing entry when calling link() really bugged
me, and the errno my patch looked at was set from another function
call and not link() itself...
-- 
Michael




Re: [HACKERS] standby_schedule

2016-03-29 Thread Michael Paquier
On Wed, Mar 30, 2016 at 6:44 AM, Alvaro Herrera
 wrote:
> I think we're at a point where we can translate the tests in
> src/test/regress/standby_schedule file into a PostgresNode-based test,
> or remove it (probably under src/test/recovery).  That way, it would get
> run all the time rather than just when somebody feels like it (which is
> probably almost never, if at all).

Having a test in src/test/recovery feels the most natural approach to me.

> Would somebody like to volunteer?

After the CF is closed and feature freeze happens, I guess I could
always find a couple of hours for cleanup patches. FWIW, I have run
this test suite manually from time to time, but running it always in
the buildfarm would be way better (buildfarm client code is not
patched yet to run tests in src/test/recovery btw).
-- 
Michael




[HACKERS] kqueue

2016-03-29 Thread Thomas Munro
Hi,

On the WaitEventSet thread I posted a small patch to add kqueue
support[1].  Since then I peeked at how some other software[2]
interacts with kqueue and discovered that there are platforms
including NetBSD where kevent.udata is an intptr_t instead of a void
*.  Here's a version which should compile there.  Would any NetBSD
user be interested in testing this?  (An alternative would be to make
configure test for this with some kind of AC_COMPILE_IFELSE
incantation, but the steamroller cast is simpler.)

[1] 
http://www.postgresql.org/message-id/CAEepm=1dZ_mC+V3YtB79zf27280nign8MKOLxy2FKhvc1RzN=g...@mail.gmail.com
[2] 
https://github.com/libevent/libevent/commit/5602e451ce872d7d60c640590113c5a81c3fc389

-- 
Thomas Munro
http://www.enterprisedb.com


kqueue-v2.patch
Description: Binary data



Re: [HACKERS] [PATCH v8] GSSAPI encryption support

2016-03-29 Thread David Steele
On 3/29/16 5:05 PM, Robbie Harwood wrote:
> David Steele  writes:
> 
>> On 3/20/16 12:09 AM, Robbie Harwood wrote:
>>
>>> A new version of my GSSAPI encryption patchset is available
>>
>> Here's a more thorough review:
> 
> Thanks for the review!  To keep this a manageable size, I'm going to
> trim pretty heavily.  If I haven't replied to something, please
> interpret it to mean that I think it's a good suggestion and will
> fix/change in v9.
> 
>> +++ b/doc/src/sgml/runtime.sgml
>>
>> I think you mean postgresql.conf?
> 
> Sorry, what does this review comment go with?

Oops! That was a comment I made and then realized I had misunderstood
what was going on.  I removed the code but apparently not the comment.

Thanks for the other clarifications.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] [PATCH] Remove TZ entry from postgres CLI doc page.

2016-03-29 Thread Tom Lane
Matthew Somerville  writes:
> Please find attached a patch to the postgres command line
> documentation page doc/src/sgml/ref/postgres-ref.sgml that removes the
> "TZ" entry from the "Environment" section. If I've understood it
> correctly, since ca4af308 TZ can be looked at when you run initdb, but
> is not looked at when the server is started.

Yeah, this corner of the documentation evidently got missed in that
commit.  Thanks for noticing, will fix!

regards, tom lane




Re: [BUGS] Re: [HACKERS] BUG #13854: SSPI authentication failure: wrong realm name used

2016-03-29 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > So, it seems that ClientAuthentication() expects to raise ereport(FATAL)
> > in case of authentication failures.  But what's the code path that
> > causes that to happen if a ereport(ERROR) happens in there?  Because all
> > that code is pretty careful to not do ereport(ERROR) directly and
> > instead return STATUS_ERROR which makes ClientAuthentication do the
> > FATAL report.  If this doesn't matter, then isn't this whole code overly
> > complicated for no reason?
> 
> The reason why elog(ERROR) will become a FATAL is that no outer setjmp
> has been executed yet, so elog.c will realize it has noplace to longjmp
> to.

Ah, I was looking at callers up-stack and found nothing.  That should have
cued me that that was happening :-)

> Anyway, as things stand, elog(ERROR) will abort the session safely but
> you won't necessarily get the kind of logging you want, so expected
> auth-failure cases should try to go the STATUS_ERROR route.

In other words, the use of palloc() and friends (psprintf in the patch)
should be acceptable here.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [BUGS] Re: [HACKERS] BUG #13854: SSPI authentication failure: wrong realm name used

2016-03-29 Thread Tom Lane
Alvaro Herrera  writes:
> So, it seems that ClientAuthentication() expects to raise ereport(FATAL)
> in case of authentication failures.  But what's the code path that
> causes that to happen if an ereport(ERROR) happens in there?  Because all
> that code is pretty careful to not do ereport(ERROR) directly and
> instead return STATUS_ERROR which makes ClientAuthentication do the
> FATAL report.  If this doesn't matter, then isn't this whole code overly
> complicated for no reason?

The reason why elog(ERROR) will become a FATAL is that no outer setjmp
has been executed yet, so elog.c will realize it has noplace to longjmp
to.

Whether it's overcomplicated I dunno.  I think the idea behind returning
STATUS_ERROR is to allow a centralized reporting site to decorate the
errors with additional info, as indeed auth_fail does.  Certainly that
could be done another way (errcontext?), but that's the way we've got.

Anyway, as things stand, elog(ERROR) will abort the session safely but
you won't necessarily get the kind of logging you want, so expected
auth-failure cases should try to go the STATUS_ERROR route.

regards, tom lane




Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-29 Thread Thomas Munro
On Wed, Mar 30, 2016 at 6:04 AM, Robert Haas  wrote:
> On Tue, Mar 29, 2016 at 3:17 AM, Michael Paquier
>  wrote:
>> OK, so I am switching this patch as "Ready for committer", for 0001.
>> It is in better shape now.
>
> Well...  I have a few questions yet.
>
> The new argument to SyncRepWaitForLSN is called "bool commit", but
> RecordTransactionAbortPrepared passes true.  Either it should be
> passing false, or the parameter is misnamed or at the least in need of
> a better comment.
>
> I don't understand why this patch is touching the abort paths at all.
> XactLogAbortRecord sets XACT_COMPLETION_SYNC_APPLY_FEEDBACK, and
> xact_redo_abort honors it.  But surely it makes no sense to wait for
> an abort to become visible.

You're right, that was totally unnecessary.  Here is a version that
removes that (ie XactLogAbortRecord doesn't request apply feedback
from the standby, xact_redo_abort doesn't send apply feedback to the
primary and RecordTransactionAbortPrepared now passes false to
SyncRepWaitForLSN so it doesn't wait for apply feedback from the
standby).  Also I fixed a silly bug in SyncRepWaitForLSN when capping
the mode.  I have also renamed  XACT_COMPLETION_SYNC_APPLY_FEEDBACK to
the more general XACT_COMPLETION_APPLY_FEEDBACK, because the later
0004 patch will use it for a more general purpose than
synchronous_commit.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-remote-apply-v10.patch
Description: Binary data



[HACKERS] standby_schedule

2016-03-29 Thread Alvaro Herrera
I think we're at a point where we can translate the tests in
src/test/regress/standby_schedule file into a PostgresNode-based test,
or remove it (probably under src/test/recovery).  That way, it would get
run all the time rather than just when somebody feels like it (which is
probably almost never, if at all).

Would somebody like to volunteer?



-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [BUGS] Re: [HACKERS] BUG #13854: SSPI authentication failure: wrong realm name used

2016-03-29 Thread Alvaro Herrera
So, it seems that ClientAuthentication() expects to raise ereport(FATAL)
in case of authentication failures.  But what's the code path that
causes that to happen if an ereport(ERROR) happens in there?  Because all
that code is pretty careful to not do ereport(ERROR) directly and
instead return STATUS_ERROR which makes ClientAuthentication do the
FATAL report.  If this doesn't matter, then isn't this whole code overly
complicated for no reason?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] BUG #13854: SSPI authentication failure: wrong realm name used

2016-03-29 Thread Christian Ullrich

* Magnus Hagander wrote:


On Tue, Mar 29, 2016 at 5:09 PM, David Steele  wrote:



It seems like this patch should be set "ready for committer".  Can one of
the reviewers do that if appropriate?


I'll pick it up to do that as well as committing it.


Ah, good news!

I hope it's not coming too late, but I have a final update removing a 
rather pointless palloc() return value check. No changes otherwise.


--
Christian



diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
new file mode 100644
index 3b2935c..f7d7b5a
*** a/doc/src/sgml/client-auth.sgml
--- b/doc/src/sgml/client-auth.sgml
*** omicron bryanh
*** 1097,1102 
--- 1097,1140 
   
  
   
+   compat_realm
+   
+
+ If set to 1, the domain's SAM-compatible name (also known as the
+ NetBIOS name) is used for the include_realm
+ option. This is the default. If set to 0, the true realm name from
+ the Kerberos user principal name is used. Leave this option
+ disabled to maintain compatibility with existing 
+ pg_ident.conf files.
+
+
+ Do not enable this option unless your server runs under a domain
+ account (this includes virtual service accounts on a domain member
+ system) and all clients authenticating through SSPI are also using
+ domain accounts, or authentication will fail.
+
+   
+  
+ 
+  
+   upn_username
+   
+
+ If this option is enabled along with compat_realm,
+ the user name from the Kerberos UPN is used for authentication. If
+ it is disabled (the default), the SAM-compatible user name is used.
+ By default, these two names are identical for new user accounts.
+
+
+ Note that libpq uses the SAM-compatible name if no
+ explicit user name is specified. If you use
+ libpq (e.g. through the ODBC driver), you should
+ leave this option disabled.
+
+   
+  
+ 
+  
map

 
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
new file mode 100644
index 899da71..cedebf1
*** a/src/backend/libpq/auth.c
--- b/src/backend/libpq/auth.c
*** typedef SECURITY_STATUS
*** 155,160 
--- 155,165 
(WINAPI * QUERY_SECURITY_CONTEXT_TOKEN_FN) (

   PCtxtHandle, void **);
  static int pg_SSPI_recvauth(Port *port);
+ static int pg_SSPI_make_upn(char *accountname,
+size_t accountnamesize,
+char *domainname,
+size_t domainnamesize,
+bool update_accountname);
  #endif
  
  /*
*** pg_SSPI_recvauth(Port *port)
*** 1265,1270 
--- 1270,1284 
  
free(tokenuser);
  
+   if (!port->hba->compat_realm)
+   {
+   int status = pg_SSPI_make_upn(accountname, sizeof(accountname),
+ 
domainname, sizeof(domainname),
+ 
port->hba->upn_username);
+   if (status != STATUS_OK)
+   return status;
+   }
+ 
/*
 * Compare realm/domain if requested. In SSPI, always compare case
 * insensitive.
*** pg_SSPI_recvauth(Port *port)
*** 1300,1305 
--- 1314,1412 
else
return check_usermap(port->hba->usermap, port->user_name, 
accountname, true);
  }
+ 
+ /*
+  * Replaces the domainname with the Kerberos realm name,
+  * and optionally the accountname with the Kerberos user name.
+  */
+ static int pg_SSPI_make_upn(char *accountname,
+size_t accountnamesize,
+char *domainname,
+size_t domainnamesize,
+bool update_accountname)
+ {
+   char *samname;
+   char *upname = NULL;
+   char *p = NULL;
+   ULONG upnamesize = 0;
+   size_t upnamerealmsize;
+   BOOLEAN res;
+ 
+   /*
+* Build SAM name (DOMAIN\\user), then translate to UPN
+* (user@kerberos.realm). The realm name is returned in
+* lower case, but that is fine because in SSPI auth,
+* string comparisons are always case-insensitive.
+*/
+ 
+   samname = psprintf("%s\\%s", domainname, accountname);
+   res = TranslateName(samname, NameSamCompatible, NameUserPrincipal,
+ 

Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-29 Thread Tom Lane
Anastasia Lubennikova  writes:
> 17.03.2016 06:27, Vitaly Burovoy:
>> Please find attached a new version of the patch.

> I think, I should write something as a reviewer.
> I read the patch again and I don't see any issues with it.
> It applies to the master and works as expected. So I'll mark it as 
> "Ready for committer"

Pushed with minor adjustments.

regards, tom lane




[HACKERS] [PATCH] Remove TZ entry from postgres CLI doc page.

2016-03-29 Thread Matthew Somerville
Hi,

Please find attached a patch to the postgres command line
documentation page doc/src/sgml/ref/postgres-ref.sgml that removes the
"TZ" entry from the "Environment" section. If I've understood it
correctly, since ca4af308 TZ can be looked at when you run initdb, but
is not looked at when the server is started.

I am using Test::PostgreSQL to create a test database; it empties the
postgresql.conf created by initdb and is then therefore not using the
correct timezone, and it took me a while to work out what was
happening and why I couldn't use TZ when starting the database.

ATB,
Matthew


0001-Remove-TZ-entry-from-postgres-CLI-doc-page.patch
Description: Binary data



Re: [HACKERS] incorrect docs for pgbench / skipped transactions

2016-03-29 Thread Fabien COELHO




Please split the patch into one part for backporting and one part for
master-only and post both patches, clearly indicating which is which.


Attached are the full patch for head and the backport part (the patch name
ends with "backport") separated.


That's not really what I wanted; the full patch includes the stuff for
back-porting, whereas I wanted two independent patches.


Indeed, I misunderstood the requirement.


I wordsmithed your version of Tomas's patch and pushed that to 9.5 and
master, and I pushed the first hunk of your full patch to master also.
I think the last hunk of that is overkill, so I did not push that.


Ok.

--
Fabien.




Re: [HACKERS] [PATCH v8] GSSAPI encryption support

2016-03-29 Thread Robbie Harwood
David Steele  writes:

> On 3/20/16 12:09 AM, Robbie Harwood wrote:
>
>> A new version of my GSSAPI encryption patchset is available
>
> Here's a more thorough review:

Thanks for the review!  To keep this a manageable size, I'm going to
trim pretty heavily.  If I haven't replied to something, please
interpret it to mean that I think it's a good suggestion and will
fix/change in v9.

> +++ b/doc/src/sgml/runtime.sgml
>
> I think you mean postgresql.conf?

Sorry, what does this review comment go with?

> +++ b/src/backend/libpq/auth.c
>
> +  * In most cases, we do not need to send AUTH_REQ_OK until we are ready
> +  * for queries, but if we are doing GSSAPI encryption that request must 
> go
> +  * out now.
>
> Why?

Because otherwise we will send the connection spew (connection
parameters and such) unencrypted, since it will get grouped with the
AUTH_REQ_OK packet.  I'll note this in the comment.

> + ret = be_gssapi_should_crypto(port);
>
> Add LF here.
>
>
> + port->gss->writebuf.cursor += ret;
>
> And here.
>
> + /* need to be called again */
>
> And here.  Well, you get the idea.

Sorry, what is LF?  ASCII newline?

> * PATCH 3 - GSSAPI authentication cleanup
>
> +++ b/src/backend/libpq/auth.c
>
> +  * GSS_C_REPLAY_FLAG and GSS_C_SEQUENCE_FLAG are missing for 
> compatibility
> +  * with older clients and should be added in as soon as possible.
>
> Please elaborate here.  Why should they be added and what functionality 
> is missing before they are.

It's request reply detection and out-of-sequence detection.  We can't
require them because old clients don't request them, and so would be
unable to connect.  (There's some history of this in the last couple
versions I've posted, but it's not really interesting beyond "it doesn't
work".)  I will clarify this comment.

> +++ b/src/backend/libpq/be-gssapi-common.c
>
> -#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
> -/*
> - * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
> - * that contain the OIDs required. Redefine here, values copied
> - * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
> - */
> -static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
> -{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
-static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME = &GSS_C_NT_USER_NAME_desc;
> -#endif
>
> Can you explain why it's OK to remove this now?  I can see you've 
> replaced it in gss_init_sec_context() with GSS_C_MUTUAL_FLAG.  Have you 
> tested this on Win32?

This comment is old enough that it references sources from Athena.  If
this is in fact a current krb5 bug (which I doubt, since MIT tests
windows rather heavily), then it should be filed against krb5 instead of
kludged around here.  I do not however have win32 machines to test this
with.  (GSS_C_MUTUAL_FLAG is unrelated; it just requests that the client
and server are both authenticated to each other.)

Thanks,
--Robbie




Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Petr Jelinek

On 29/03/16 22:08, Fabrízio de Royes Mello wrote:



On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek wrote:
 >
 > On 29/03/16 19:46, Fabrízio de Royes Mello wrote:
 >>
 >>
 >>  >
 >>  > Hmm I am unable to reproduce this. What OS? Any special configure
 >> flags you use?
 >>  >
 >>
 >> In my environment the error remains with your last patches.
 >>
 >> I didn't use any special.
 >>
 >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
 >> --enable-coverage --enable-tap-tests --enable-depend
 >> make -s -j8 install
 >> make check-world
 >>
 >>
 >> My environment:
 >>
 >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
 >> $ uname -a
 >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
 >> 2016 x86_64 x86_64 x86_64 GNU/Linux
 >>
 >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
 >> $ gcc --version
 >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
 >> Copyright (C) 2013 Free Software Foundation, Inc.
 >> This is free software; see the source for copying conditions.  There
is NO
 >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
 >>
 >
 > Hmm nothing special indeed, still can't reproduce, I did one blind
try for a fix. Can you test with attached?
 >

Same error... Now I'm leaving on a trip, but when I arrive I'll try to
figure out what's happening by debugging the code.



Okay, thanks.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Alter or rename enum value

2016-03-29 Thread Andrew Dunstan



On 03/27/2016 10:20 AM, Tom Lane wrote:

Andrew Dunstan  writes:

The more I think about this the more I bump up against the fact that
almost anything we do might want to do to ameliorate the situation is
going to be rolled back. The only approach I can think of that doesn't
suffer from this is to abort if an insert/update will affect an index on
a modified enum. i.e. we prevent the possible corruption from happening
in the first place, as we do now, but in a much more fine grained way.

Perhaps, instead of forbidding ALTER ENUM ADD in a transaction, we could
allow that, but not allow the new value to be *used* until it's committed?
This could be checked cheaply during enum value lookup (ie, is xmin of the
pg_enum row committed).

What you really need is to prevent the new value from being inserted
into any indexes, but checking that directly seems far more difficult,
ugly, and expensive than the above.

I do not know whether this would be a meaningful improvement for
common use-cases, though.  (It'd help if people were more specific
about the use-cases they need to work.)





I think this is a pretty promising approach, certainly well worth 
putting some resources into investigating. One thing I like about it is 
that it gives a nice cheap negative test, so we know if the xmin is 
committed we are safe. So we could start by rejecting anything where 
it's not, but later might adopt a more refined but expensive tests for 
cases where it isn't committed without imposing a penalty on anything else.


cheers

andrew





Re: [HACKERS] pgbench stats per script & other stuff

2016-03-29 Thread Fabien COELHO



Indeed. The documentation is manually edited when submitting changes so as
to minimize diffs, but then it does not correspond anymore to any actual
output, so it is easy to do it wrong.


Well, you fixed the "latency stddev" line to the sample output too, but
in my trial run that line was not displayed, only the latency average.
What are the command line args that supposedly produced this output?
Maybe we should add it as a SGML comment, or even display it to the
doc's reader.


Good point.

The test above shows the stats if there was -P , -L & --rate, because 
under these conditions the necessary data was collected, so they can be 
computed. Thus the output in the documentation assumes that one of these 
was used. I nearly always use "-P 1".


Note that the documentation is not really precise, "will look similar to", 
so there is no commitment.


If you feel like removing the stddev line from the doc because it is not 
there with usual options, fine with me.


--
Fabien.




Re: [HACKERS] raw output from copy

2016-03-29 Thread Tom Lane
Andrew Dunstan  writes:
> Then I would suggest we try to invent something for psql which does help 
> with it. I just don't see this as an SQL problem.

There's certainly a lot to be said for that approach.  I'm still not
convinced that we can make COPY do this without creating compatibility
issues, regardless of the details; and it doesn't seem like a big
enough problem to be worth taking any risks of that sort.

regards, tom lane




Re: [HACKERS] Parallel Queries and PostGIS

2016-03-29 Thread Paul Ramsey
On Tue, Mar 29, 2016 at 1:14 PM, Robert Haas  wrote:
> On Tue, Mar 29, 2016 at 3:48 PM, Paul Ramsey  
> wrote:
>>> I have no idea why the cost adjustments that you need are different
>>> for the scan case and the aggregate case.  That does seem problematic,
>>> but I just don't know why it's happening.
>>
>> What might be a good way to debug it? Is there a piece of code I can
>> look at to try and figure out the contribution of COST in either case?
>
> Well, the cost calculations are mostly in costsize.c, but I dunno how
> much that helps.  Maybe it would help if you posted some EXPLAIN
> ANALYZE output for the different cases, with and without parallelism?
>
> One thing I noticed about this output (from your blog)...
>
> Finalize Aggregate
>  (cost=16536.53..16536.79 rows=1 width=8)
>  (actual time=2263.638..2263.639 rows=1 loops=1)
>->  Gather
>(cost=16461.22..16461.53 rows=3 width=32)
>(actual time=754.309..757.204 rows=4 loops=1)
>  Number of Workers: 3
>  ->  Partial Aggregate
>  (cost=15461.22..15461.23 rows=1 width=32)
>  (actual time=676.738..676.739 rows=1 loops=4)
>->  Parallel Seq Scan on pd
>(cost=0.00..13856.38 rows=64 width=2311)
>(actual time=3.009..27.321 rows=42 loops=4)
>  Filter: (fed_num = 47005)
>  Rows Removed by Filter: 17341
>  Planning time: 0.219 ms
>  Execution time: 2264.684 ms
>
> ...is that the finalize aggregate phase is estimated to be very cheap,
> but it's actually wicked expensive.  We get the results from the
> workers in only 750 ms, but it takes another second and a half to
> aggregate those 4 rows???

This is probably a vivid example of the bad behaviour of the naive
union approach. If we have worker states 1,2,3,4 and we go

combine(combine(combine(1,2),3),4)

Then we get kind of a worst case complexity situation where we three
times union an increasingly complex object on the left with a simpler
object on the right. Also, if the objects went into the transfer
functions in relatively non-spatially correlated order, the polygons
coming out of the transfer functions could be quite complex, and each
merge would only add complexity to the output until the final merge
which melts away all the remaining internal boundaries.

I'm surprised it's quite so awful at the end though, and less awful in
the worker stage... how do the workers end up getting rows to work on?
1,2,3,4,1,2,3,4,1,2,3,4? or 1,1,1,2,2,2,3,3,3,4,4,4? The former could
result in optimally inefficient unions, given a spatially correlated
input (surprisingly common in load-once GIS tables)

P.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Queries and PostGIS

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 3:48 PM, Paul Ramsey  wrote:
>> I have no idea why the cost adjustments that you need are different
>> for the scan case and the aggregate case.  That does seem problematic,
>> but I just don't know why it's happening.
>
> What might be a good way to debug it? Is there a piece of code I can
> look at to try and figure out the contribution of COST in either case?

Well, the cost calculations are mostly in costsize.c, but I dunno how
much that helps.  Maybe it would help if you posted some EXPLAIN
ANALYZE output for the different cases, with and without parallelism?

One thing I noticed about this output (from your blog)...

Finalize Aggregate
 (cost=16536.53..16536.79 rows=1 width=8)
 (actual time=2263.638..2263.639 rows=1 loops=1)
   ->  Gather
   (cost=16461.22..16461.53 rows=3 width=32)
   (actual time=754.309..757.204 rows=4 loops=1)
 Number of Workers: 3
 ->  Partial Aggregate
 (cost=15461.22..15461.23 rows=1 width=32)
 (actual time=676.738..676.739 rows=1 loops=4)
   ->  Parallel Seq Scan on pd
   (cost=0.00..13856.38 rows=64 width=2311)
   (actual time=3.009..27.321 rows=42 loops=4)
 Filter: (fed_num = 47005)
 Rows Removed by Filter: 17341
 Planning time: 0.219 ms
 Execution time: 2264.684 ms

...is that the finalize aggregate phase is estimated to be very cheap,
but it's actually wicked expensive.  We get the results from the
workers in only 750 ms, but it takes another second and a half to
aggregate those 4 rows???

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Fabrízio de Royes Mello
On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek  wrote:
>
> On 29/03/16 19:46, Fabrízio de Royes Mello wrote:
>>
>>
>>  >
>>  > Hmm I am unable to reproduce this. What OS? Any special configure
>> flags you use?
>>  >
>>
>> In my environment the error remains with your last patches.
>>
>> I didn't use any special.
>>
>> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> --enable-coverage --enable-tap-tests --enable-depend
>> make -s -j8 install
>> make check-world
>>
>>
>> My environment:
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ uname -a
>> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ gcc --version
>> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is
NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
>>
>
> Hmm nothing special indeed, still can't reproduce, I did one blind try
for a fix. Can you test with attached?
>

Same error... Now I'm leaving on a trip, but when I arrive I'll try to
figure out what's happening by debugging the code.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Petr Jelinek

On 29/03/16 19:46, Fabrízio de Royes Mello wrote:


 >
 > Hmm I am unable to reproduce this. What OS? Any special configure
flags you use?
 >

In my environment the error remains with your last patches.

I didn't use any special.

./configure --prefix=/home/fabrizio/pgsql --enable-cassert
--enable-coverage --enable-tap-tests --enable-depend
make -s -j8 install
make check-world


My environment:

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ uname -a
Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



Hmm nothing special indeed, still can't reproduce, I did one blind try 
for a fix. Can you test with attached?


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 7f42f2b420e2b931e1ca013f3fdeaccf302f3618 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 4 Mar 2016 15:03:44 +0100
Subject: [PATCH 2/2] gapless seq

---
 contrib/Makefile |   1 +
 contrib/gapless_seq/Makefile |  63 
 contrib/gapless_seq/expected/concurrency.out |  31 ++
 contrib/gapless_seq/expected/gapless_seq.out | 145 
 contrib/gapless_seq/gapless_seq--1.0.sql |  57 +++
 contrib/gapless_seq/gapless_seq.c| 530 +++
 contrib/gapless_seq/gapless_seq.control  |   6 +
 contrib/gapless_seq/specs/concurrency.spec   |  29 ++
 contrib/gapless_seq/sql/gapless_seq.sql  |  61 +++
 doc/src/sgml/contrib.sgml|   1 +
 doc/src/sgml/filelist.sgml   |   1 +
 doc/src/sgml/gapless-seq.sgml|  24 ++
 12 files changed, 949 insertions(+)
 create mode 100644 contrib/gapless_seq/Makefile
 create mode 100644 contrib/gapless_seq/expected/concurrency.out
 create mode 100644 contrib/gapless_seq/expected/gapless_seq.out
 create mode 100644 contrib/gapless_seq/gapless_seq--1.0.sql
 create mode 100644 contrib/gapless_seq/gapless_seq.c
 create mode 100644 contrib/gapless_seq/gapless_seq.control
 create mode 100644 contrib/gapless_seq/specs/concurrency.spec
 create mode 100644 contrib/gapless_seq/sql/gapless_seq.sql
 create mode 100644 doc/src/sgml/gapless-seq.sgml

diff --git a/contrib/Makefile b/contrib/Makefile
index d12dd63..d2a0620 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -19,6 +19,7 @@ SUBDIRS = \
 		earthdistance	\
 		file_fdw	\
 		fuzzystrmatch	\
+		gapless_seq	\
 		hstore		\
 		intagg		\
 		intarray	\
diff --git a/contrib/gapless_seq/Makefile b/contrib/gapless_seq/Makefile
new file mode 100644
index 000..9378b93
--- /dev/null
+++ b/contrib/gapless_seq/Makefile
@@ -0,0 +1,63 @@
+# contrib/gapless_seq/Makefile
+
+MODULE_big = gapless_seq
+OBJS = gapless_seq.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = gapless_seq
+DATA = gapless_seq--1.0.sql
+
+EXTRA_CLEAN = $(pg_regress_clean_files) ./regression_output ./isolation_output
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/gapless_seq
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check: regresscheck isolationcheck
+
+submake-regress:
+	$(MAKE) -C $(top_builddir)/src/test/regress all
+
+submake-isolation:
+	$(MAKE) -C $(top_builddir)/src/test/isolation all
+
+submake-gapless_seq:
+	$(MAKE) -C $(top_builddir)/contrib/gapless_seq
+
+REGRESSCHECKS=gapless_seq
+
+regresscheck: all | submake-regress submake-gapless_seq temp-install
+	$(MKDIR_P) regression_output
+	$(pg_regress_check) \
+	--temp-instance=./tmp_check \
+	--outputdir=./regression_output \
+	$(REGRESSCHECKS)
+
+regresscheck-install-force: | submake-regress submake-gapless_seq temp-install
+	$(pg_regress_installcheck) \
+	$(REGRESSCHECKS)
+
+ISOLATIONCHECKS=concurrency
+
+isolationcheck: all | submake-isolation submake-gapless_seq temp-install
+	$(MKDIR_P) isolation_output
+	$(pg_isolation_regress_check) \
+	--outputdir=./isolation_output \
+	$(ISOLATIONCHECKS)
+
+isolationcheck-install-force: all | submake-isolation submake-gapless_seq temp-install
+	$(pg_isolation_regress_installcheck) \
+	$(ISOLATIONCHECKS)
+
+.PHONY: submake-gapless_seq submake-regress check \
+	regresscheck regresscheck-install-force \
+	isolationcheck isolationcheck-install-force
+
+temp-install: EXTRA_INSTALL=contrib/gapless_seq
diff --git a/contrib/gapless_seq/expected/concurrency.out b/contrib/gapless_seq/expected/concurrency.out
new file mode 100644
index 000..ec6a098
--- /dev/null
+++ b/contrib/gapless_seq/expected/concurrency.out
@@ -0,0 +1,31 @@
+Parsed 

Re: [HACKERS] raw output from copy

2016-03-29 Thread Andrew Dunstan



On 03/28/2016 11:18 PM, Pavel Stehule wrote:




Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author.  But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.




In that case I humbly submit that there is a case for reviving the
psql patch I posted that kicked off this whole thing and lets you
export a piece of binary data from psql quite easily. It should
certainly not involve any protocol break.


The psql-only solution can work only for output. It doesn't help with input.





Then I would suggest we try to invent something for psql which does help 
with it. I just don't see this as an SQL problem. Pretty much any driver 
library will have no difficulty in handling binary input and output. 
It's only psql that has an issue, ISTM, and therefore I believe that's 
where the fix should go. What else is going to use this? As an SQL 
change this seems like a solution in search of a problem. If someone can 
make a good case that this is going to be of general use I'll happily go 
along, but I haven't seen one so far.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Queries and PostGIS

2016-03-29 Thread Paul Ramsey
On Tue, Mar 29, 2016 at 12:48 PM, Paul Ramsey  wrote:

>> On the join case, I wonder if it's possible that _st_intersects is not
>> marked parallel-safe?  If that's not the problem, I don't have a
>> second guess, but the thing to do would be to figure out whether
>> consider_parallel is false for the RelOptInfo corresponding to either
>> of pd and pts, or whether it's true for both but false for the
>> joinrel's RelOptInfo, or whether it's true for all three of them but
>> you don't get the desired path anyway.
>
> _st_intersects is definitely marked parallel safe, and in fact will
> generate a parallel plan if used alone (without the operator though,
> it's impossibly slow). It's the && operator that is the issue... and I
> just noticed that the PROCEDURE bound to the && operator
> (geometry_overlaps) is *not* marked parallel safe: could be the
> problem?

Asked and answered: marking the geometry_overlaps as parallel safe
gets me a parallel plan! Now to play with costs and see how it behaves
when force_parallel_mode is not set.

P.

>
> Thanks,
>
> P




Re: [HACKERS] Parallel Queries and PostGIS

2016-03-29 Thread Paul Ramsey
> First, I beg to differ with this statement: "Some of the execution
> results output are wrong! "  The point is that
> line has loops=4, so as in any other case where loops>1, you're seeing
> the number of rows divided by the number of loops.  It is the
> *average* number of rows that were processed by each loop - one loop
> per worker, in this case.

Thanks for the explanation, let my reaction be a guide to what the
other unwashed will think :)

> Now, on to your actual question:
>
> I have no idea why the cost adjustments that you need are different
> for the scan case and the aggregate case.  That does seem problematic,
> but I just don't know why it's happening.

What might be a good way to debug it? Is there a piece of code I can
look at to try and figure out the contribution of COST in either case?

> On the join case, I wonder if it's possible that _st_intersects is not
> marked parallel-safe?  If that's not the problem, I don't have a
> second guess, but the thing to do would be to figure out whether
> consider_parallel is false for the RelOptInfo corresponding to either
> of pd and pts, or whether it's true for both but false for the
> joinrel's RelOptInfo, or whether it's true for all three of them but
> you don't get the desired path anyway.

_st_intersects is definitely marked parallel safe, and in fact will
generate a parallel plan if used alone (without the operator though,
it's impossibly slow). It's the && operator that is the issue... and I
just noticed that the PROCEDURE bound to the && operator
(geometry_overlaps) is *not* marked parallel safe: could be the
problem?

Thanks,

P




Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas  wrote:
> One test that kind of bothers me in particular is the "SELECT DISTINCT
> a FROM numeric_test ORDER BY a" test on the high_cardinality_random
> data set.  That's a wash at most work_mem values, but at 32MB it's
> more than 3x slower.  That's very strange, and there are a number of
> other results like that, where one particular work_mem value triggers
> a large regression.  That's worrying.

That case is totally invalid as a benchmark for this patch. Here is
the query plan I get (doesn't matter if I run analyze) when I follow
Tomas' high_cardinality_random 10M instructions (including setting
work_mem to 32MB):

postgres=# explain analyze select distinct a from numeric_test order by a;
                                                      QUERY PLAN
───
 Sort  (cost=268895.39..270373.10 rows=591082 width=8) (actual
time=3907.917..4086.174 rows=999879 loops=1)
   Sort Key: a
   Sort Method: external merge  Disk: 18536kB
   ->  HashAggregate  (cost=206320.50..212231.32 rows=591082 width=8)
(actual time=3109.619..3387.599 rows=999879 loops=1)
 Group Key: a
 ->  Seq Scan on numeric_test  (cost=0.00..175844.40
rows=12190440 width=8) (actual time=0.025..601.295 rows=1000
loops=1)
 Planning time: 0.088 ms
 Execution time: 4120.656 ms
(8 rows)

Does that seem like a fair test of this patch?

I must also point out an inexplicable difference between the i5 and
Xeon in relation to this query. It took about 10% less time on
the patched Xeon 10M case, not ~200% more (line 53 of the summary page
in each 10M case). So even if this case did exercise the patch well,
it's far from clear that it has even been regressed at all. It's far
easier to imagine that there was some problem with the i5 tests.

A complete do-over from Tomas would be best, here. He has already
acknowledged that the i5 CREATE INDEX results were completely invalid.
Pending a do-over from Tomas, I recommend ignoring the i5 tests
completely. Also, I should once again point out that many of the
work_mem cases actually had internal sorts at the high end, so
the code in the patches simply wasn't exercised at all at the high end
(the 1024MB cases, where the numbers might be expected to get really
good).

If there is ever a regression, it is only really sensible to talk
about it while looking at trace_sort output (and, I guess, the query
plan). I've asked Tomas for trace_sort output in all relevant cases.
There is no point in "flying blind" and speculating what the problem
was from a distance.

> Also, it's pretty clear that the patch has more large wins than it
> does large losses, but it seems pretty easy to imagine people who
> haven't tuned any GUCs writing in to say that 9.6 is way slower on
> their workload, because those people are going to be at work_mem=4MB,
> maintenance_work_mem=64MB.  At those numbers, if Tomas's data is
> representative, it's not hard to imagine that the number of people who
> see a significant regression might be quite a bit larger than the
> number who see a significant speedup.

I don't think they are representative. Greg Stark characterized the
regressions as being fairly limited, mostly at the very low end. And
that was *before* all the memory fragmentation stuff made that better.
I haven't done any analysis of how much better that made the problem
*across the board* yet, but for int4 cases I could make 1MB work_mem
queries faster with gigabytes of data on my laptop. I believe I tested
various datum sort cases there, like "select count(distinct(foo)) from
bar"; those are a very pure test of the patch.

-- 
Peter Geoghegan




Re: [HACKERS] Parallel Queries and PostGIS

2016-03-29 Thread Robert Haas
On Mon, Mar 28, 2016 at 12:18 PM, Paul Ramsey  wrote:
> I spent some time over the weekend trying out the different modes of
> parallel query (seq scan, aggregate, join) in combination with PostGIS
> and have written up the results here:
>
> http://blog.cleverelephant.ca/2016/03/parallel-postgis.html
>
> The TL:DR; is basically
>
> * With some adjustments to function COST both parallel sequence scan
> and parallel aggregation deliver very good parallel performance
> results.
> * The cost adjustments for sequence scan and aggregate scan are not
> consistent in magnitude.
> * Parallel join does not seem to work for PostGIS indexes yet, but
> perhaps there is some magic to learn from PostgreSQL core on that.
>
> The two findings at the end are ones that need input from parallel
> query masters...
>
> We recognize we'll have to adjust costs so that our particular use
> case (very CPU-intensive calculation per function) is planned better,
> but it seems like different query modes are interpreting costs in
> order-of-magnitude different ways in building plans.
>
> Parallel join would be a huge win, so some help/pointers on figuring
> out why it's not coming into play when our gist operators are in
> effect would be helpful.

First, I beg to differ with this statement: "Some of the execution
results output are wrong! They say that only 1844 rows were removed by
the filter, but in fact 7376 were (as we can confirm by running the
queries without the EXPLAIN ANALYZE). This is a known limitation,
reporting on the results of only one parallel worker, which (should)
maybe, hopefully be fixed before 9.6 comes out."  The point is that
line has loops=4, so as in any other case where loops>1, you're seeing
the number of rows divided by the number of loops.  It is the
*average* number of rows that were processed by each loop - one loop
per worker, in this case.

I am personally of the opinion that showing rowcounts divided by loops
instead of total rowcounts is rather stupid, and that we should change
it regardless.  But it's not parallel query's fault, and changing it
would affect the output of every EXPLAIN ANALYZE involving a nested
loop, probably confusing a lot of people until they figured out what
we'd changed, after which - I *think* they'd realize that they
actually liked the new way much better.
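The per-loop averaging described here means the true rowcount can be recovered by multiplying the reported figure by the loop count. A tiny sketch (the 1844/7376 figures come from the thread; the helper name is mine):

```python
def total_rows(reported_rows, loops):
    """EXPLAIN ANALYZE reports row counts divided by loops; undo that."""
    return reported_rows * loops

# The parallel scan discussed upthread: loops=4 (one per worker),
# reported rows removed by filter = 1844, so the real total is:
print(total_rows(1844, 4))  # 7376
```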

Now, on to your actual question:

I have no idea why the cost adjustments that you need are different
for the scan case and the aggregate case.  That does seem problematic,
but I just don't know why it's happening.

On the join case, I wonder if it's possible that _st_intersects is not
marked parallel-safe?  If that's not the problem, I don't have a
second guess, but the thing to do would be to figure out whether
consider_parallel is false for the RelOptInfo corresponding to either
of pd and pts, or whether it's true for both but false for the
joinrel's RelOptInfo, or whether it's true for all three of them but
you don't get the desired path anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] raw output from copy

2016-03-29 Thread Pavel Stehule
2016-03-29 20:59 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > I am writing a few lines as a summary:
>
> > 1. invention RAW_TEXT and RAW_BINARY
> > 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> > 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0,
> but
> > the client should check PQcopyFormat() to not print "\n" at the end
> > 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1,
> but
> > used output function, not necessary client modification
> > 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> > RAW_BINARY
> > 5. create tests for ecpg
>
> 3.b certainly seems completely wrong.  PQfformat==1 would imply binary
> data.
>
> I suggest that PQcopyFormat should be understood as defining the format
> of the copy data encapsulation, not the individual fields.  So it would go
> like 0 = traditional text format, 1 = traditional binary format, 2 = raw
> (no encapsulation).  You'd need to also look at PQfformat to distinguish
> raw text from raw binary.  But if we do it as you suggest above, we've
> locked ourselves into only ever having two field format codes, which
> is something the existing design is specifically intended to allow
> expansion in.
>

I have less courage than you :). The original design worked with almost all
clients without changes on the client side. The new design has lots of
combinations that are unknown to old clients. It can be better, though,
because client authors will do the update faster.

If PQfformat returns 0 = text, 1 = traditional binary, 2 = raw text, 3 =
raw binary, as you propose, then PQcopyFormat is useless; I can see all the
information just from PQfformat.

Regards

Pavel


>
> regards, tom lane
>


Re: [HACKERS] pgbench stats per script & other stuff

2016-03-29 Thread Alvaro Herrera
Fabien COELHO wrote:
> 
> Hello,
> 
> >>In doing this, I noticed that the latency output is wrong if you use -T
> >>instead of -t; it always says the latency is zero because "duration" is
> >>zero.  I suppose it should be like in the attached instead.
> 
> Indeed, I clearly overlooked option -t (transactions) which I never use.

Makes sense.

> >Patch actually attached here.
> 
> Tested. There is a small issue because the \n is missing.
> 
> Here is another version which just replaces duration by time_include,
> as they should be pretty close, and fixes the style so that it is the same
> whether the detailed stats are collected or not, as you pointed out.

Thanks, that makes sense.

> >>At the same time, it says "latency average: XYZ" instead of "latency
> >>average = XYZ" as in printSimpleStats, which doesn't look terribly
> >>important.  But the line appears in the SGML docs.
> 
> Indeed. The documentation is manually edited when submitting changes so as
> to minimize diffs, but then it does not correspond anymore to any actual
> output, so it is easy to do it wrong.

Well, you fixed the "latency stddev" line to the sample output too, but
in my trial run that line was not displayed, only the latency average.
What are the command line args that supposedly produced this output?
Maybe we should add it as a SGML comment, or even display it to the
doc's reader.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-29 Thread Fabien COELHO


Hello,


In doing this, I noticed that the latency output is wrong if you use -T
instead of -t; it always says the latency is zero because "duration" is
zero.  I suppose it should be like in the attached instead.


Indeed, I clearly overlooked option -t (transactions) which I never use.


Patch actually attached here.


Tested. There is a small issue because the \n is missing.

Here is another version which just replaces duration by time_include,
as they should be pretty close, and fixes the style so that it is the same 
whether the detailed stats are collected or not, as you pointed out.


At the same time, it says "latency average: XYZ" instead of "latency 
average = XYZ" as in printSimpleStats, which doesn't look terribly 
important.  But the line appears in the SGML docs.


Indeed. The documentation is manually edited when submitting changes so as 
to minimize diffs, but then it does not correspond anymore to any actual 
output, so it is easy to do it wrong.
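The two latency figures under discussion can be sketched as follows (a simplified model of printSimpleStats and the -T/-t fallback path; the variable names mirror pgbench's, but this is not the actual code):

```python
import math

def simple_stats_latency(sum_us, sum2_us, count):
    """Mean and stddev in ms from accumulated sum and sum-of-squares,
    as printSimpleStats computes them from microsecond samples."""
    latency = sum_us / count
    stddev = math.sqrt(sum2_us / count - latency * latency)
    return 0.001 * latency, 0.001 * stddev  # convert us -> ms

def fallback_latency(time_include_s, nclients, cnt):
    """Average latency in ms derived from total run time: use the measured
    time_include rather than 'duration', which is zero when -t is used."""
    return 1000.0 * time_include_s * nclients / cnt
```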


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 06cd5db..297af4f 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1253,8 +1253,8 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
-latency average = 15.844 ms
-latency stddev = 2.715 ms
+latency average: 15.844 ms
+latency stddev: 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
 script statistics:
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 52d1223..d5380d2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3099,8 +3099,8 @@ printSimpleStats(char *prefix, SimpleStats *ss)
 	double		latency = ss->sum / ss->count;
 	double		stddev = sqrt(ss->sum2 / ss->count - latency * latency);
 
-	printf("%s average = %.3f ms\n", prefix, 0.001 * latency);
-	printf("%s stddev = %.3f ms\n", prefix, 0.001 * stddev);
+	printf("%s average: %.3f ms\n", prefix, 0.001 * latency);
+	printf("%s stddev: %.3f ms\n", prefix, 0.001 * stddev);
 }
 
 /* print out results */
@@ -3155,7 +3155,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	else
 		/* only an average latency computed from the duration is available */
 		printf("latency average: %.3f ms\n",
-			   1000.0 * duration * nclients / total->cnt);
+			   1000.0 * time_include * nclients / total->cnt);
 
 	if (throttle_delay)
 	{



Re: [HACKERS] Parallel Queries and PostGIS

2016-03-29 Thread Paul Ramsey
On Mon, Mar 28, 2016 at 9:18 AM, Paul Ramsey  wrote:

> Parallel join would be a huge win, so some help/pointers on figuring
> out why it's not coming into play when our gist operators are in
> effect would be helpful.

Robert, do you have any pointers on what I should look for to figure
out why the parallel join code doesn't fire if I add a GIST operator
to my join condition?

Thanks,

P




Re: [HACKERS] raw output from copy

2016-03-29 Thread Tom Lane
Pavel Stehule  writes:
> I am writing a few lines as a summary:

> 1. invention RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should check PQcopyFormat() to not print "\n" at the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> used output function, not necessary client modification
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg

3.b certainly seems completely wrong.  PQfformat==1 would imply binary
data.

I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields.  So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation).  You'd need to also look at PQfformat to distinguish
raw text from raw binary.  But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.
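Under this scheme, a client would combine the two codes roughly like so (a sketch of the proposal only; PQcopyFormat does not exist in libpq, and the code values are the ones being debated in this thread):

```python
def describe_copy(copy_format, fformat):
    """Interpret the proposed codes: copy_format describes the data
    encapsulation (0 = text, 1 = binary, 2 = raw/none), while fformat
    distinguishes the individual field format within a raw copy."""
    if copy_format == 0:
        return "traditional text COPY"
    if copy_format == 1:
        return "traditional binary COPY"
    if copy_format == 2:
        return "raw binary" if fformat == 1 else "raw text"
    raise ValueError("unknown copy format")
```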

regards, tom lane




Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-29 Thread Kevin Grittner
I tried to whip this into shape, but there were a few areas I
didn't feel I had the necessary understanding to feel comfortable
taking on the committer role for it.  I've cleaned it up the best I
could, fixing whitespace and typos, eliminating an unnecessary
addition of an include, improving C comments (at least to my eye),
and adding an Assert where it seemed like a good idea to me.

My own tests and those of others show performance improvements (up
to 10x for some tests) and no significant performance degradation,
even with attempts to create "worst case" scenarios.

The only changes to the regression tests are to change an "out"
file to eliminate re-checking the index predicate against the heap
on every matching row, which seems like a good thing.

I'm taking my name off as committer and marking it "Ready for
Committer".  If someone else wants to comment on the issues where
Tom and Kyotaro-san still seem unsatisfied to the point where I
can get my head around it, I could maybe take it back on as
committer -- if anyone feels that could be a net win.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


partial-index-only-scan-v10.patch
Description: binary/octet-stream



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund
On 2016-03-29 14:09:42 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > There's actually lbarx/stbcx - but it's not present in all ISAs. So I
> > guess it's clear where to go.
> 
> Hm.  We could certainly add a configure test to see if the local assembler
> knows these instructions --- but it's not clear that we should depend on
> compile-time environment to match run-time.

I think it'd be easier to continue using lwarx/stwcx, but be careful
about only testing/setting the lowest byte, if we want to go there. But
that then likely would require hints about alignment to the compiler...

I've no experience writing PPC assembly, but it doesn't look too hard.


But I think it's easier to just remove the spinlock from struct lwlock
then - and it also improves the situation for other architectures with
wider spinlocks.  I think that's beneficial for some future things
anyway.

> Googling suggests that these instructions came in with PPC ISA 2.06
> which seems to date to 2010.  So there's undoubtedly still production
> hardware without them.
> 
> In the department of not-production hardware, I checked this on prairiedog
> and got
> /var/tmp/ccbQy9uG.s:1722:Invalid mnemonic 'lbarx'
> /var/tmp/ccbQy9uG.s:1726:Invalid mnemonic 'stbcx.'

Heh. Thanks for testing.


Andres




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Tom Lane
Andres Freund  writes:
> On 2016-03-29 13:24:40 -0400, Tom Lane wrote:
>> AFAICS, lwarx/stwcx are specifically *word* wide.

> There's actually lbarx/stbcx - but it's not present in all ISAs. So I
> guess it's clear where to go.

Hm.  We could certainly add a configure test to see if the local assembler
knows these instructions --- but it's not clear that we should depend on
compile-time environment to match run-time.

Googling suggests that these instructions came in with PPC ISA 2.06
which seems to date to 2010.  So there's undoubtedly still production
hardware without them.

In the department of not-production hardware, I checked this on prairiedog
and got
/var/tmp/ccbQy9uG.s:1722:Invalid mnemonic 'lbarx'
/var/tmp/ccbQy9uG.s:1726:Invalid mnemonic 'stbcx.'

regards, tom lane




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-29 Thread Magnus Hagander
On Tue, Mar 29, 2016 at 6:40 PM, Magnus Hagander 
wrote:

>
>
> On Tue, Mar 29, 2016 at 4:36 PM, David Steele  wrote:
>
>> On 3/22/16 12:31 PM, Magnus Hagander wrote:
>>
>> On Tue, Mar 22, 2016 at 5:27 PM, David Steele wrote:
>>>
>>> > Adding the stop time column should be a simple addition and I
>>> don't see
>>> > a problem with that. I think I misunderstood your original request
>>> on
>>> > that. Because you are just talking about returning a timestamptz
>>> with
>>> > the "right now" value for when you called pg_stop_backup()? Or to
>>> be
>>> > specific, just before pg_Stop_backup *finished*. Or do you mean
>>> when
>>> > pg_stop_backup() started?
>>>
>>> What would be ideal is the minimum time that could be used for
>>> PITR.  In
>>> an exclusive backup that's the time the end-of-backup record is
>>> written
>>> to WAL.  In a non-exclusive backup I'm not quite sure how that works.
>>>
>>
>> I guess I was hoping that you would know.  I fine with just getting the
>> current timestamp as is currently done in do_pg_stop_backup().  It's not
>> perfect but it will be pretty close.
>>
>> I thought some more about putting STOP_WAL_LOCATION into the backup label
>> and I think this is an important step.  Without that the recovery process
>> needs to use pg_control to determine when the database has reached
>> consistency and that will only work if pg_control was copied last.
>>
>> In summary, I think the patch should be updated to include stop_time as a
>> column and add STOP_WAL_LOCATION and STOP_TIME to backup_label.  The
>> recovery process should read STOP_WAL_LOCATION from backup_label and use
>> that to decide when the database has become consistent.
>>
>
> Is that the only reason we need pg_control to be copied last? I thought
> there were other reasons for that.
>
> I was chatting with Stephen about this earlier, and one thing we
> considered was that maybe we should return pg_control as a bytea field from
> pg_stop_backup() thereby strongly hinting to tools that they should write
> it out from there rather than copy it from the data directory (we can't
> stop them from copying it from the data directory like we do for
> backup_label of course, but if we return the contents and document that's
> how it should be done, it's pretty clear).
>
> But if we can actually remove the whole requirement on the order of the
> copy of pg_control by doing that it certainly seems worthwhile.
>
>
> So - I can definitely see the argument for returning the stop wal
> *location*. But I'm still not sure what the definition of the time would
> be? We can't return it before we know what it means...
>


I had a chat with Heikki, and here's another suggestion:

1. We don't touch the current exclusive backups at all, as previously
discussed, other than deprecating their use. For backwards compat.

2. For new backups, we return the contents of pg_control as a bytea from
pg_stop_backup(). We tell backup programs they are supposed to write this
out as pg_control.backup, *not* as pg_control.

3a. On recovery, if it's an exclusive backup, we do as we did before.

3b. on recovery, in non-exclusive backups (determined from backup_label),
we check that pg_control.backup exists *and* that pg_control does *not*
exist. That guards us reasonably against backup programs that do the wrong
thing, and we know we get the correct version of pg_control.

4. (we can still add the stop location to the backup_label file in case
backup programs find it useful, but we don't use it in recovery)

Thoughts about this approach?
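Step 3b's sanity check on recovery reduces to a small predicate (file names per the suggestion above; this only illustrates the proposed rule, it is not server code):

```python
def backup_is_usable(has_pg_control_backup, has_pg_control):
    """Non-exclusive backup (per backup_label): the backup tool must have
    written pg_control.backup, and must NOT have blindly copied the live
    pg_control into the backup. Both conditions guard against tools that
    do the wrong thing, and guarantee the correct pg_control version."""
    return has_pg_control_backup and not has_pg_control
```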

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] incorrect docs for pgbench / skipped transactions

2016-03-29 Thread Robert Haas
On Mon, Mar 21, 2016 at 2:32 PM, Fabien COELHO  wrote:
>>> Ok, I added a reference to the commitfest entry from this wiki page, and
>>> a
>>> note about partial 9.5 backporting.
>>
>> Please split the patch into one part for backporting and one part for
>> master-only and post both patches, clearly indicating which is which.
>
> Attached are the full patch for head and the backport part (the patch name
> ends with "backport") separated.

That's not really what I wanted; the full patch includes the stuff for
back-porting, whereas I wanted two independent patches.

I wordsmithed your version of Tomas's patch and pushed that to 9.5 and
master, and I pushed the first hunk of your full patch to master also.
I think the last hunk of that is overkill, so I did not push that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-29 Thread Alvaro Herrera
Alvaro Herrera wrote:

> In doing this, I noticed that the latency output is wrong if you use -T
> instead of -t; it always says the latency is zero because "duration" is
> zero.  I suppose it should be like in the attached instead.  At the same
> time, it says "latency average: XYZ" instead of "latency average = XYZ"
> as in printSimpleStats, which doesn't look terribly important.  But the
> line appears in the SGML docs.

Patch actually attached here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 52d1223..5cb5906 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3152,10 +3152,13 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 
 	if (throttle_delay || progress || latency_limit)
 		printSimpleStats("latency", &total->latency);
-	else
+	else if (duration > 0)
 		/* only an average latency computed from the duration is available */
-		printf("latency average: %.3f ms\n",
+		printf("latency average: %.3f ms",
 			   1000.0 * duration * nclients / total->cnt);
+	else
+		printf("latency average: %.3f ms",
+			   1000.0 * time_include * nclients / total->cnt);
 
 	if (throttle_delay)
 	{



Re: [HACKERS] pgbench stats per script & other stuff

2016-03-29 Thread Alvaro Herrera
Fabien COELHO wrote:
> 
> >- that it does work:-) I'm not sure what happens by the script selection
> >  process, it should be checked carefully because it was not designed
> >  with allowing a zero weight, and it may depend on its/their positions.
> >  It may already work, but it really needs checking.
> 
> Hmmm, it seems ok.

It's not -- if you used -i, it died saying weight is zero.

> >- I would suggest that a warning is shown when a weight is zero,
> >  something like "warning, script #%d weight is zero, will be ignored".
> 
> includes such a warning.

I didn't include this part.

Pushed.
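A script-selection scheme honoring the zero-weight behavior discussed above might look like this sketch (purely illustrative; not pgbench's actual code):

```python
import random

def choose_script(weights, rnd=random.random):
    """Pick a script index with probability proportional to its weight.
    Zero-weight scripts can never be selected (effectively ignored);
    a zero total weight is an error, as pgbench must refuse to run."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight is zero")
    r = rnd() * total
    for i, w in enumerate(weights):
        if r < w:
            return i
        r -= w
    return len(weights) - 1  # guard against floating-point edge cases
```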


In doing this, I noticed that the latency output is wrong if you use -T
instead of -t; it always says the latency is zero because "duration" is
zero.  I suppose it should be like in the attached instead.  At the same
time, it says "latency average: XYZ" instead of "latency average = XYZ"
as in printSimpleStats, which doesn't look terribly important.  But the
line appears in the SGML docs.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench - show weight percent

2016-03-29 Thread Alvaro Herrera
Fabien wrote:
> 
> This minor patch shows the expected drawing percent in multi-script reports,
> next to the relative weight.

Applied together with the zero weight.


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund
On 2016-03-29 20:22:00 +0300, Alexander Korotkov wrote:
> > > + while (true)
> > >   {
> > > - if (buf->usage_count == 0)
> > > - buf->usage_count = 1;
> > > + /* spin-wait till lock is free */
> > > + while (state & BM_LOCKED)
> > > + {
> > > + make_spin_delay();
> > > + state = pg_atomic_read_u32(&buf->state);
> > > + oldstate = state;
> > > + }
> >
> > Maybe we should abstract that to pg_atomic_wait_bit_unset_u32()? It
> > seems quite likely we need this in other places (e.g. lwlock.c itself).
> >
> 
> I have some doubts about such a function.  First, I can't find a place for
> it in lwlock.c, which doesn't run explicit spin delays yet.  Is it related to
> some changes you're planning for lwlock.c?

Yes, see 
http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg...@awork2.anarazel.de


> > >   lsn = XLogSaveBufferForHint(buffer, buffer_std);
> > >   }
> > >
> > > - LockBufHdr(bufHdr);
> > > - Assert(bufHdr->refcount > 0);
> > > - if (!(bufHdr->flags & BM_DIRTY))
> > > + state = LockBufHdr(bufHdr);
> > > +
> > > + Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> > > +
> > > + if (!(state & BM_DIRTY))
> > >   {
> > >   dirtied = true; /* Means "will be dirtied by this action" */
> >
> > It's again worthwhile to try make this work without taking the lock.
> >
> 
> The comment there claims:
> 
> /*
>  * Set the page LSN if we wrote a backup block. We aren't supposed
>  * to set this when only holding a share lock but as long as we
>  * serialise it somehow we're OK. We choose to set LSN while
>  * holding the buffer header lock, which causes any reader of an
>  * LSN who holds only a share lock to also obtain a buffer header
>  * lock before using PageGetLSN(), which is enforced in
>  * BufferGetLSNAtomic().
> 
> Thus, the buffer header lock is used for write serialization.  I don't think it
> would be correct to remove the lock here.

Gah, I forgot about this ugliness. Man, the checksumming commit sure
made a number of things really ugly.

Greetings,

Andres Freund




Re: [HACKERS] Sequence Access Method WIP

2016-03-29 Thread Fabrízio de Royes Mello
On Tue, Mar 29, 2016 at 2:26 PM, Petr Jelinek  wrote:
>
> On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 12:25 PM, David Steele wrote:
>>  >
>>  > Hi Petr,
>>  >
>>  > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>>  >
>>  >> fabrizio@bagual:~/pgsql
>>  >> $ bin/pg_dump > /tmp/fabrizio.sql
>>  >> pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"
>>  >> does not exist
>>  >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>>  >> ^
>>  >> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
>>  >> increment_by, CASE WHEN increment_by > 0 AND max_value =
>>  >> 9223372036854775807 THEN NULL  WHEN increment_by < 0 AND max_value =
>>  >> -1 THEN NULL  ELSE max_value END AS max_value, CASE WHEN
>>  >> increment_by > 0 AND min_value = 1 THEN NULL  WHEN increment_by < 0
>>  >> AND min_value = -9223372036854775807 THEN NULL  ELSE min_value END
>>  >> AS min_value, cache_value, is_cycled FROM x
>>  >
>>  >
>>  > It looks like a new patch is needed.  I've marked this "waiting on
>> author".
>>  >
>>
>
> Yeah, there were some incompatible commits since my last rebase; fixed,
> along with the pg_dump bugs.
>

Now everything applies without errors; build and "make check" pass too.



>> But there are other issue in the gapless-seq extension when I ran "make
>> check":
>>
>>   43   CREATE EXTENSION gapless_seq;
>>   44   CREATE SEQUENCE test_gapless USING gapless;
>>   45   SELECT nextval('test_gapless'::regclass);
>>   46 ! ERROR:  could not access status of transaction 1275068416
>>   47 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>>   48   BEGIN;
>>   49 SELECT nextval('test_gapless'::regclass);
>>   50 ! ERROR:  could not access status of transaction 1275068416
>>   51 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>>   52 SELECT nextval('test_gapless'::regclass);
>>   53 ! ERROR:  current transaction is aborted, commands ignored until
>> end of transaction block
>>   54 SELECT nextval('test_gapless'::regclass);
>>   55 ! ERROR:  current transaction is aborted, commands ignored until
>> end of transaction block
>>   56   ROLLBACK;
>>   57   SELECT nextval('test_gapless'::regclass);
>>   58 ! ERROR:  could not access status of transaction 1275068416
>>   59 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>>
>>
>> And I see the same running manually:
>>
>> fabrizio=# create extension gapless_seq;
>> CREATE EXTENSION
>> fabrizio=# create sequence x using gapless;
>> CREATE SEQUENCE
>> fabrizio=# select nextval('x');
>> ERROR:  could not access status of transaction 1258291200
>> DETAIL:  Could not open file "pg_subtrans/4B00": No such file or directory.
>>
>> Regards,
>>
>
> Hmm, I am unable to reproduce this. What OS? Any special configure flags do
> you use?
>

In my environment the error remains with your last patches.

I didn't use anything special:

./configure --prefix=/home/fabrizio/pgsql --enable-cassert
--enable-coverage --enable-tap-tests --enable-depend
make -s -j8 install
make check-world


My environment:

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ uname -a
Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund
On 2016-03-29 13:24:40 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> > Cross-compiling suggests that doing so "just works".  I.e. replace the
> > #if defined(__ppc__) typedef from an int to a char.
> 
> AFAICS, lwarx/stwcx are specifically *word* wide.

Ah, x86 gcc is just too convenient, with it automatically adjusting
instruction types :(

There's actually lbarx/stbcx - but it's not present in all ISAs. So I
guess it's clear where to go.

Andres Freund




Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev  wrote:

> I incorporated your changes and did some additional refinements on top of
>> them
>> still.
>>
>> Attached is delta against v12, that should cause less issues when merging
>> for
>> Teodor.
>>
>
> But the last version is the 13th?
>
> BTW, it would be cool to add docs in the VII. Internals chapter; the
> description should be similar to the index interface description, i.e. how
> to use the generic WAL interface.


We already have a generic WAL interface description in the comments of
generic_xlog.c.  I'm not a fan of duplicating things.  What about moving the
interface description from comments to docs completely?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Petr Jelinek

On 29/03/16 19:29, Teodor Sigaev wrote:

I incorporated your changes and did some additional refinements on top
of them
still.

Attached is delta against v12, that should cause less issues when
merging for
Teodor.


But the last version is the 13th?


No, 12 is the last version from Alexander AFAICS. I named my merge 13, but 
somehow the diff was broken, so now I just sent a diff against Alexander's 
work with mine + Alvaro's changes; sorry for the confusion.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Dilip Kumar
On Tue, Mar 29, 2016 at 2:09 PM, Dilip Kumar  wrote:

>
> Attaching new version v18

- Some cleanup work on v17.
- Improved the *UpdateFreeSpaceMap* function.
- Performance and space utilization are the same as v17.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v18.patch
Description: Binary data



Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Teodor Sigaev

I incorporated your changes and did some additional refinements on top of them
still.

Attached is delta against v12, that should cause less issues when merging for
Teodor.


But the last version is the 13th?

BTW, it would be cool to add docs in the VII. Internals chapter; the description 
should be similar to the index interface description, i.e. how to use the generic WAL interface.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] raw output from copy

2016-03-29 Thread Pavel Stehule
Hi

2016-03-29 18:19 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > I tested COPY RAW on old psql clients - and it is working without any
> > problem - so when the client uses the same logic as psql, then it should
> > work. Sure, there can be differently implemented clients, but the COPY
> > client side is usually simple - store stream to output.
>
> My point is precisely that I doubt all clients are that stupid about COPY.
>
> > Maybe I am blind, but I don't see any new security risks. The risk can be
> > only on client side - and if client is not able work with new value, then
> > it can fails.
>
> Well, the point is that low-level code might get used to process the data
> stream for commands it doesn't have any control over.  Maybe there's no
> realistic security risk there, or maybe there is; I'm not sure.
>
> > I am thinking so PQbinaryTuples should to return 1 (without change), and
> > PQfformat should to return 2.
>
> Well, that seems pretty backwards to me.  The format of the individual
> fields is still what it is under COPY BINARY; you would not use a
> different per-field transformation.  You do need to know about the
> overall format of the copy data stream being different, and defining
> PQbinaryTuples as still returning 1 means there's no clean way to
> understand overall copy format vs. per-field format.
>

> There's a case to be made that we should invent a new function named
> along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
> some more.  That function is currently deprecated and I'm not very happy
> with un-deprecating it only to use it in a confusing way.
>

I see the introduction of PQcopyFormat() as the best idea. These new changes
are then transparent for PQbinaryTuples() and PQfformat(), and PQcopyFormat()
can return info about the method used.


> To be more concrete about this: I think it's actually rather broken
> that this patch ties RAW to binary format of the field contents.
> Why would it not be exactly as useful to have delimiter-less COPY
> of textual data, for use when there's just one datum and/or you're
> confident in picking the data apart for yourself?  But as things stand
> it'd be too confusing for an application to try to figure out what's
> happening in such a case.
>
> So I think we should either invent RAW_TEXT and RAW_BINARY formats
> (not just RAW) or make RAW be an orthogonal copy option.  And we need
> to improve libpq's behavior enough so that applications can sanely
> figure out what's happening.
>

I had a use case that required binary mode. Higher granularity makes sense.

This opens a new question: RAW_TEXT will use the text output function. But if
I pass this value as a text value, then the behavior of current clients will
be the same as with the usual COPY. So I need to use the binary protocol. And
then the behavior of PQbinaryTuples() and PQfformat() is the question.
(Although a text value can be passed in binary mode too, with format [length,
data...].)


>
> > I executed all tests in libpq and ecpg without any problems. Can you,
> > please, help me with repeating a ecpg issues?
>
> Of course the ecpg tests pass; you didn't extend them to see what would
> happen if someone tries COPY RAW with ecpg.   Likewise, we have no tests
> exercising a client's use of libpq with more intelligence than psql has
> got.  But that doesn't mean it's acceptable to write this patch with no
> thought for such clients.
>

If we don't change PQbinaryTuples() and PQfformat(), then COPY RAW should be
transparent for any client: the server sends data in binary format, which is
generic.


>
> I am fairly sure that there actually are third-party client libraries
> that have more intelligence about COPY than psql, but I do not remember
> any specifics unfortunately.
>

COPY RAW should not break any existing application. This is a new feature,
and old applications and old clients will never use COPY RAW. What I see as
important is that the mode used (text/binary) stays consistent with
PQbinaryTuples() and PQfformat().

I am writing a few lines as a summary:

1. invention RAW_TEXT and RAW_BINARY
2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
the client should check PQcopyFormat() to not print "\n" at the end
3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, and
the text output function is used, so no client modification is necessary
4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
RAW_BINARY
5. create tests for ecpg

Is it ok?

What do you prefer 3.a, or 3.b?

Regards

Pavel

>
> regards, tom lane
>


Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-29 Thread Robert Haas
On Mon, Mar 28, 2016 at 11:00 PM, Kouhei Kaigai  wrote:
>> -Original Message-
>> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
>> Sent: Tuesday, March 29, 2016 10:54 AM
>> To: Kaigai Kouhei(海外 浩平)
>> Cc: Petr Jelinek; pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] Reworks of CustomScan serialization/deserialization
>>
>> On Mon, Mar 28, 2016 at 9:36 PM, Kouhei Kaigai  wrote:
>> > I don't have a strong reason to keep these stuff in separate files.
>> > Both stuffs covers similar features and amount of code are enough small.
>> > So, the attached v4 just merged custom-node.[ch] stuff into extensible.
>> >
>> > Once we put similar routines closely, it may be better to consolidate
>> > these routines.
>> > As long as EXTNODENAME_MAX_LEN == CUSTOM_NAME_MAX_LEN, both features
>> > have identical structure layout, so it is easy to call an internal
>> > common function to register or find out a table of callbacks according
>> > to the function actually called by other modules.
>> >
>> > I'm inclined to think to replace EXTNODENAME_MAX_LEN and
>> > CUSTOM_NAME_MAX_LEN by NAMEDATALEN again, to fit structure layout.
>>
>> I don't think that we need both EXTNODENAME_MAX_LEN and
>> CUSTOM_NAME_MAX_LEN; we can use EXTNODENAME_MAX_LEN for both.  I'm
>> opposed to using NAMEDATALEN for anything unrelated to the size of a
>> Name.  If it's not being stored in a catalog, it doesn't need to care.
>>
> OK, I adjusted the v4 patch to use EXTNODENAME_MAX_LEN for both.
>
> The structure of the hash entry was revised as follows, then registered via
> an internal common function, RegisterExtensibleNodeEntry, and looked up via
> another internal common function, GetExtensibleNodeEntry.
>
> typedef struct
> {
> char        extnodename[EXTNODENAME_MAX_LEN];
> const void *extnodemethods;
>  } ExtensibleNodeEntry;
>
> ExtensibleNodeMethods and CustomScanMethods shall be stored in
> 'extensible_node_methods' and 'custom_scan_methods' separately.
> The entrypoint functions call the above internal common functions with the
> appropriate HTAB variable.
>
> It will be reusable if we have further extensible nodes in future
> versions.

Committed with a bit of wordsmithing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Tom Lane
Andres Freund  writes:
> Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> Cross-compiling suggests that doing so "just works".  I.e. replace the
> #if defined(__ppc__) typedef from an int to a char.

AFAICS, lwarx/stwcx are specifically *word* wide.

regards, tom lane




[HACKERS] Pg-Logical output pkg; can't install 9.4 and 9.5 on same Wheezy box

2016-03-29 Thread Jerry Sievers
Hackers, please see below.

Posted on Admin and then General, but I'm not sure anyone of authority has noticed.

Briefly, conflicting files in an include directory prevent 9.4 and 9.5 from
being installed on the same host instance.  Thanks

-

Posted to Admin a few days ago...  I'll try here next before going to
Hackers or trying to locate the packagers.



Anyone else as enthused as I am about Pg-Logical and tried installing
 both versions on the same host?

Fails on the output plugin package due to header file conflict.

I recall as might be expected that the reverse is true also.  If you
wipe 9.5 and then install 9.4 it works then you get borked trying to
install the 9.5 version.

I wonder if those headers should be packaged down into a version-numbered
directory?




# uname -a
Linux jerry 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux

# 
# echo $#; echo $*
4
postgresql-9.4-pglogical postgresql-9.4-pglogical-output 
postgresql-9.5-pglogical postgresql-9.5-pglogical-output

# dpkg -l | grep pglogical
ii  postgresql-9.5-pglogical 1.0.1-1wheezy   amd64  
 PGLogical plugin for PostgreSQL 9.5
ii  postgresql-9.5-pglogical-output  1.0.1-1wheezy   amd64  
 PGLogical Output plugin for PostgreSQL 9.5

# apt-get install -y $1
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following extra packages will be installed:
  postgresql-9.4-pglogical-output
The following NEW packages will be installed:
  postgresql-9.4-pglogical postgresql-9.4-pglogical-output
0 upgraded, 2 newly installed, 0 to remove and 308 not upgraded.
Need to get 0 B/249 kB of archives.
After this operation, 827 kB of additional disk space will be used.
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell 
buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
(Reading database ... 173140 files and directories currently installed.)
Unpacking postgresql-9.4-pglogical-output (from 
.../postgresql-9.4-pglogical-output_1.0.1-1wheezy_amd64.deb) ...
dpkg: error processing 
/var/cache/apt/archives/postgresql-9.4-pglogical-output_1.0.1-1wheezy_amd64.deb 
(--unpack):

 trying to overwrite '/usr/include/postgresql/pglogical_output/compat.h', which 
is also in package postgresql-9.5-pglogical-output 1.0.1-1wheezy


Selecting previously unselected package postgresql-9.4-pglogical.
Unpacking postgresql-9.4-pglogical (from 
.../postgresql-9.4-pglogical_1.0.1-1wheezy_amd64.deb) ...
Processing triggers for postgresql-common ...
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell 
buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  en_us
Removing obsolete dictionary files:
Errors were encountered while processing:
 /var/cache/apt/archives/postgresql-9.4-pglogical-output_1.0.1-1wheezy_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

# dpkg -L $4
/.
/usr
/usr/share
/usr/share/postgresql
/usr/share/postgresql/9.5
/usr/share/postgresql/9.5/extension
/usr/share/postgresql/9.5/extension/pglogical_output--1.0.0.sql
/usr/share/postgresql/9.5/extension/pglogical_output--1.0.1.sql
/usr/share/postgresql/9.5/extension/pglogical_output--1.0.0--1.0.1.sql
/usr/share/postgresql/9.5/extension/pglogical_output.control
/usr/share/doc
/usr/share/doc/postgresql-9.5-pglogical-output
/usr/share/doc/postgresql-9.5-pglogical-output/copyright
/usr/share/doc/postgresql-9.5-pglogical-output/changelog.Debian.gz
/usr/share/doc/postgresql-9.5-pglogical-output/README.md.gz
/usr/lib
/usr/lib/postgresql
/usr/lib/postgresql/9.5
/usr/lib/postgresql/9.5/lib
/usr/lib/postgresql/9.5/lib/pglogical_output.so
/usr/include
/usr/include/postgresql
/usr/include/postgresql/pglogical_output

/usr/include/postgresql/pglogical_output/compat.h
^^

/usr/include/postgresql/pglogical_output/hooks.h

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund
On 2016-03-29 13:09:05 -0400, Robert Haas wrote:
> On Mon, Mar 28, 2016 at 9:09 AM, Andres Freund  wrote:
> > On 2016-03-28 11:48:46 +0530, Dilip Kumar wrote:
> >> On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:
> >> > What's sizeof(BufferDesc) after applying these patches? It should better
> >> > be <= 64...
> >> >
> >>
> >> It is 72.
> >
> > Ah yes, miscalculated the required alignment.  Hm. So we got to get this
> > smaller. I see three approaches:
> >
> > 1) Reduce the spinlock size on ppc. That actually might just work by
> >    replacing "unsigned int" by "unsigned char"
> > 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
> >    embedding the spinlock, and actually might allow to avoid one atomic
> >    op in a number of cases.
> > 3) Shrink the size of BufferDesc by removing buf_id; that'd bring it to
> >    64 bytes.
> >
> > I'm a bit hesitant to go for 3), because it'd likely end up adding a bit
> > of arithmetic to a number of places in bufmgr.c.  Robert, what do you
> > think?
> 
> I don't have a clear idea what's going to be better here.

My gut feeling is that we should do both 1) and 2).

Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
Cross-compiling suggests that doing so "just works".  I.e. replace the
#if defined(__ppc__) typedef from an int to a char.

Regards,

Andres




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-29 Thread Tom Lane
Andres Freund  writes:
> There's a number of cases during early startup/auth where we really
> don't want client to get messages.

Right, which we handle at present with ClientAuthInProgress.  But
I think it's worth drawing a distinction between "don't send message
to client because of the state we're in" and "don't send this message
to client because it's security-sensitive".  The latter is better
handled by annotations attached to specific ereport sites, the former
not.

regards, tom lane




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Robert Haas
On Mon, Mar 28, 2016 at 9:09 AM, Andres Freund  wrote:
> On 2016-03-28 11:48:46 +0530, Dilip Kumar wrote:
>> On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:
>> > What's sizeof(BufferDesc) after applying these patches? It should better
>> > be <= 64...
>> >
>>
>> It is 72.
>
> Ah yes, miscalculated the required alignment.  Hm. So we got to get this
> smaller. I see three approaches:
>
> 1) Reduce the spinlock size on ppc. That actually might just work by
>    replacing "unsigned int" by "unsigned char"
> 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
>    embedding the spinlock, and actually might allow to avoid one atomic
>    op in a number of cases.
> 3) Shrink the size of BufferDesc by removing buf_id; that'd bring it to
>    64 bytes.
>
> I'm a bit hesitant to go for 3), because it'd likely end up adding a bit
> of arithmetic to a number of places in bufmgr.c.  Robert, what do you
> think?

I don't have a clear idea what's going to be better here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-29 Thread Andres Freund
On 2016-03-29 12:58:22 -0400, Tom Lane wrote:
> Looking back at the earlier thread Andres mentioned, I see that he was
> specifically on about being able to do ereport(ERROR | LOG_NO_CLIENT),
> which I've got a problem with because of the point about not breaking
> wire-protocol expectations.

Yes.  The reason for doing it in that case is that we weren't allowed to
send errors to the client in that specific case - by the protocol state.

There's a number of cases during early startup/auth where we really
don't want client to get messages.


> and it doesn't look like he thought about the elevel-comparisons
> issue either.

Nope, I didn't. That's a long while ago ;)

Greetings,

Andres Freund




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 12:58 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Mar 29, 2016 at 12:38 PM, Tom Lane  wrote:
>>> If we invent LOG_ONLY (feel free to bikeshed the name), we could later
>>> redefine it as (LOG | ERR_HIDE_FROM_CLIENT), if we ever upgrade the
>>> underlying implementation to allow that.  But I remain concerned about
>>> dealing with logic like "if (elevel < ERROR)", and I am unconvinced that
>>> there's a near-term use-case here that's compelling enough to justify
>>> finding all the places that do that.
>
>> Yeah, I think it's dead certain that such code exists, and, ahem, not
>> only in our tree.  I suspect that EDB is not the only organization
>> that has written code that involves comparing error levels.
>
> Yeah, that's exactly my concern: it'd likely break third-party code
> not only our own.
>
>> Putting
>> the flags in the low-order bits seems like it might be workable, but I
>> think using a high-order bit is right out.
>
> We'd need a bit more investigation.  I'm not exactly sure that we can
> renumber the existing elevel codes at all without breaking things
> (guc.c's management of client_min_messages et al would be a place
> to look at, for instance).  But if someone wants to pursue it,
> the general concept seems somewhat sane.
>
> Looking back at the earlier thread Andres mentioned, I see that he was
> specifically on about being able to do ereport(ERROR | LOG_NO_CLIENT),
> which I've got a problem with because of the point about not breaking
> wire-protocol expectations.  You could maybe define such a case as
> substituting a text like "error message is hidden" when sending the
> error to the client?  But the draft patch he had didn't address that
> point, and it doesn't look like he thought about the elevel-comparisons
> issue either.
>
>> I don't agree that there is no compelling use case for log-only
>> output.  I think there's a lot of use case for that, either for
>> general auditing (like pgaudit) or for any specific case where you
>> want to log sensitive information that shouldn't ever be allowed to
>> leak to the client.
>
> Oh, I agree that there's a compelling use-case for LOG |
> ERR_HIDE_FROM_CLIENT.  I'm less certain that there's a use-case
> for supporting such a flag across all elevels that is strong enough
> to justify all the work we'd have to do to make it happen.  Basically,
> my point is that LOG_ONLY achieves 95% of the benefit for probably
> 0.01% of the work.

If LOG_ONLY is what we can get right now, I'm happy to take the money and run.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 3:17 AM, Michael Paquier
 wrote:
> OK, so I am switching this patch as "Ready for committer", for 0001.
> It is in better shape now.

Well...  I have a few questions yet.

The new argument to SyncRepWaitForLSN is called "bool commit", but
RecordTransactionAbortPrepared passes true.  Either it should be
passing false, or the parameter is misnamed or at the least in need of
a better comment.

I don't understand why this patch is touching the abort paths at all.
XactLogAbortRecord sets XACT_COMPLETION_SYNC_APPLY_FEEDBACK, and
xact_redo_abort honors it.  But surely it makes no sense to wait for
an abort to become visible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-29 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 29, 2016 at 12:38 PM, Tom Lane  wrote:
>> If we invent LOG_ONLY (feel free to bikeshed the name), we could later
>> redefine it as (LOG | ERR_HIDE_FROM_CLIENT), if we ever upgrade the
>> underlying implementation to allow that.  But I remain concerned about
>> dealing with logic like "if (elevel < ERROR)", and I am unconvinced that
>> there's a near-term use-case here that's compelling enough to justify
>> finding all the places that do that.

> Yeah, I think it's dead certain that such code exists, and, ahem, not
> only in our tree.  I suspect that EDB is not the only organization
> that has written code that involves comparing error levels.

Yeah, that's exactly my concern: it'd likely break third-party code
not only our own.

> Putting
> the flags in the low-order bits seems like it might be workable, but I
> think using a high-order bit is right out.

We'd need a bit more investigation.  I'm not exactly sure that we can
renumber the existing elevel codes at all without breaking things
(guc.c's management of client_min_messages et al would be a place
to look at, for instance).  But if someone wants to pursue it,
the general concept seems somewhat sane.

Looking back at the earlier thread Andres mentioned, I see that he was
specifically on about being able to do ereport(ERROR | LOG_NO_CLIENT),
which I've got a problem with because of the point about not breaking
wire-protocol expectations.  You could maybe define such a case as
substituting a text like "error message is hidden" when sending the
error to the client?  But the draft patch he had didn't address that
point, and it doesn't look like he thought about the elevel-comparisons
issue either.

> I don't agree that there is no compelling use case for log-only
> output.  I think there's a lot of use case for that, either for
> general auditing (like pgaudit) or for any specific case where you
> want to log sensitive information that shouldn't ever be allowed to
> leak to the client.

Oh, I agree that there's a compelling use-case for LOG |
ERR_HIDE_FROM_CLIENT.  I'm less certain that there's a use-case
for supporting such a flag across all elevels that is strong enough
to justify all the work we'd have to do to make it happen.  Basically,
my point is that LOG_ONLY achieves 95% of the benefit for probably
0.01% of the work.

regards, tom lane




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-29 Thread Alvaro Herrera
Robert Haas wrote:

> Yeah, I think it's dead certain that such code exists, and, ahem, not
> only in our tree.  I suspect that EDB is not the only organization
> that has written code that involves comparing error levels.  Putting
> the flags in the low-order bits seems like it might be workable, but I
> think using a high-order bit is right out.

We could redefine elevel as a struct when assertions are enabled, and
provide functions to do the < comparisons in that case; they would
reduce to regular integers when assertions are disabled.  That would
raise compile-time errors in all places that need to be updated.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Petr Jelinek

On 29/03/16 18:25, Alvaro Herrera wrote:

>+ /*-
>+  * API for construction of generic xlog records
>+  *
>+  * This API allows user to construct generic xlog records which describe
>+  * difference between pages in a generic way.  This is useful for
>+  * extensions which provide custom access methods because they can't
>+  * register their own WAL redo routines.
>+  *
>+  * Each record must be constructed by following these steps:
>+  * 1) GenericXLogStart(relation) - start construction of a generic xlog
>+  *  record for the given relation.
>+  * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
>+  *  for the record.  This function returns a copy of the page
>+  *  image where modifications can be performed.  The second argument
>+  *  indicates if the block is new (i.e. a full page image should be taken).
>+  * 3) Apply modification of page images obtained in the previous step.
>+  * 4) GenericXLogFinish() - finish construction of generic xlog record.
>+  *
>+  * The xlog record construction can be canceled at any step by calling
>+  * GenericXLogAbort().  All changes made to page images copies will be
>+  * discarded.
>+  *
>+  * Please, note the following points when constructing generic xlog records.
>+  * - No direct modifications of page images are allowed!  All modifications
>+  * must be done in the copies returned by GenericXLogRegister().  In other
>+  * words the code which makes generic xlog records must never call
>+  * BufferGetPage().
>+  * - Registrations of buffers (step 2) and modifications of page images
>+  * (step 3) can be mixed in any sequence.  The only restriction is that
>+  * you can only modify page image after registration of corresponding
>+  * buffer.
>+  * - After registration, the buffer also can be unregistered by calling
>+  * GenericXLogUnregister(buffer).  In this case the changes made in
>+  * that particular page image copy will be discarded.
>+  * - Generic xlog assumes that pages are using standard layout, i.e., all
>+  * data between pd_lower and pd_upper will be discarded.
>+  * - Maximum number of buffers simultaneously registered for a generic xlog
>+  * record is MAX_GENERIC_XLOG_PAGES.  An error will be thrown if this limit
>+  * is exceeded.
>+  * - Since you modify copies of page images, GenericXLogStart() doesn't
>+  * start a critical section.  Thus, you can do memory allocation, error
>+  * throwing etc between GenericXLogStart() and GenericXLogFinish().
>+  * The actual critical section is present inside GenericXLogFinish().
>+  * - GenericXLogFinish() takes care of marking buffers dirty and setting their
>+  * LSNs.  You don't need to do this explicitly.
>+  * - For unlogged relations, everything works the same except there is no
>+  * WAL record produced.  Thus, you typically don't need to do any explicit
>+  * checks for unlogged relations.
>+  * - If registered buffer isn't new, generic xlog record contains delta
>+  * between old and new page images.  This delta is produced by per byte
>+  * comparison.  This current delta mechanism is not effective for data shifts
>+  * inside the page and may be improved in the future.
>+  * - Generic xlog redo function will acquire exclusive locks on buffers
>+  * in the same order they were registered.  After redo of all changes,
>+  * the locks will be released in the same order.
>+  *
>+  *
>+  * Internally, delta between pages consists of set of fragments.  Each
>+  * fragment represents changes made in given region of page.  A fragment is
>+  * described as follows:
>+  *
>+  * - offset of page region (OffsetNumber)
>+  * - length of page region (OffsetNumber)
>+  * - data - the data to place into described region ('length' number of bytes)
>+  *
>+  * Unchanged regions of page are not represented in the delta.  As a result,
>+  * the delta can be more compact than full page image.  But if the unchanged
>+  * region of the page is less than fragment header (offset and length) the
>+  * delta would be bigger than the full page image.  For this reason we break
>+  * into fragments only if the unchanged region is bigger than MATCH_THRESHOLD.
>+  *
>+  * The worst case for delta size is when we didn't find any unchanged region
>+  * in the page.  Then size of delta would be size of page plus size of
>+  * fragment header.
>+  */
>+ #define FRAGMENT_HEADER_SIZE  (2 * sizeof(OffsetNumber))
>+ #define MATCH_THRESHOLD       FRAGMENT_HEADER_SIZE
>+ #define MAX_DELTA_SIZE        BLCKSZ + FRAGMENT_HEADER_SIZE




I incorporated your changes and made some additional refinements on top
of them.


Attached is a delta against v12, which should cause fewer issues when
merging for 

Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 12:38 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-03-29 12:28:40 -0400, Tom Lane wrote:
>>> My proposal would be to invent a new elevel macro, maybe LOG_ONLY,
>>> for this purpose.  But under the hood it'd be the same as COMMERROR.
>
>> A couple years back I proposed making things like COMMERROR into flags
>> |'ed into elevel, rather than distinct levels.  I still think that's a
>> better approach; and it doesn't force us to forgo using distinct log
>> levels.
>
> If we invent LOG_ONLY (feel free to bikeshed the name), we could later
> redefine it as (LOG | ERR_HIDE_FROM_CLIENT), if we ever upgrade the
> underlying implementation to allow that.  But I remain concerned about
> dealing with logic like "if (elevel < ERROR)", and I am unconvinced that
> there's a near-term use-case here that's compelling enough to justify
> finding all the places that do that.

Yeah, I think it's dead certain that such code exists, and, ahem, not
only in our tree.  I suspect that EDB is not the only organization
that has written code that involves comparing error levels.  Putting
the flags in the low-order bits seems like it might be workable, but I
think using a high-order bit is right out.

I don't agree that there is no compelling use case for log-only
output.  I think there's a lot of use case for that, either for
general auditing (like pgaudit) or for any specific case where you
want to log sensitive information that shouldn't ever be allowed to
leak to the client.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-03-29 Thread Julian Markwort

[This is a rather informal user-review]

Here are some thoughts and experiences on using the new features. I 
focused on testing the basic functionality of setting password_encryption 
to scram and then generating some users with passwords. After that, I 
took a look at the documentation, specifically all those parts that 
mentioned "md5" but not SCRAM, so I took some time to write those down 
and add my thoughts on them.


We're quite keen on seeing these features in a future release, so I 
suggest that we add these patches to the next commitfest asap in order 
to keep the discussion on this topic flowing.


For those of you who like to put the authentication method itself up for 
discussion, I'd like to add that it seems fairly simple to insert code 
for new authentication mechanisms.

In conclusion I think these patches are very useful.


My remarks follow below.

Kind regards,
Julian Markwort
julian.markw...@uni-muenster.de




Things I noticed:
1.
when using either
CREATE ROLE
ALTER ROLE
with the parameter
ENCRYPTED
md5 encryption is always assumed (I've come to realize that 
UNENCRYPTED always equals plain and, in the past, ENCRYPTED equaled md5 
since there were no other options)


I don't know if this is intended behaviour. Maybe this option 
should be omitted from the CREATE/ALTER functions (or marked as 
deprecated in the documentation), since without this option the 
password_encryption setting from postgresql.conf is used

or maybe it should have its own parameter like
CREATE ROLE testuser WITH LOGIN ENCRYPTED 'SCRAM' PASSWORD 'test';
so that the desired encryption is used.
From my point of view, this would be the sensible thing to do, 
especially if different verifiers should be allowed (as proposed by 
these patches).
In either case, a bit of text explaining the (UN)ENCRYPTED option 
should be added to the documentation of the CREATE/ALTER ROLE functions.


2.
Documentation
III.
17. Server Setup and Operation
17.2. Creating a Database Cluster: maybe list SCRAM as a 
possible method for securing the db-admin


19. Client Authentication
19.1. The pg_hba.conf File: SCRAM is not listed in the list 
of available auth_methods to be specified in pg_hba.conf

19.3 Authentication Methods
19.3.2 Password Authentication: SCRAM would belong to 
the same category as md5 and password, as they are all password-based.


20. Database Roles
20.2. Role Attributes: password : list SCRAM as 
authentication method as well


VI.
ALTER ROLE: is SCRAM also dependent on the role name for 
salting? If so, add a warning.
(it doesn't seem that way; however, I'm curious as 
to why the function FlattenPasswordIdentifiers in 
src/backend/commands/user.c, called by AlterRole, passes rolname to 
scram_build_verifier() when that function does absolutely nothing with 
this argument)
CREATE ROLE: can SCRAM also be used in the list of PASSWORD 
VERIFIERS?


VII.
49. System Catalogs:
49.9 pg_auth_verifiers: column names and types are mixed up
in the description of column vervalue;
maybe also explain some basics about 
md5 there?


remark: the statements about the 
composition of the string that is md5-hashed are contradictory
(concatenating "bar" to "foo" 
results in "foobar", not the other way round, as is implied in the 
explanation of the md5 hashing); this, however, is not really linked to 
the changes introduced with these patches.


remark: naming inconsistency: md5 
vervalues are stored as "md5*"; why don't we take the same approach and 
use it on SCRAM hashes (i.e. "scram*")?
(if this is a general convention 
thing, please ignore this comment; however, I couldn't find anything in 
the relevant RFCs while skimming through them)


50. Frontend/Backend Protocol
50.2.1 Start-up:  add explanation for 
"AuthenticationSCRAMPassword" authentication request message. (?)

50.5 message formats  see 50.2.1





Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-03-29 Thread Andres Freund
On 2016-03-29 12:38:48 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-03-29 12:28:40 -0400, Tom Lane wrote:
> If we invent LOG_ONLY (feel free to bikeshed the name), we could later
> redefine it as (LOG | ERR_HIDE_FROM_CLIENT), if we ever upgrade the
> underlying implementation to allow that.  But I remain concerned about
> dealing with logic like "if (elevel < ERROR)", and I am unconvinced that
> there's a near-term use-case here that's compelling enough to justify
> finding all the places that do that.

Hm. Putting the distinct levels in the upper bits, and the flags in the
lower bits should deal with that concern?   We already have a bunch of
pretty ugly bits about errors that shouldn't go to clients, I'd rather
have something that allows addressing those as well.




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-29 Thread Magnus Hagander
On Tue, Mar 29, 2016 at 4:36 PM, David Steele  wrote:

> On 3/22/16 12:31 PM, Magnus Hagander wrote:
>
> On Tue, Mar 22, 2016 at 5:27 PM, David Steele wrote:
>>
> >
>
>> > Adding the stop time column should be a simple addition and I don't
>> see
>> > a problem with that. I think I misunderstood your original request
>> on
>> > that. Because you are just talking about returning a timestamptz
>> with
>> > the "right now" value for when you called pg_stop_backup()? Or to be
>> > specific, just before pg_Stop_backup *finished*. Or do you mean when
>> > pg_stop_backup() started?
>>
>> What would be ideal is the minimum time that could be used for PITR.
>> In
>> an exclusive backup that's the time the end-of-backup record is
>> written
>> to WAL.  In a non-exlusive backup I'm not quite sure how that works.
>>
>
> I guess I was hoping that you would know.  I'm fine with just getting the
> current timestamp as is currently done in do_pg_stop_backup().  It's not
> perfect but it will be pretty close.
>
> I thought some more about putting STOP_WAL_LOCATION into the backup label
> and I think this is an important step.  Without that, the recovery process
> needs to use pg_control to determine when the database has reached
> consistency, and that will only work if pg_control was copied last.
>
> In summary, I think the patch should be updated to include stop_time as a
> column and add STOP_WAL_LOCATION and STOP_TIME to backup_label.  The
> recovery process should read STOP_WAL_LOCATION from backup_label and use
> that to decide when the database has become consistent.
>

Is that the only reason we need pg_control to be copied last? I thought
there were other reasons for that.

I was chatting with Stephen about this earlier, and one thing we considered
was that maybe we should return pg_control as a bytea field from
pg_stop_backup() thereby strongly hinting to tools that they should write
it out from there rather than copy it from the data directory (we can't
stop them from copying it from the data directory like we do for
backup_label of course, but if we return the contents and document that's
how it should be done, it's pretty clear).

But if we can actually remove the whole requirement on the order of the
copy of pg_control by doing that it certainly seems worthwhile.


So - I can definitely see the argument for returning the stop wal
*location*. But I'm still not sure what the definition of the time would
be? We can't return it before we know what it means...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

