Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-08-22 Thread Michael Paquier
On Wed, Aug 23, 2017 at 3:04 PM, Masahiko Sawada  wrote:
> It seems to me that we should discuss whether we want to keep the syntax
> such as 'a,b' and 'N(a,b)' before thinking about whether or not to make
> quorum commit the default behavior of the 'N(a,b)' syntax. If we plan to
> remove that syntax in a future release, we can live with the current code
> and should document it.

The parsing code in repl_gram.y represents essentially zero maintenance, so
let me suggest that we just live with what we have and do nothing. Things
kept as they are are not bad either. By changing the default, people may
have their failover flows silently broken. So if we change the default we
will perhaps make some users happy, but I think that we are also going to
make some people angry: silent failover issues are not fun to debug.

At the end of the day, we could just add one sentence to the docs saying
that the use of ANY and FIRST is encouraged over the old grammar because
they are clearer to understand.
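
For readers skimming the archive, the three spellings under discussion look
like this in postgresql.conf (s1, s2 and s3 are placeholder standby names):

    # old grammar, kept for compatibility; equivalent to FIRST 2 (s1, s2, s3)
    synchronous_standby_names = '2 (s1, s2, s3)'

    # explicit grammar added together with quorum commit
    synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'   # priority-based
    synchronous_standby_names = 'ANY 2 (s1, s2, s3)'     # quorum-based
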
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-08-22 Thread Masahiko Sawada
On Sat, Aug 19, 2017 at 12:28 AM, Robert Haas  wrote:
> On Thu, Aug 17, 2017 at 1:13 AM, Michael Paquier
>  wrote:
>> I had in mind an ereport(WARNING) in create_syncrep_config. Extra
>> thoughts/opinions welcome.
>
> I think for v10 we should just document the behavior we've got; I
> think it's too late to be whacking things around now.
>
> For v11, we could emit a warning if we plan to deprecate and
> eventually remove the syntax without ANY/FIRST, but let's not do:
>
> WARNING:  what you did is ok, but you might have wanted to do something else
>
> First of all, whether or not that can properly be called a warning is
> highly debatable.  Also, if you do that sort of thing to your spouse
> and/or children, they call it "nagging".  I don't think users will
> like it any more than family members do.
>

It seems to me that we should discuss whether we want to keep the syntax
such as 'a,b' and 'N(a,b)' before thinking about whether or not to make
quorum commit the default behavior of the 'N(a,b)' syntax. If we plan to
remove that syntax in a future release, we can live with the current code
and should document it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-22 Thread Andres Freund
On 2017-08-22 16:41:23 -0700, Andres Freund wrote:
> On 2017-08-21 11:02:52 +1200, Thomas Munro wrote:
> > On Mon, Aug 21, 2017 at 6:17 AM, Andres Freund  wrote:
> > > Pushing 0001, 0002 now.
> > >
> > > - rebased after conflicts
> > > - fixed a significant number of too long lines
> > > - removed a number of now superfluous linebreaks
> >
> > Thanks!  Please find attached a rebased version of the rest of the patch 
> > set.
>
> Pushed 0001, 0002.  Looking at later patches.

Committing 0003. This'll probably need further adjustment, but I think
it's good to make progress here.

Changes:
- pgindent'ed after adding the necessary typedefs to typedefs.list
- replaced INT64CONST w UINT64CONST
- moved the count assertion in delete_item to before the decrement - as count
  is unsigned, it would just wrap around on underflow without triggering the
  assertion (see the short sketch after this list).
- documented and asserted resize is called without partition lock held
- removed reference to iterator in dshash_find comments
- removed stray references to dshash_release (rather than dshash_release_lock)
- reworded dshash_find_or_insert reference to dshash_find to also
  mention error handling.
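
As a generic illustration of the unsigned-wraparound point above (not the
dshash code itself): with an unsigned counter the sanity check has to run
before the decrement, because after an erroneous extra decrement the value
wraps to a huge positive number and a check placed afterwards can never fire.

    /* Generic sketch -- not the dshash code. */
    static void
    decrement_count(size_t *count)
    {
        Assert(*count > 0);     /* check first ...                          */
        (*count)--;             /* ... then decrement; underflow would wrap */
    }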

Notes for possible followup commits of the dshash API:
- nontrivial portions of dshash are effectively critical sections, lest
  dynamic shared memory be leaked. Should we, short term, introduce
  actual critical section markers to make that more obvious? Should we,
  longer term, make this more failsafe / easier to use by
  extending/emulating memory contexts for dsa memory?
- I'm very unconvinced of supporting both {compare,hash}_arg_function
  and the non-arg version. Why not solely support the _arg_ version, but
  add the size argument? On all relevant platforms that should still be
  register arg callable, and the branch isn't free either.
- might be worthwhile to try to reduce duplication between
  delete_item_from_bucket, delete_key_from_bucket, delete_item and
  dshash_delete_key.


For later commits in the series:
- Afaict the whole shared tupledesc stuff, as tqueue.c before, is
  entirely untested. This baffles me. See also [1]. I can force the code
  to be reached with force_parallel_mode=regress/1, but this absolutely
  really totally needs to be reached by the default tests. Robert?
- gcc wants static before const (0004).
- Afaict GetSessionDsmHandle() uses the current rather than
  TopMemoryContext. Try running the regression tests under
  force_parallel_mode - crashes immediately for me without fixing that.
- SharedRecordTypmodRegistryInit() is called from GetSessionDsmHandle()
  which calls EnsureCurrentSession(), but
  SharedRecordTypmodRegistryInit() does so again - sprinkling those
  around liberally seems like it could hide bugs.

Regards,

Andres

[1] https://coverage.postgresql.org/src/backend/executor/tqueue.c.gcov.html




Re: [HACKERS] Pluggable storage

2017-08-22 Thread Haribabu Kommi
On Mon, Aug 21, 2017 at 7:25 PM, Amit Kapila 
wrote:

> On Mon, Aug 21, 2017 at 12:58 PM, Haribabu Kommi
>  wrote:
> >
> > On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila 
> > wrote:
> >>
> >>
> >> Also, it is quite possible that some of the storage Am's don't even
> >> want to return bool as a parameter from HeapTupleSatisfies* API's.  I
> >> guess what we need here is to provide a way so that different storage
> >> am's can register their function pointer for an equivalent to
> >> satisfies function.  So, we need to change
> >> SnapshotData.SnapshotSatisfiesFunc in some way so that different
> >> handlers can register their function instead of using that directly.
> >> I think that should address the problem you are planning to solve by
> >> omitting buffer parameter.
> >
> >
> > Thanks for your suggestion. Yes, it is better to go in the direction of
> > SnapshotSatisfiesFunc.
> >
> > I verified the above idea of implementing the Tuple visibility functions
> > and assign them into the snapshotData structure based on the snapshot.
> >
> > The Tuple visibility functions that are specific to the relation are
> > available
> > with the RelationData structure and this structure may not be available,
> >
>
> Which functions are you referring here?  I don't see anything in
> tqual.h that uses RelationData.



With the storage API's, the tuple visibility functions are available via
RelationData, and those need to be used to set the SnapshotSatisfiesFunc
member of the SnapshotData structure.

But RelationData is not available everywhere a snapshot is created, while it
is available at every place where tuple visibility is checked. So I changed
the visibility check to use the information in the snapshot to call the
corresponding tuple visibility function from RelationData.

If the SnapshotData is an MVCC snapshot, the MVCC-specific tuple visibility
function from RelationData is called. The SnapshotSatisfiesFunc member is
changed to an enum that holds the tuple visibility type, such as MVCC,
DIRTY, SELF, etc. Whenever a visibility check is needed, the corresponding
function is called.
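
A minimal sketch of that dispatch, only to make the proposed shape concrete -
the names below (SnapshotVisibilityType, rd_stamroutine, satisfies,
visibility_type) are hypothetical and not actual PostgreSQL code:

    /* Hypothetical: the snapshot carries only a visibility *type* ... */
    typedef enum SnapshotVisibilityType
    {
        SNAPSHOT_MVCC,
        SNAPSHOT_DIRTY,
        SNAPSHOT_SELF
        /* ... */
    } SnapshotVisibilityType;

    /* ... and the relation's storage AM supplies one visibility routine per
     * type, so the check dispatches through RelationData instead of through
     * a function pointer stored in the snapshot. */
    static inline bool
    tuple_satisfies(Relation rel, void *tuple, Snapshot snapshot, Buffer buffer)
    {
        return rel->rd_stamroutine->satisfies[snapshot->visibility_type]
                    (tuple, snapshot, buffer);
    }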


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Pluggable storage

2017-08-22 Thread Haribabu Kommi
On Tue, Aug 15, 2017 at 4:53 PM, Andres Freund  wrote:

> Hi,
>
>
> On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote:
> > Here I attached WIP patches to support pluggable storage. The patch
> series
> > are may not work individually. Still so many things are under
> development.
> > These patches are just to share the approach of the current development.
>
> Making a pass through the patchset to get a feel for where this is at, and
> where this is headed.  I previously skimmed the thread to get a rough
> sense of what's discussed, but not in a very detailed manner.
>

Thanks for the review.


> General:
>
> - I think one important discussion we need to have is what kind of
>   performance impact we're going to accept introducing this. It seems
>   very likely that this'll cause some slowdown.  We can kind of
>   alleviate that by doing some optimizations at the same time, but
>   nevertheless, this abstraction is going to cost.
>

OK. To take that decision we may need some performance figures; I will
measure the performance once the API's have stabilized.

> - I don't think we should introduce this without a user besides
>   heapam. The likelihood that API will be usable by anything else
>   without a testcase seems fairly remote.  I think some src/test/modules
>   type implementation of a per-session, in-memory storage - relatively
>   easy to implement - or such is necessary.
>

Sure, I will add a test module once the API's are stabilized.



> - I think, and detailed some of that, we're going to need some cleanups
>   that go in before this, to decrease the size / increase the quality of
>   the new APIs.  It's going to get more painful to change APIs
>   subsequently.
>
> - We'll have to document clearly that these APIs are going to change for
>   a while, even after the release introducing them.
>

Yes, that's correct. Because this is the first time we are developing
storage API's to support pluggable storage, they may need some refinement
based on usage to support different storage methods.



> StorageAm - Scan stuff:
>
> - I think API isn't quite right. There's a lot of granular callback
>   functionality like scan_begin_function / scan_begin_catalog /
>   scan_begin_bm - these largely are convenience wrappers around the same
>   function, and I don't think that would, or rather should, change in
>   any abstracted storage layer.  So I think there needs to be some
>   unification first (pretty close w/ beginscan_internal already, but
>   perhaps we should get rid of a few of these wrappers).
>

OK. I will change the API to expose a single function corresponding to
beginscan_internal and replace the usage of the other functions with it.
There are also many bool flags passed to beginscan_internal; I will try to
optimize those as well.



> - Some of the exposed functionality, e.g. scan_getpage,
>   scan_update_snapshot, scan_rescan_set_params looks like it should just
>   be excised, i.e. there's no reason for it to exist.
>

Currently these API's are used only in Bitmap and Sample scans. These scan
methods fully depend on the heap format. I will check how to remove these
API's.

> - Minor: don't think the _function suffix for Storis necessary, just
>   makes things long, and every member has it. Besides that, it's also
>   easy to misunderstand - for a second I understood
>   scan_getnext_function to be about getting the next function...
>

OK. How about adding _hook?



> - Scans are still represented as HeapScanDesc - I don't think that's
>   going to fly. Either this needs to be an opaque type (i.e. a struct
>   that's not defined, just forward declared), or it needs to be a base
>   struct that individual AMs embed in their own structs.  Individual AMs
>   definitely are going to need different pieces of data.
>

Currently the internal members of HeapScanDesc are used directly in many
places, especially in Bitmap and Sample scans. I have yet to work out the
best way to handle these scan methods; once that is done, removing the
direct usage will be easy.


> Storage AM - tuple stuff:
>
> - tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are
>   each individual functions, that seems pretty painful to maintain, and
>   v/ likely to just grow and grow. Not sure what the solution is, but
>   this seems like a hard sell.
>

OK. How about adding a single API that takes some flags representing what
data is needed from the tuple and returns the corresponding data as a
void *? The caller must cast the result to the corresponding type before
using it.
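
A rough sketch of that idea, mostly to make the void * contract explicit -
every name below is made up, this is not an existing API:

    /* Hypothetical replacement for the per-field tuple_get_* callbacks. */
    typedef enum TupleDataKind
    {
        TUPLE_DATA_XMIN,
        TUPLE_DATA_UPDATED_XID,
        TUPLE_DATA_CMIN,
        TUPLE_DATA_CTID
    } TupleDataKind;

    typedef struct StorageAmRoutine     /* reduced to the one member shown */
    {
        /* Returns a pointer whose actual type depends on 'kind';
         * the caller casts it before use. */
        void *(*tuple_get_data) (void *tuple, TupleDataKind kind);
    } StorageAmRoutine;

    /* Caller side: */
    TransactionId xmin =
        *(TransactionId *) am->tuple_get_data(tuple, TUPLE_DATA_XMIN);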

> - The three *speculative* functions don't quite seem right to me, nor do
>   I understand:
> +*
> +* Setting a tuple's speculative token is a slot-only operation,
> so no need
> +* for a storage AM method, but after inserting a tuple containing
> a
> +* speculative token, the insertion must be completed by these
> routines:
> +*/
>   I don't see anything related to slots, here?
>

The tupl

Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-08-22 Thread Dilip Kumar
On Tue, Aug 22, 2017 at 8:40 PM, Alexander Kuzmenkov
 wrote:
> Hi Dilip,
>
> Recently I was thinking about this too, when working on the index-only
> count(*) patch for indexes supporting amgetbitmap [1]. That patch teaches
> bitmap heap scan node to skip heap fetches under certain conditions. Exact
> tidbitmap pages are a prerequisite for this, so the lossiness of the bitmap
> heavily influences the cost of a scan.
>
> I used a very simple estimation: lossy_pages = max(0, total_pages -
> maxentries / 2). The rationale is that after the initial lossification, the
> number of lossy pages grows slower. It is good enough to reflect this
> initial sharp increase in the lossy page number.

Makes sense to me.
>
> The thing that seems more important to me here is that the tidbitmap is very
> aggressive in lossifying the pages: it tries to keep the number of entries
> under maxentries / 2, see tbm_lossify():
> ...
> if (tbm->nentries <= tbm->maxentries / 2)
> {
> /*
>  * We have made enough room.
> ...
> I think we could try higher fill factor, say, 0.9. tbm_lossify basically
> just continues iterating over the hashtable with not so much overhead per a
> call, so calling it more frequently should not be a problem. On the other
> hand, it would have to process less pages, and the bitmap would be less
> lossy.
>
> I didn't benchmark the index scan per se with the 0.9 fill factor, but the
> reduction of lossy pages was significant.

I will try this and produce some performance numbers.

Thanks for the input.

>
> [1]
> https://www.postgresql.org/message-id/flat/251401bb-6f53-b957-4128-578ac22e8acf%40postgrespro.ru#251401bb-6f53-b957-4128-578ac22e8...@postgrespro.ru
>


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-22 Thread Masahiko Sawada
On Tue, Aug 22, 2017 at 3:23 PM, Simon Riggs  wrote:
> On 15 August 2017 at 02:27, Masahiko Sawada  wrote:
>
>> Is there any reason why we don't
>> write an explicit name in vacuum verbose logs?
>
> None. Sounds like a good idea.
>
>> If not, can we add
>> schema names to be more clear?
>
> Yes, we can. I'm not sure why you would do this only for VACUUM
> though? I see many messages in various places that need same treatment

Yeah, I was thinking that too. But since there are a lot of messages that
output a relation name, I proposed this as a first step.

> I would also be inclined to do this by changing only the string
> presented, not the actual message string.

+1

> e.g.
> replace RelationGetRelationName() with
> RelationGetOptionallyQualifiedRelationName()
> and then control whether we include this new behaviour with
> log_qualified_object_names = on | off

Is there any case where we would prefer non-qualified object names? If
users want to get the same log messages as they got so far, it would be
better to have a GUC that allows switching between the existing behavior
and always logging qualified object names.
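
For what it's worth, a sketch of what such a helper could look like. It
relies on the existing get_namespace_name()/RelationGetNamespace()/psprintf
facilities, but the function itself and the log_qualified_object_names GUC
are only the proposal being discussed, not existing code:

    /* Hypothetical RelationGetOptionallyQualifiedRelationName(). */
    static char *
    RelationGetOptionallyQualifiedRelationName(Relation rel)
    {
        if (log_qualified_object_names)     /* proposed GUC */
            return psprintf("%s.%s",
                            get_namespace_name(RelationGetNamespace(rel)),
                            RelationGetRelationName(rel));

        return pstrdup(RelationGetRelationName(rel));
    }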

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Regression stoping PostgreSQL 9.4.13 if a walsender is running

2017-08-22 Thread Michael Paquier
On Wed, Aug 23, 2017 at 2:28 AM, Marco Nenciarini
 wrote:
> I have noticed that after the 9.4.13 release PostgreSQL reliably fails
> to shut down with the smart and fast methods if there is a running walsender.
>
> The postmaster continues waiting forever for the walsender termination.
>
> It works perfectly with all the other major releases.

Right. A similar issue was reported yesterday:
https://www.postgresql.org/message-id/caa5_dud0o1xym8onozhrepypu-t8nzklzs1pt2jpzp0ns+v...@mail.gmail.com
Thanks for digging into the origin of the problem; I was short of time
yesterday to look into it.

> I bisected the issue to commit 1cdc0ab9c180222a94e1ea11402e728688ddc37d
>
> After some investigation I discovered that the instruction that sets
> got_SIGUSR2 was lost during the backpatch in the WalSndLastCycleHandler
> function.

That looks correct to me, only REL9_4_STABLE is impacted. This bug
breaks many use cases like failovers :(
-- 
Michael




[HACKERS] obsolete code in pg_upgrade

2017-08-22 Thread Peter Eisentraut
It seems to me that this code in pg_upgrade/check.c has been useless
since at least version 9.1:

    /* Is it 9.0 but without tablespace directories? */
    if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 &&
        new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS_CAT_VER)
        pg_fatal("This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n"
                 "because of backend API changes made during development.\n");


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] scan-build plpython stuff

2017-08-22 Thread Peter Eisentraut
I wanted to make some progress cleaning up some of the warnings produced
by scan-build, so I grabbed plpython and fixed all the warnings there.

Attached are two patches, one that decorates PLy_elog() with compiler
hints similar to elog(), and one with some very minor tweaks.  This
fixes all 13 or so (depending on scan-build version) warnings in plpython.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 6dcfacdfab7cf0ed98065984bc4d8e00e94c12e6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Aug 2017 20:05:49 -0400
Subject: [PATCH 1/2] Add compiler hints to PLy_elog()

Decorate PLy_elog() in a similar way as elog(), to give compilers and
static analyzers hints in which cases it does not return.
---
 src/pl/plpython/plpy_elog.c |  2 +-
 src/pl/plpython/plpy_elog.h | 28 +++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
index bb864899f6..e244104fed 100644
--- a/src/pl/plpython/plpy_elog.c
+++ b/src/pl/plpython/plpy_elog.c
@@ -44,7 +44,7 @@ static bool set_string_attr(PyObject *obj, char *attrname, char *str);
  * in the context.
  */
 void
-PLy_elog(int elevel, const char *fmt,...)
+PLy_elog_impl(int elevel, const char *fmt,...)
 {
char   *xmsg;
char   *tbmsg;
diff --git a/src/pl/plpython/plpy_elog.h b/src/pl/plpython/plpy_elog.h
index e73177d130..e4b30c3cca 100644
--- a/src/pl/plpython/plpy_elog.h
+++ b/src/pl/plpython/plpy_elog.h
@@ -10,7 +10,33 @@ extern PyObject *PLy_exc_error;
 extern PyObject *PLy_exc_fatal;
 extern PyObject *PLy_exc_spi_error;
 
-extern void PLy_elog(int elevel, const char *fmt,...) pg_attribute_printf(2, 3);
+/*
+ * PLy_elog()
+ *
+ * See comments at elog() about the compiler hinting.
+ */
+#ifdef HAVE__VA_ARGS
+#ifdef HAVE__BUILTIN_CONSTANT_P
+#define PLy_elog(elevel, ...) \
+   do { \
+   PLy_elog_impl(elevel, __VA_ARGS__); \
+   if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
+   pg_unreachable(); \
+   } while(0)
+#else  /* !HAVE__BUILTIN_CONSTANT_P */
+#define PLy_elog(elevel, ...)  \
+   do { \
+   const int elevel_ = (elevel); \
+   PLy_elog_impl(elevel_, __VA_ARGS__); \
+   if (elevel_ >= ERROR) \
+   pg_unreachable(); \
+   } while(0)
+#endif /* HAVE__BUILTIN_CONSTANT_P */
+#else  /* !HAVE__VA_ARGS */
+#define PLy_elog PLy_elog_impl
+#endif /* HAVE__VA_ARGS */
+
+extern void PLy_elog_impl(int elevel, const char *fmt,...) pg_attribute_printf(2, 3);
 
 extern void PLy_exception_set(PyObject *exc, const char *fmt,...) pg_attribute_printf(2, 3);
 
-- 
2.14.1

From 1b634547e006df6e904be7b8a9106c27a8c17c77 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Aug 2017 20:05:49 -0400
Subject: [PATCH 2/2] PL/Python: Fix remaining scan-build warnings

Apparently, scan-build thinks that proc->is_setof can change during
PLy_exec_function().  To make it clearer, save the value in a local
variable.

Also add an assertion to clear another warning.
---
 src/pl/plpython/plpy_exec.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index 26f61dd0f3..b10b1681f1 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -57,6 +57,7 @@ static void PLy_abort_open_subtransactions(int save_subxact_level);
 Datum
 PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 {
+   boolis_setof = proc->is_setof;
Datum   rv;
PyObject   *volatile plargs = NULL;
PyObject   *volatile plrv = NULL;
@@ -73,7 +74,7 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 
PG_TRY();
{
-   if (proc->is_setof)
+   if (is_setof)
{
/* First Call setup */
if (SRF_IS_FIRSTCALL())
@@ -93,6 +94,7 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
funcctx = SRF_PERCALL_SETUP();
Assert(funcctx != NULL);
srfstate = (PLySRFState *) funcctx->user_fctx;
+   Assert(srfstate != NULL);
}
 
if (srfstate == NULL || srfstate->iter == NULL)
@@ -125,7 +127,7 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 * We stay in the SPI context while doing this, because PyIter_Next()
 * calls back into Python code which might contain SPI calls.
 */
-   if (proc->is_setof)
+   if (is_setof)
 

Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-22 Thread Andres Freund
On 2017-08-21 11:02:52 +1200, Thomas Munro wrote:
> On Mon, Aug 21, 2017 at 6:17 AM, Andres Freund  wrote:
> > Pushing 0001, 0002 now.
> >
> > - rebased after conflicts
> > - fixed a significant number of too long lines
> > - removed a number of now superfluous linebreaks
> 
> Thanks!  Please find attached a rebased version of the rest of the patch set.

Pushed 0001, 0002.  Looking at later patches.


Greetings,

Andres Freund




[HACKERS] Blessed TupleDescs and constraints

2017-08-22 Thread Andres Freund
Hi,

while reviewing the "shared record typmod" patch series I noticed that
the last version [1] of the patchset didn't consider the constraints in the
computed hash value, but the comparison did. In the ensuing IM discussion
Thomas and I noticed that assign_record_type_typmod() already doesn't
properly deal with this.

It's possible to get tupledescs with constraints blessed, e.g. because
plpgsql is fairly liberal in doing so.  But as assign_record_type_typmod()
creates its cache entries with CreateTupleDescCopy(), the end result is that
we'll never find an existing blessed tupledesc when doing a lookup - the
cached ones will all have no constraints and thus the comparison will fail.

In many cases that'll not be immediately observable, because the blessed
tupledesc, rather than its cached copy, will have its tdtypemod updated
too, therefore avoiding future blessings. But that won't work if another
copy is used for the lookup.
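
The failure mode is the generic cache-lookup one: the equality check inspects
a field that the stored entries have been stripped of, so a probe that still
carries that field can never match, regardless of what the hash does. A
stripped-down illustration (not the real typcache code):

    typedef struct CacheKey
    {
        int32  typmod;
        void  *constr;          /* NULL in every stored entry */
    } CacheKey;

    static bool
    cache_keys_equal(const CacheKey *probe, const CacheKey *stored)
    {
        return probe->typmod == stored->typmod &&
               probe->constr == stored->constr;   /* false whenever the probe
                                                   * carries constraints */
    }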

Does anyone see a case where it'd be good to consider constraints in
blessed tupledescs? Please note that that'd also be problematic for
parallelism. Both for existing code (tqueue doesn't handle constraints)
and proposed future code (it'd be considerably more complicated to share
constraints too).


Greetings,

Andres Freund

[1] 
http://archives.postgresql.org/message-id/CAEepm%3D04LM87Ya_Avgw40934Wh3G4Oyy%2BmmthYHuMb9m5WZwaQ%40mail.gmail.com




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-22 Thread Alik Khilazhev
Hello, Fabien

I am attaching patch v7.

> Yes, I agree. a >= 1 does not make much sense... If you want uniform you 
> should use random(), not call random_zipfian with a = 1. Basically it 
> suggests that too large values of "a" should be rejected. Not sure where to 
> put the limit, though.

I set the upper bound for the parameter to 1000.
> 
> Yes, as a general principle I think that the documentation should reflect the 
> implementation.

The documentation has been updated; I have removed the algorithm descriptions
and put references to them there.
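
For context, a sketch of how the function is meant to be used from a pgbench
script, assuming the random_zipfian(lb, ub, s) signature of this patch series
(the table and the parameter values are just an example):

    \set aid random_zipfian(1, 100000 * :scale, 1.05)
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;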



pgbench-zipf-07v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

2017-08-22 Thread Christoph Berg
Re: Tom Lane 2017-08-13 <14517.1502638...@sss.pgh.pa.us>
> I suspect you could work around this with
> 
>   boolisCompleteQuery = (context <= PROCESS_UTILITY_QUERY);
> - boolneedCleanup;
> + volatile bool   needCleanup;
>   boolcommandCollected = false;
> 
> If that fixes it, it's definitely a compiler bug.  That function does
> not change needCleanup after the sigsetjmp call, so per POSIX it
> should not have to label the variable volatile.  This is far from
> being the first such bug we've seen though.
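
(For readers without the original function in front of them, the general
POSIX rule being applied looks like the generic sketch below; note that
EventTriggerEndCompleteQuery does *not* modify the variable after sigsetjmp,
which is why Tom calls this a compiler bug rather than a missing volatile.)

    #include <setjmp.h>

    static sigjmp_buf env;

    /* 'flag' is modified after sigsetjmp() and examined again after the
     * longjmp return, so it must be volatile for its value to be reliable. */
    static int
    run_protected(void (*work)(void))
    {
        volatile int flag = 0;

        if (sigsetjmp(env, 1) == 0)
        {
            flag = 1;           /* modified between setjmp and longjmp */
            work();             /* may siglongjmp(env, 1) on error     */
            return 0;
        }

        return flag;            /* read after the longjmp return */
    }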

In the meantime, gcc-7 is at version 7.2.0-1, so I gave 9.6 on
mips64el a new try. It's still failing at initdb time, and indeed
adding "volatile" makes initdb proceed, but then the rest of the
testsuite fails in various ways:

DETAIL:  Failed process was running: CREATE TABLE enumtest_child (parent 
rainbow REFERENCES enumtest_parent);
DETAIL:  Failed process was running: create table trigtest2 (i int references 
trigtest(i) on delete cascade);
DETAIL:  Failed process was running: CREATE TABLE trunc_b (a int REFERENCES 
truncate_a);
DETAIL:  Failed process was running: CREATE SCHEMA evttrig
CREATE TABLE one (col_a SERIAL PRIMARY KEY, col_b text DEFAULT 
'forty two')
CREATE INDEX one_idx ON one (col_b)
CREATE TABLE two (col_c INTEGER CHECK (col_c > 0) REFERENCES 
one DEFAULT 42);

Hopefully the compiler gets fixed soonish on mips64el...

Thanks for the analysis,
Christoph




Re: [HACKERS] coverage analysis improvements

2017-08-22 Thread Peter Eisentraut
On 8/21/17 01:23, Michael Paquier wrote:
> Patch 0001 fails to apply as of c629324.

Updated patches attached.

> Which versions of lcov and gcov did you use for your tests?

LCOV version 1.13, and gcc-7 and gcc-6

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7b2f5087a2d0710e6a5c7ffa946dcfabd163f987 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 10 Aug 2017 23:33:47 -0400
Subject: [PATCH v2 1/9] Run only top-level recursive lcov

This is the way lcov was intended to be used.  It is much faster and
more robust and makes the makefiles simpler than running it in each
subdirectory.

This also removes the direct gcov calls and lets lcov do it instead.
The direct gcov calls are useless because lcov/geninfo call gcov
internally and use that information.
---
 .gitignore|  3 +--
 GNUmakefile.in|  2 +-
 doc/src/sgml/regress.sgml |  2 +-
 src/Makefile.global.in| 33 +++--
 4 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/.gitignore b/.gitignore
index 4976fd9119..2052f719d1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -19,10 +19,9 @@ objfiles.txt
 .deps/
 *.gcno
 *.gcda
-*.gcov
-*.gcov.out
 lcov.info
 coverage/
+coverage-stamp
 *.vcproj
 *.vcxproj
 win32ver.rc
diff --git a/GNUmakefile.in b/GNUmakefile.in
index dc76a5d11d..8d77b01eea 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -41,7 +41,7 @@ install-world-contrib-recurse: install-world-src-recurse
 
 $(call recurse,installdirs uninstall init-po update-po,doc src config)
 
-$(call recurse,distprep coverage,doc src config contrib)
+$(call recurse,distprep,doc src config contrib)
 
 # clean, distclean, etc should apply to contrib too, even though
 # it's not built by default
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 7c2b1029c2..796cdc26ff 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -699,7 +699,7 @@ Test Coverage Examination
 ./configure --enable-coverage ... OTHER OPTIONS ...
 make
 make check # or other test suite
-make coverage-html
+make coverage
 
 Then point your HTML browser
 to coverage/index.html.
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e8b3a519cb..e39ed884e7 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -19,7 +19,7 @@
 #
 # Meta configuration
 
-standard_targets = all install installdirs uninstall distprep clean distclean maintainer-clean coverage check installcheck init-po update-po
+standard_targets = all install installdirs uninstall distprep clean distclean maintainer-clean check installcheck init-po update-po
 # these targets should recurse even into subdirectories not being built:
 standard_always_targets = distprep clean distclean maintainer-clean
 
@@ -863,34 +863,23 @@ endif # enable_nls
 #  (by gcc -ftest-coverage)
 #   foo.gcda   gcov data file, created when the program is run (for
 #  programs compiled with gcc -fprofile-arcs)
-#   foo.c.gcov gcov output file with coverage information, created by
-#  gcov from foo.gcda (by "make coverage")
-#   foo.c.gcov.out  stdout captured when foo.c.gcov is created, mildly
-#  interesting
 #   lcov.info  lcov tracefile, built from gcda files in one directory,
 #  later collected by "make coverage-html"
 
 ifeq ($(enable_coverage), yes)
 
-# There is a strange interaction between lcov and existing .gcov
-# output files.  Hence the rm command and the ordering dependency.
-
-gcda_files := $(wildcard *.gcda)
+gcda_files = $(shell find . -name '*.gcda' -print)
 
 lcov.info: $(gcda_files)
-   rm -f *.gcov .*.gcov
-   $(if $^,$(LCOV) -d . -c -o $@ $(LCOVFLAGS) --gcov-tool $(GCOV))
-
-%.c.gcov: %.gcda | lcov.info
-   $(GCOV) -b -f -p -o . $(GCOVFLAGS) $*.c >$*.c.gcov.out
-
-coverage: $(gcda_files:.gcda=.c.gcov) lcov.info
+   $(LCOV) -d . -c -o $@ $(LCOVFLAGS) --gcov-tool $(GCOV)
 
-.PHONY: coverage-html
-coverage-html: coverage
+coverage-stamp: lcov.info
rm -rf coverage
-   mkdir coverage
-	$(GENHTML) --show-details --legend --output-directory=coverage --title=PostgreSQL --num-spaces=4 --prefix=$(abs_top_srcdir) `find . -name lcov.info -print`
+	$(GENHTML) --show-details --legend --output-directory=coverage --title=PostgreSQL --num-spaces=4 --prefix=$(abs_top_srcdir) $<
+   touch $@
+
+.PHONY: coverage coverage-html
+coverage coverage-html: coverage-stamp
 
 
 # hook for clean-up
@@ -898,8 +887,8 @@ clean distclean maintainer-clean: clean-coverage
 
 .PHONY: clean-coverage
 clean-coverage:
-   rm -rf coverage
-   rm -f *.gcda *.gcno lcov.info *.gcov .*.gcov *.gcov.out
+   rm -rf coverage/ coverage-stamp
+   rm -f *.gcda *.gcno lcov.info
 
 
 # User-callable target to reset counts between test runs

base-commit: 2bfd1b1ee562c4e4fd065c7f7d1beaa9b9852070
-- 
2.14.1

From 0bbf137972e22923f650a38ff46c0e13

[HACKERS] Regression stoping PostgreSQL 9.4.13 if a walsender is running

2017-08-22 Thread Marco Nenciarini
I have noticed that after the 9.4.13 release PostgreSQL reliably fails
to shut down with the smart and fast methods if there is a running walsender.

The postmaster continues waiting forever for the walsender termination.

It works perfectly with all the other major releases.

I bisected the issue to commit 1cdc0ab9c180222a94e1ea11402e728688ddc37d

After some investigation I discovered that the instruction that sets
got_SIGUSR2 was lost during the backpatch in the WalSndLastCycleHandler
function.

The trivial patch is the following:

~~~
diff --git a/src/backend/replication/walsender.c
b/src/backend/replication/walsender.c
index a0601b3..b24f9a1 100644
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** WalSndLastCycleHandler(SIGNAL_ARGS)
*** 2658,2663 
--- 2658,2664 
  {
int save_errno = errno;

+   got_SIGUSR2 = true;
if (MyWalSnd)
SetLatch(&MyWalSnd->latch);

~~~

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-08-22 Thread Alexander Kuzmenkov

Hi Dilip,

Recently I was thinking about this too, when working on the index-only 
count(*) patch for indexes supporting amgetbitmap [1]. That patch 
teaches bitmap heap scan node to skip heap fetches under certain 
conditions. Exact tidbitmap pages are a prerequisite for this, so the 
lossiness of the bitmap heavily influences the cost of a scan.


I used a very simple estimation: lossy_pages = max(0, total_pages - 
maxentries / 2). The rationale is that after the initial lossification, 
the number of lossy pages grows slower. It is good enough to reflect 
this initial sharp increase in the lossy page number.
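
In code form, the estimate described above is simply (a sketch, not committed
code; Max is the usual PostgreSQL macro):

    /* Sketch of the proposed lossy-page estimate. */
    static double
    estimate_lossy_pages(double total_pages, double maxentries)
    {
        return Max(0, total_pages - maxentries / 2);
    }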


The thing that seems more important to me here is that the tidbitmap is 
very aggressive in lossifying the pages: it tries to keep the number of 
entries under maxentries / 2, see tbm_lossify():

...
if (tbm->nentries <= tbm->maxentries / 2)
{
/*
 * We have made enough room.
...
I think we could try higher fill factor, say, 0.9. tbm_lossify basically 
just continues iterating over the hashtable with not so much overhead 
per a call, so calling it more frequently should not be a problem. On 
the other hand, it would have to process less pages, and the bitmap 
would be less lossy.


I didn't benchmark the index scan per se with the 0.9 fill factor, but 
the reduction of lossy pages was significant.


Regards,
Alexander Kuzmenkov

[1] 
https://www.postgresql.org/message-id/flat/251401bb-6f53-b957-4128-578ac22e8acf%40postgrespro.ru#251401bb-6f53-b957-4128-578ac22e8...@postgrespro.ru







Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-08-22 Thread Konstantin Knizhnik



On 18.08.2017 04:33, Robert Haas wrote:


It seems like a somewhat ad-hoc approach; it supposes that we can take 
any query produced by deparseSelectStmtForRel() and stick a LIMIT 
clause onto the very end and all will be well.  Maybe that's not a 
problematic assumption, not sure.  The grammar happens to allow both 
FOR UPDATE LIMIT n and LIMIT n FOR UPDATE even though only the latter 
syntax is documented.


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



I am not absolutely sure that it is possible to take any query which can be
constructed by postgres_fdw for a foreign scan and append a "LIMIT n" clause
to it. But I also do not know of an example where it is not possible. As you
have mentioned, "FOR UPDATE LIMIT n" is currently recognized by Postgres.
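
Concretely, the two orderings being discussed are these (remote_tab is just a
placeholder table); only the second is documented, but both parse:

    SELECT * FROM remote_tab FOR UPDATE LIMIT 10;
    SELECT * FROM remote_tab LIMIT 10 FOR UPDATE;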


Can you suggest how to implement limit push-down to the FDW in a better way?
Move deparseSelectStmtForRel() from postgresGetForeignPlan to
postgresIterateForeignScan?  That seems problematic because much of the
information required by deparseSelectStmtForRel is not available in
postgresIterateForeignScan. In principle, it is possible to somehow propagate
it there, but from my point of view that is not the right approach...


IMHO there is some contradiction in the Postgres optimizer in that static
information about the limit is not taken into account at the planning stage
and is actually used only during query execution, when the pass_down_bound()
function is called to propagate knowledge about the limit down through the
plan nodes. Certainly I understand that this gives more flexibility: we can
use information from previous steps of query execution which was not
available at planning stage.

But pushing the limit down at the planning stage requires too many changes,
while the proposed patch is very small and non-invasive. And in principle it
can be used not only by postgres_fdw, but also by other FDW implementations
to push down information about the LIMIT.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-22 Thread Ashutosh Sharma
On Tue, Aug 22, 2017 at 3:55 PM, Amit Kapila  wrote:
> On Tue, Aug 22, 2017 at 2:28 PM, Ashutosh Sharma  
> wrote:
>> On Sat, Aug 19, 2017 at 11:37 AM, Amit Kapila  
>> wrote:
>>> On Fri, Aug 11, 2017 at 6:51 PM, Ashutosh Sharma  
>>> wrote:
>
> 7.
> _hash_kill_items(IndexScanDesc scan)
> {
> ..
> + /*
> + * If page LSN differs it means that the page was modified since the
> + * last read. killedItems could be not valid so LP_DEAD hints apply-
> + * ing is not safe.
> + */
> + page = BufferGetPage(buf);
> + if (PageGetLSN(page) != so->currPos.lsn)
> + {
> + _hash_relbuf(rel, buf);
> + return;
> + }
> ..
> }
>
> How does this check cover the case of unlogged tables?

 Thanks for putting that point. It doesn't cover the case for unlogged
 tables. As suggested by you in one of your email in this mailing list, i am
 now not allowing vacuum to release lock on current page before acquiring
 lock on next page for unlogged tables. This will ensure that scan is always
 behind vacuum if they are running on the same bucket simultaneously.
 Therefore, there is danger in marking tuples as dead for unlogged pages 
 even
 if they are not having any lsn.

>>>
>>
>> Once again, Thank you for reviewing my patches.
>>
>>> In the last line, I guess you wanted to say "there is *no* danger
>>> ..."?
>>
>> Yes, i meant that because, it ensures that scan will always be following 
>> VACUUM.
>>
>> Today, while again thinking about this part of the patch
>>> (_hash_kill_items) it occurred to me that we can't rely on a pin on an
>>> overflow page to guarantee that it is not modified by Vacuum.
>>> Consider a case where vacuum started vacuuming the bucket before the
>>> scan and then in-between scan overtakes it.  Now, it is possible that
>>> even if a scan has a pin on a page (and no lock), vacuum might clean
>>> that page, if that happens, then we can't prevent the reuse of TID.
>>> What do you think?
>>>
>>
>> I think, you are talking about non-mvcc scan case, because in case of
>> mvcc scans, even if we have released both pin and lock on a page,
>> VACUUM can't remove tuples from a page if it is visible to some
>> concurrently running transactions (mvcc scan in our case).
>>
>
> I am talking about tuples that are marked as dead in heap. It has
> nothing to do with the visibility of tuple or type of scan (mvcc or
> non-mvcc).
>

Okay, I got your point now. I think that currently in _hash_kill_items(), if
an overflow page is pinned, we do not check whether it got modified since the
last read. Hence, if vacuum runs on an overflow page that is pinned and that
also has some dead tuples in it, it could create a problem for the scan when
the scan attempts to mark the killed items as dead. To get rid of that
problem, I think that even if an overflow page is pinned we should check
whether it got modified since the last read of the page; if yes, then do not
allow the scan to mark the killed items as dead. Attached is a newer version
with these changes, along with some other cosmetic changes mentioned in your
earlier email. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From cb724914ebb56c9b525e165b0117bfa70aac7692 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Tue, 22 Aug 2017 18:53:47 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma 
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 426 +++
 src/backend/access/hash/hashutil.c   |  72 +-
 src/include/access/hash.h|  55 -
 6 files changed, 547 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..eef7d66 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency re

Re: [HACKERS] Updating line length guidelines

2017-08-22 Thread Stephen Frost
Craig, all,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 21 August 2017 at 10:36, Michael Paquier 
> wrote:
> > On Mon, Aug 21, 2017 at 11:30 AM, Robert Haas 
> > wrote:
> > > On Sun, Aug 20, 2017 at 10:49 AM, Andres Freund 
> > wrote:
> > >> We currently still have the guideline that code should fit into an 80
> > >> character window. But an increasing amount of the code, and code
> > >> submissions, don't adhere to that (e.g. copy.c, which triggered me to
> > >> write this email). And I mean outside of accepted "exceptions" like
> > >> error messages.  And there's less need for such a relatively tight limit
> > >> these days.  Perhaps we should up the guideline to 90 or 100 chars?
> > >
> > > Or maybe we should go the other way and get a little more rigorous
> > > about enforcing that limit.  I realize 80 has nothing on its side but
> > > tradition, but I'm a traditionalist -- and I still do use 80 character
> > > windows a lot of the time.
> >
> > +1. FWIW, I find the non-truncation of some error messages a bit
> > annoying when reading them. And having a 80-character makes things
> > readable. For long URLs this enforcement makes little sense as those
> > strings cannot be split, but more effort could be done.
> > 
> 
> The main argument for not wrapping error messages is grep-ableness.

One option for this would be to move the long error messages into one place
and keep a placeholder in-line with the code. One could then grep for the
message to find the placeholder, and a quick cscope (or another grep, I
suppose) finds where it's used.

> Personally I'd be fine with 100 or so, but when I'm using buffers side by
> side, or when I'm working in poor conditions where I've set my terminal to
> "giant old people text" sizes, I remember the advantages of a width limit.

I wouldn't be against 100 either really, but I don't feel all that strongly
either way.  Then again, there is the back-patching pain that would ensue to
consider...

Thanks!

Stephen


signature.asc
Description: Digital signature


Fwd: [HACKERS] [PATCH] Pattern based listeners for asynchronous messaging (LISTEN/NOTIFY)

2017-08-22 Thread Markus Sintonen
I also encountered this when I built it with a different configuration. I
attached an updated patch with the correct number of arguments to
'similar_escape'. I also added preliminary documentation to the patch.
(Unfortunately I am currently unable to build the documentation for testing
purposes on Windows, probably because of commit
https://github.com/postgres/postgres/commit/510074f9f0131a04322d6a3d2a51c87e6db243f9.
I followed
https://www.postgresql.org/docs/devel/static/install-windows-full.html#idm45412738673840.)
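
For reference, a sketch of what the argument fix presumably amounts to -
assuming similar_escape() keeps its two-argument (pattern, escape) SQL
signature; this is not copied from the attached patch:

    /* convert to regex pattern, now passing an explicit escape string */
    datum = DirectFunctionCall2(similar_escape,
                                CStringGetTextDatum(pattern),
                                CStringGetTextDatum("\\"));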

What do you think about the syntax? There was a suggestion to specify the
type of the pattern (e.g. the ltree extension), but to me this feels like
overkill. One option here would be e.g.:
LISTEN PATTERN 'foo%' TYPE 'similar'
LISTEN PATTERN 'foo.*' TYPE 'ltree'
... and so on

BR
-Markus

2017-08-19 2:36 GMT+03:00 Thomas Munro :

> On Tue, Aug 1, 2017 at 8:13 AM, Markus Sintonen
>  wrote:
> > This patch adds an ability to use patterns in LISTEN commands. Patch uses
> > 'SIMILAR TO' patterns for matching NOTIFY channel names
> > (https://www.postgresql.org/docs/9.0/static/functions-matchi
> ng.html#FUNCTIONS-SIMILARTO-REGEXP).
> >
> > This patch is related to old discussion in
> > https://www.postgresql.org/message-id/52693fc5.7070...@gmail.com. This
> > discussion contains the reasoning behind the pattern based matching of
> the
> > channel names.
>
> Nice idea.
>
> The "async" regression test consistently crashes on my FreeBSD box
> when built with -O2.  It doesn't crash on another system I tried, and
> I think that's just luck, because this:
>
> +   /* convert to regex pattern */
> +   datum = DirectFunctionCall1(similar_escape,
> CStringGetTextDatum(pattern));
>
> ... is calling a function that takes two arguments, but passing only
> one.  The second argument is random junk, so similar_escape bombs when
> it does this:
>
> esc_text = PG_GETARG_TEXT_PP(1);
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


listen-pattern.patch
Description: Binary data



Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-22 Thread Sokolov Yura
A simplified and better-commented version of the patch is attached.

The algorithm was switched to linear probing; it makes the code simpler and
clearer.
The flag's meaning was inverted: it now indicates that the hash table has
been built, which also makes the code clearer.
XidInXip and some other functions got comments.
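
As a generic illustration of the lookup scheme now used (a simplified sketch
only; the real code lives in XidInXip and also handles building the table on
first use):

    /* Linear-probing membership test over a power-of-two sized table whose
     * empty slots hold InvalidTransactionId. */
    static bool
    xid_in_hash(TransactionId xid, const TransactionId *table, uint32 size_log2)
    {
        uint32 mask = (1U << size_log2) - 1;
        uint32 pos = (uint32) xid & mask;       /* trivial hash for the sketch */

        while (table[pos] != InvalidTransactionId)
        {
            if (table[pos] == xid)
                return true;
            pos = (pos + 1) & mask;             /* step to the next slot */
        }
        return false;
    }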

-- 
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
From 5851f01de6b793be5e682c67db8db339667be92d Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 10 Jul 2017 12:34:48 +
Subject: [PATCH] Make hash table for xip in XidInMVCCSnapshot

When lot of concurrent transactions attempts to update single
row, then linear scan through running list in XidInMVCCSnapshot
became noticebale bottleneck.

With this change, hash table is built on first search of xid in
snapshot->xip and snapshot->subxip arrays.

If size of array is smaller than 60, then linear scan is still
used, cause there is no much benefits from building hash then.
(at least on Intel Xeon).
---
 src/backend/storage/ipc/procarray.c | 67 +--
 src/backend/utils/time/snapmgr.c| 24 ++
 src/backend/utils/time/tqual.c  | 91 -
 src/include/utils/snapshot.h| 11 +
 4 files changed, 151 insertions(+), 42 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index a7e8cf2d43..2c7e065342 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1469,6 +1469,54 @@ GetMaxSnapshotSubxidCount(void)
 }
 
 /*
+ * ExtendXipSizeForHash - calculate xip array size with space for hash table.
+ *
+ * Hash table should be at least twice larger than array to not depend on
+ * cleverness of algorithm.
+ *
+ * But if xipcnt < SnapshotMinHash, then no need in hash-table at all.
+ */
+int
+ExtendXipSizeForHash(int xipcnt, uint8* plog)
+{
+	int size;
+	uint8 log = 0;
+
+	size = xipcnt;
+	if (xipcnt >= SnapshotMinHash)
+	{
+		log = 1;
+		while (xipcnt) {
+			log++;
+			xipcnt >>= 1;
+		}
+		size += 1xip == NULL)
-			ereport(ERROR,
-	(errcode(ERRCODE_OUT_OF_MEMORY),
-	 errmsg("out of memory")));
-		Assert(snapshot->subxip == NULL);
-		snapshot->subxip = (TransactionId *)
-			malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
-		if (snapshot->subxip == NULL)
-			ereport(ERROR,
-	(errcode(ERRCODE_OUT_OF_MEMORY),
-	 errmsg("out of memory")));
+		snapshot->xip = AllocateXip(GetMaxSnapshotXidCount(),
+&snapshot->xhlog);
+		snapshot->subxip = AllocateXip(GetMaxSnapshotSubxidCount(),
+&snapshot->subxhlog);
 	}
 
 	/*
@@ -1757,6 +1796,8 @@ GetSnapshotData(Snapshot snapshot)
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
 	snapshot->copied = false;
+	snapshot->xhlog &= ~SnapshotHashBuilt;
+	snapshot->subxhlog &= ~SnapshotHashBuilt;
 
 	if (old_snapshot_threshold < 0)
 	{
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 08a08c8e8f..7c3fe7563e 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -662,13 +662,16 @@ CopySnapshot(Snapshot snapshot)
 	Snapshot	newsnap;
 	Size		subxipoff;
 	Size		size;
+	int			xcnt, subxcnt;
+	uint8		xhlog, subxhlog;
 
 	Assert(snapshot != InvalidSnapshot);
 
+	xcnt = ExtendXipSizeForHash(snapshot->xcnt, &xhlog);
+	subxcnt = ExtendXipSizeForHash(snapshot->subxcnt, &subxhlog);
 	/* We allocate any XID arrays needed in the same palloc block. */
-	size = subxipoff = sizeof(SnapshotData) +
-		snapshot->xcnt * sizeof(TransactionId);
-	if (snapshot->subxcnt > 0)
+	size = subxipoff = sizeof(SnapshotData) + xcnt * sizeof(TransactionId);
+	if (subxcnt > 0)
 		size += snapshot->subxcnt * sizeof(TransactionId);
 
 	newsnap = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
@@ -677,6 +680,8 @@ CopySnapshot(Snapshot snapshot)
 	newsnap->regd_count = 0;
 	newsnap->active_count = 0;
 	newsnap->copied = true;
+	newsnap->xhlog = xhlog;
+	newsnap->subxhlog = subxhlog;
 
 	/* setup XID array */
 	if (snapshot->xcnt > 0)
@@ -2124,16 +2129,18 @@ RestoreSnapshot(char *start_address)
 	Size		size;
 	Snapshot	snapshot;
 	TransactionId *serialized_xids;
+	int			xcnt, subxcnt;
+	uint8		xhlog, subxhlog;
 
 	memcpy(&serialized_snapshot, start_address,
 		   sizeof(SerializedSnapshotData));
 	serialized_xids = (TransactionId *)
 		(start_address + sizeof(SerializedSnapshotData));
 
+	xcnt = ExtendXipSizeForHash(serialized_snapshot.xcnt, &xhlog);
+	subxcnt = ExtendXipSizeForHash(serialized_snapshot.subxcnt, &subxhlog);
 	/* We allocate any XID arrays needed in the same palloc block. */
-	size = sizeof(SnapshotData)
-		+ serialized_snapshot.xcnt * sizeof(TransactionId)
-		+ serialized_snapshot.subxcnt * sizeof(TransactionId);
+	size = sizeof(SnapshotData) + (xcnt + subxcnt) * sizeof(TransactionId);
 
 	/* Copy all required fields */
 	sna

Re: [HACKERS] Hash Functions

2017-08-22 Thread amul sul
On Fri, Aug 18, 2017 at 11:01 PM, Robert Haas  wrote:

> On Fri, Aug 18, 2017 at 1:12 PM, amul sul  wrote:
> > I have a small query,  what if I want a cache entry with extended hash
> > function instead standard one, I might require that while adding
> > hash_array_extended function? Do you think we need to extend
> > lookup_type_cache() as well?
>
> Hmm, I thought you had changed the hash partitioning stuff so that it
> didn't rely on lookup_type_cache().  You have to look up the function
> using the opclass provided in the partition key definition;
> lookup_type_cache() will give you the default one for the datatype.
> Maybe just use get_opfamily_proc?
>
>
Yes, we can do that for the partitioning code, but my concern is a little
bit different.  I apologize, I wasn't clear enough.

I am trying to extend the hash_array & hash_range functions. hash_array and
hash_range calculate the hash by using the respective hash function for the
given argument type (i.e. the array/range element type), and those hash
functions are made available in the TypeCacheEntry via lookup_type_cache().
But the extended versions of hash_array & hash_range require a respective
extended hash function for those element types.

I have added hash_array_extended & hash_range_extended functions in the
attached patch 0001, which maintains a local copy of TypeCacheEntry with the
extended hash functions. But I am a little bit skeptical about this logic;
any advice/suggestions will be greatly appreciated.

The logic in the rest of the extended hash functions is the same as in the
standard ones.
Attaching patch 0002 for the reviewer's testing.

Regards,
Amul


0001-add-optional-second-hash-proc-v2-wip.patch
Description: Binary data


0002-test-Hash_functions.patch
Description: Binary data



Re: [HACKERS] Default Partition for Range

2017-08-22 Thread Beena Emerson
On Tue, Aug 22, 2017 at 4:20 PM, Beena Emerson  wrote:
> Hi Jeevan,
>
> On Tue, Aug 22, 2017 at 7:53 AM, Jeevan Ladhe
>  wrote:
>>
>>
>> 4.
>>  static List *
>> -get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
>> +get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
>> +  bool for_default)
>>  {
>>
>> The addition and the way flag ‘for_default’ is being used is very confusing.
>> At this moment I am not able to think about a alternate solution to the
>> way you have used this flag. But will try and see if I can come up with
>> an alternate approach.
>
> Well, we need to skip get_range_nulltest while collecting the quals of the
> other partitions to build the one for the default partition. We could drop
> this flag and remove the NullTest from the qual returned for each
> partition, but I am not sure that's a good way to go about it.
>
>

Please find attached a patch which removes the for_default flag from
the get_qual_for_range function. This patch is just to show an idea
and should be applied over my earlier patch. There could be a better
way to do this. Let me know your opinion.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


remove_default_flag.patch
Description: Binary data



Re: [HACKERS] Default Partition for Range

2017-08-22 Thread Beena Emerson
Hi Jeevan,

On Tue, Aug 22, 2017 at 7:53 AM, Jeevan Ladhe
 wrote:
>
> Hi Beena,
>
> On Thu, Aug 17, 2017 at 4:59 PM, Beena Emerson 
> wrote:
>>
>> PFA the patch rebased over v25 patches of default list partition [1]
>
>
> Thanks for rebasing.
>
> Range partition review:

Thank you for your review.

>
> 3.
> @@ -2396,6 +2456,8 @@ make_one_range_bound(PartitionKey key, int index, List
> *datums, bool lower)
> ListCell   *lc;
> int i;
>
> +   Assert(datums != NULL);
> +
> bound = (PartitionRangeBound *) palloc0(sizeof(PartitionRangeBound));
> bound->index = index;
> bound->datums = (Datum *) palloc0(key->partnatts * sizeof(Datum));
>
> I am not really convinced, why should we have above Assert.

The function should never be called for the default partition, where datums is NULL.

>
> 4.
>  static List *
> -get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
> +get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
> +  bool for_default)
>  {
>
> The addition and the way the flag ‘for_default’ is being used is very
> confusing. At this moment I am not able to think of an alternate solution
> to the way you have used this flag, but I will try and see if I can come
> up with an alternate approach.

Well, we need to skip get_range_nulltest() while collecting the quals
of the other partitions to build the one for the default. We could drop
this flag and remove the NullTest from the qual returned for each
partition, but I am not sure if that's a good way to go about it.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-22 Thread Amit Kapila
On Tue, Aug 22, 2017 at 2:28 PM, Ashutosh Sharma  wrote:
> On Sat, Aug 19, 2017 at 11:37 AM, Amit Kapila  wrote:
>> On Fri, Aug 11, 2017 at 6:51 PM, Ashutosh Sharma  
>> wrote:

 7.
 _hash_kill_items(IndexScanDesc scan)
 {
 ..
 + /*
 + * If page LSN differs it means that the page was modified since the
 + * last read. killedItems could be not valid so LP_DEAD hints apply-
 + * ing is not safe.
 + */
 + page = BufferGetPage(buf);
 + if (PageGetLSN(page) != so->currPos.lsn)
 + {
 + _hash_relbuf(rel, buf);
 + return;
 + }
 ..
 }

 How does this check cover the case of unlogged tables?
>>>
>>> Thanks for putting that point. It doesn't cover the case for unlogged
>>> tables. As suggested by you in one of your emails in this mailing list, I am
>>> now not allowing vacuum to release the lock on the current page before
>>> acquiring the lock on the next page for unlogged tables. This will ensure
>>> that the scan is always behind vacuum if they are running on the same bucket
>>> simultaneously. Therefore, there is danger in marking tuples as dead for
>>> unlogged pages even if they do not have any LSN.
>>>
>>
>
> Once again, thank you for reviewing my patches.
>
>> In the last line, I guess you wanted to say "there is *no* danger
>> ..."?
>
> Yes, I meant that because it ensures that the scan will always be following
> VACUUM.
>
>> Today, while again thinking about this part of the patch
>> (_hash_kill_items) it occurred to me that we can't rely on a pin on an
>> overflow page to guarantee that it is not modified by Vacuum.
>> Consider a case where vacuum started vacuuming the bucket before the
>> scan and then in-between scan overtakes it.  Now, it is possible that
>> even if a scan has a pin on a page (and no lock), vacuum might clean
>> that page, if that happens, then we can't prevent the reuse of TID.
>> What do you think?
>>
>
> I think you are talking about the non-MVCC scan case, because in the case
> of MVCC scans, even if we have released both the pin and the lock on a page,
> VACUUM can't remove tuples from the page if they are visible to some
> concurrently running transaction (the MVCC scan in our case).
>

I am talking about tuples that are marked as dead in the heap. It has
nothing to do with the visibility of the tuple or the type of scan (MVCC or
non-MVCC).



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] proposal: psql command \graw

2017-08-22 Thread Pavel Stehule
2017-08-22 10:36 GMT+02:00 Aleksander Alekseev :

> Hi Pavel,
>
> > I am thinking about printing graphs in psql (mainly some histograms). I
> > found that gnuplot is able to do very good graphs in the console. The one
> > issue is user-friendly (with fewer steps) generation of data in a good
> > format for this application.
> >
> > One idea of mine is to introduce a new simple output format and a command
> > that executes a query with the result in this format.
> >
> > It should work something like
> >
> > \setenv GNUPLOT_OPTION '..'
> >
> > SELECT * FROM data
> >
> > \graw | gnuplot ...
> >
> > It can be used for any other applications R, ggplot, ..
> >
> > Ideas, comments?
>
> Sounds cool. On the other hand, I think this task is kind of too
> domain-specific. So I wonder whether it could be generalized somehow so
> that anyone could write an extension that would export data in any format
> in a friendly way.
>
> For instance:
>
> create extension export_to_gnuplot;
> select * from data
> \export_to_gnuplot | gnuplot ...
>

You are mixing server-side and client-side code. Maybe some time we can
write psql custom commands.

It would be nice to have more pipes in series, so you can do

select * from data
\graw | histogram | gnuplot ...
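
Purely to illustrate what a hypothetical 'histogram' stage in such a pipe
could be (it is not an existing tool): a small filter that reads one numeric
value per line from stdin and writes one "bucket-lower-bound count" pair per
line, which gnuplot can plot directly.

/*
 * Hypothetical "histogram" filter for a pipeline like
 *     \graw | histogram | gnuplot ...
 * Reads one numeric value per line from stdin and prints one
 * "bucket-lower-bound count" pair per line.  Purely illustrative.
 */
#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS	10
#define MAXVALUES	100000

int
main(void)
{
	static double values[MAXVALUES];
	char		line[256];
	int			counts[NBUCKETS] = {0};
	int			n = 0;
	int			i;
	double		min,
				max,
				width;

	while (n < MAXVALUES && fgets(line, sizeof(line), stdin) != NULL)
		values[n++] = strtod(line, NULL);

	if (n == 0)
		return 0;

	min = max = values[0];
	for (i = 1; i < n; i++)
	{
		if (values[i] < min)
			min = values[i];
		if (values[i] > max)
			max = values[i];
	}

	width = (max > min) ? (max - min) / NBUCKETS : 1.0;

	for (i = 0; i < n; i++)
	{
		int			b = (int) ((values[i] - min) / width);

		if (b >= NBUCKETS)
			b = NBUCKETS - 1;	/* the maximum lands in the last bucket */
		counts[b]++;
	}

	for (i = 0; i < NBUCKETS; i++)
		printf("%g %d\n", min + i * width, counts[i]);

	return 0;
}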





>
> --
> Best regards,
> Aleksander Alekseev
>


Re: [HACKERS] proposal: psql command \graw

2017-08-22 Thread Pavel Stehule
2017-08-22 10:46 GMT+02:00 Fabien COELHO :

>
> Hello Pavel,
>
>> One idea of mine is to introduce a new simple output format and a command
>> that executes a query with the result in this format.
>>
>> It should work something like
>>
>> \setenv GNUPLOT_OPTION '..'
>> SELECT * FROM data
>> \graw | gnuplot ...
>>
>
> I understand that it is kind of a shortcut for:
>
>   \pset fieldsep ' '
>   \pset format unaligned
>   \pset tuples_only on
>   -- possibly other settings...
>   SELECT * FROM data \g | gnuplot '...'


> And then you have to persuade gnuplot to take its data from stdin?


There are some methods

https://stackoverflow.com/questions/17543386/pipe-plot-data-to-gnuplot-script/17576571#17576571



>
> --
> Fabien.
>


Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-22 Thread Ashutosh Sharma
On Sat, Aug 19, 2017 at 11:37 AM, Amit Kapila  wrote:
> On Fri, Aug 11, 2017 at 6:51 PM, Ashutosh Sharma  
> wrote:
>>>
>>> 7.
>>> _hash_kill_items(IndexScanDesc scan)
>>> {
>>> ..
>>> + /*
>>> + * If page LSN differs it means that the page was modified since the
>>> + * last read. killedItems could be not valid so LP_DEAD hints apply-
>>> + * ing is not safe.
>>> + */
>>> + page = BufferGetPage(buf);
>>> + if (PageGetLSN(page) != so->currPos.lsn)
>>> + {
>>> + _hash_relbuf(rel, buf);
>>> + return;
>>> + }
>>> ..
>>> }
>>>
>>> How does this check cover the case of unlogged tables?
>>
>> Thanks for putting that point. It doesn't cover the case for unlogged
>> tables. As suggested by you in one of your emails in this mailing list, I am
>> now not allowing vacuum to release the lock on the current page before
>> acquiring the lock on the next page for unlogged tables. This will ensure
>> that the scan is always behind vacuum if they are running on the same bucket
>> simultaneously. Therefore, there is danger in marking tuples as dead for
>> unlogged pages even if they do not have any LSN.
>>
>

Once again, thank you for reviewing my patches.

> In the last line, I guess you wanted to say "there is *no* danger
> ..."?

Yes, I meant that because it ensures that the scan will always be following VACUUM.

> Today, while again thinking about this part of the patch
> (_hash_kill_items) it occurred to me that we can't rely on a pin on an
> overflow page to guarantee that it is not modified by Vacuum.
> Consider a case where vacuum started vacuuming the bucket before the
> scan and then in-between scan overtakes it.  Now, it is possible that
> even if a scan has a pin on a page (and no lock), vacuum might clean
> that page, if that happens, then we can't prevent the reuse of TID.
> What do you think?
>

I think you are talking about the non-MVCC scan case, because in the case of
MVCC scans, even if we have released both the pin and the lock on a page,
VACUUM can't remove tuples from the page if they are visible to some
concurrently running transaction (the MVCC scan in our case). So, I don't
think it can happen in the case of MVCC scans; however, it can happen for
non-MVCC scans, and to handle that I think it is better that we drop the
idea of allowing the scan to overtake VACUUM (done by
0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v5.patch).

However, B-Tree handles this in _bt_drop_lock_and_maybe_pin(), which
releases both the pin and the lock on a page if the scan uses an MVCC
snapshot, and otherwise releases just the lock on the page.
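
For reference, that helper looks roughly like the sketch below (reproduced
from memory, so treat the exact conditions as an approximation rather than
the authoritative nbtree source). If memory serves, it also keeps the pin
for indexes that are not WAL-logged, which ties into the unlogged-table case
discussed above, and for index-only scans.

/*
 * Sketch (from memory) of nbtree's _bt_drop_lock_and_maybe_pin(): always
 * drop the lock on the current page, and drop the pin as well only when
 * it is safe, i.e. for an MVCC snapshot on a WAL-logged index when no
 * index tuples still need to be returned from the page.  For unlogged
 * indexes the pin is kept, since the page LSN cannot be used later to
 * detect concurrent cleanup.
 */
static void
_bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
{
	LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);

	if (IsMVCCSnapshot(scan->xs_snapshot) &&
		RelationNeedsWAL(scan->indexRelation) &&
		!scan->xs_want_itup)
	{
		/* safe to give up the pin too */
		ReleaseBuffer(sp->buf);
		sp->buf = InvalidBuffer;
	}
}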

> Few other comments:
>
> 1.
> + * On failure exit (no more tuples), we return FALSE with pin
> + * pin held on bucket page but no pins or locks held on overflow
> + * page.
>   */
>  bool
>  _hash_next(IndexScanDesc scan, ScanDirection dir)
>
> In the above part of the comment, 'pin' is used twice.

Okay, I will remove the extra 'pin' (from the comment) in my next version of the patch.

>
> 0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v5
>
> 2.
> - * not at all by the rearrangement we are performing here.  To prevent
> - * any concurrent scan to cross the squeeze scan we use lock chaining
> - * similar to hasbucketcleanup.  Refer comments atop hashbucketcleanup.
> + * not at all by the rearrangement we are performing here.
>
> In _hash_squeezebucket, we still use lock chaining, so removing the
> above comment doesn't seem like a good idea.  I think you should copy
> part of the comment from hashbucketcleanup starting from "There can't be
> any concurrent .."

Okay, I will correct it in my next version of the patch.

>
> 3.
> _hash_freeovflpage()
> {
> ..
>
> * Concurrency issues are avoided by using lock chaining as
> * described atop hashbucketcleanup.
> ..
> }
>
> After fixing #2, you need to change the function name in the above comment.
>

Sure, I will correct that in my next version of the patch.

With Regards,
Ashutosh Sharma.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] proposal: psql command \graw

2017-08-22 Thread Fabien COELHO


Hello Pavel,


> One idea of mine is to introduce a new simple output format and a command
> that executes a query with the result in this format.
>
> It should work something like
>
> \setenv GNUPLOT_OPTION '..'
> SELECT * FROM data
> \graw | gnuplot ...


I understand that it is kind of a shortcut for:

  \pset fieldsep ' '
  \pset format unaligned
  \pset tuples_only on
  -- possibly other settings...
  SELECT * FROM data \g | gnuplot '...'

And then you have to persuade gnuplot to take its data from stdin?

--
Fabien.




Re: [HACKERS] Updating line length guidelines

2017-08-22 Thread Alvaro Herrera
Andres Freund wrote:

> We currently still have the guideline that code should fit into an 80
> character window. But an increasing amount of the code, and code
> submissions, don't adhere to that (e.g. copy.c, which triggered me to
> write this email). And I mean outside of accepted "exceptions" like
> error messages.  And there's less need for such a relatively tight limit
> these days.  Perhaps we should up the guideline to 90 or 100 chars?

I'd rather keep the existing limit, and enforce it more strongly.

With the recent pgindent changes, it's no longer possible to save a few
chars by moving the first (or any) argument of a function to a separate
line; in some cases, we're now forced to make function arguments shorter
in order to meet the 80-char standard.  In many cases that means adding
a variable declaration with a short name and a separate assignment.
That seems acceptable to me.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] proposal: psql command \graw

2017-08-22 Thread Aleksander Alekseev
Hi Pavel,

> I am thinking about printing graphs in psql (mainly some histograms). I
> found that gnuplot is able to do very good graphs in the console. The one
> issue is user-friendly (with fewer steps) generation of data in a good
> format for this application.
> 
> One idea of mine is to introduce a new simple output format and a command
> that executes a query with the result in this format.
> 
> It should work something like
> 
> \setenv GNUPLOT_OPTION '..'
> 
> SELECT * FROM data
> 
> \graw | gnuplot ...
> 
> It can be used for any other application: R, ggplot, ...
> 
> Ideas, comments?

Sounds cool. On the other hand, I think this task is kind of too
domain-specific. So I wonder whether it could be generalized somehow so
that anyone could write an extension that would export data in any format
in a friendly way.

For instance:

create extension export_to_gnuplot;
select * from data
\export_to_gnuplot | gnuplot ...

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-22 Thread Fabien COELHO



>> Why not allow -I as a short option for --custom-initialize?


> Other options for a similar purpose, such as --foreign-keys, also have
> only a long option. Since I think the --custom-initialize option is in the
> same context as those options, I didn't add a short option to it for now.
> Because the option name is found by prefix searching, we can use the
> short name --cu for now.


Hmmm. I like short options:-)


> I'm inclined to remove the restriction so that we can specify
> --foreign-keys, --no-vacuum and --custom-initialize at the same time.


Ok. I favor that as well.


> I think a list of char would be better here rather than a single
> malloced string, to remove a particular initialization step easily.
> Thoughts?


My thought is not to bother with a list of char.

To remove a char from a string, I suggest allowing ' ' to stand for
nothing and be skipped, so that substituting a letter with a space would
simply remove an initialization phase.


For adding, realloc & append.
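
A small standalone sketch of those two points (the step letters, function
names and driver are invented for the example, not taken from the patch): a
blank in the steps string is simply skipped, and a new step is added with
realloc and append.

/*
 * Sketch of the suggested handling of a custom initialization string:
 * ' ' stands for "nothing" and is skipped, and a step is added by
 * realloc'ing the string and appending a letter.  All names and step
 * letters here are made up for the example.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* append one step character to a malloc'ed steps string */
static char *
append_step(char *steps, char step)
{
	size_t		len = strlen(steps);

	steps = realloc(steps, len + 2);	/* no error handling in this sketch */
	steps[len] = step;
	steps[len + 1] = '\0';
	return steps;
}

static void
run_init_steps(const char *steps)
{
	const char *p;

	for (p = steps; *p != '\0'; p++)
	{
		switch (*p)
		{
			case ' ':			/* a blank stands for "removed": skip it */
				break;
			case 't':
				printf("create tables\n");
				break;
			case 'g':
				printf("generate data\n");
				break;
			case 'v':
				printf("vacuum\n");
				break;
			case 'p':
				printf("create primary keys\n");
				break;
			case 'f':
				printf("create foreign keys\n");
				break;
			default:
				fprintf(stderr, "unknown initialization step '%c'\n", *p);
				exit(1);
		}
	}
}

int
main(void)
{
	char	   *steps = malloc(sizeof("tgvp"));

	strcpy(steps, "tgvp");
	steps[2] = ' ';					/* "remove" the vacuum step */
	steps = append_step(steps, 'f');	/* add foreign keys */
	run_init_steps(steps);			/* tables, data, primary keys, foreign keys */
	free(steps);
	return 0;
}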

--
Fabien.




[HACKERS] proposal: psql command \graw

2017-08-22 Thread Pavel Stehule
Hi

I am thinking about printing graphs in psql (mainly some histograms). I
found that gnuplot is able to do very good graphs in the console. The one
issue is user-friendly (with fewer steps) generation of data in a good
format for this application.

One idea of mine is to introduce a new simple output format and a command
that executes a query with the result in this format.

It should work something like

\setenv GNUPLOT_OPTION '..'

SELECT * FROM data

\graw | gnuplot ...

It can be used for any other application: R, ggplot, ...

Ideas, comments?

Regards

Pavel


Re: [HACKERS] why not parallel seq scan for slow functions

2017-08-22 Thread Simon Riggs
On 21 August 2017 at 11:42, Amit Kapila  wrote:

>> or of 2)
>> treating cost = speed, so we actually reduce the cost of a parallel
>> plan rather than increasing it so it is more likely to be picked.
>>
>
> Yeah, this is what is currently being followed for the costing of parallel
> plans, and this patch also tries to follow the same.

OK, I understand this better now, thanks.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

