Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-17 Thread Stephen Frost
Masahiko, Michael,

* Masahiko Sawada (sawada.m...@gmail.com) wrote:
> > This is beginning to take shape.
> 
> Sorry, I missed lots of typos in the last patch. All comments from you
> are incorporated into the attached latest patch and I've checked whether
> there are other typos. Please review it.

I've taken an initial look through the patch and it looks pretty
reasonable.  I need to go over it in more detail and work through
testing it myself next.

I expect to commit this (with perhaps some minor tweaks primarily around
punctuation/wording), barring any issues found, on Wednesday or Thursday
of this week.

Noah,

I'll provide an update regarding this open item by COB, Thursday July
20th.

Thanks!

Stephen




Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-07-17 Thread Amit Kapila
On Mon, Jul 17, 2017 at 10:21 PM, Michael Paquier wrote:
> On Mon, Jul 17, 2017 at 6:02 PM, Robert Haas  wrote:
>> It seems to me that this area might benefit from a broader rethink.
>> It's not very nice to impose a restriction that init forks can only be
>> constructed using log_newpage(); on the other hand, it is also not
>> very nice that Amit's patch needs to recapitulate the mysterious
>> incantation used by XLogReadBufferForRedoExtended in several more
>> places.  The basic problem here is that it would be really easy for
>> the next person who adds an index AM to make the exact same mistake
>> that Amit made here and that I failed to spot during code review.  It
>> would be nice to come up with some more generic solution to the
>> problem rather than relying on each AM to insert this
>> FlushOneBuffer() incantation in each place where it's needed.  I think
>> there are probably several ways to do that; one idea might be to flush
>> all init-fork buffers in bulk at the end of recovery.
>
> Things are centralized in XLogReadBufferForRedoExtended() for init
> fork flushes, which is not a bad thing in itself as it is normally the
> only place doing this work for other AMs. And I have to admit that
> the current coding for hash index WAL has the advantage of generating
> less WAL, as the bitmap and metadata pages get initialized using
> the data of the records themselves. One idea I can think of would be
> to create a README in src/backend/access for index AMs that summarizes
> a basic set of recommendations for each routine used.
>

We already have quite a decent amount of information about indexes in
our docs [1][2].  We might want to extend that as well.

>> However, it doesn't really seem urgent to tinker with that; while I
>> think the fact that existing AMs use log_newpage() to populate the
>> init fork is mostly just luck, it might well be 10 or 20 years before
>> somebody adds another AM that happens to use any other technique.
>> Moreover, at the moment, we're trying to get a release out the door,
>> and to me that argues for keeping structural changes to a minimum.
>> Amit's patch seems like a pretty surgical fix to this problem with
>> minimal chance of breaking anything new, and that seems like the right
>> kind of fix for July.  So I plan to commit what Amit proposed (with a
>> few grammar corrections).
>

Thanks.  Do you have any suggestions for the back-branches?  As of now, it
fails badly with the kind of error shown below:

test=> SELECT * FROM t_u_hash;
ERROR:  could not open file "base/16384/16392": No such file or directory

It is explained in another thread [3], where it has been found that the
reason for such an error is that hash indexes are not WAL-logged prior
to 10.  Now, we can claim that we don't recommend hash indexes for
production use prior to 10, so such an error is okay even if no crash
has happened in the system.

[1] - https://www.postgresql.org/docs/devel/static/indexam.html
[2] - https://www.postgresql.org/docs/devel/static/index-functions.html
[3] - https://www.postgresql.org/message-id/20170630005634.GA4448%40momjian.us

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Re: Oddity in error handling of constraint violation in ExecConstraints for partitioned tables

2017-07-17 Thread Robert Haas
On Mon, Jul 10, 2017 at 5:44 PM, Robert Haas  wrote:
> The posted patches look OK to me.  Barring developments, I will commit
> them on 2017-07-17, or send another update by then.

Committed them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-07-17 Thread David Fetter
On Thu, Jun 01, 2017 at 10:05:09PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Tue, May 30, 2017 at 8:55 PM, David G. Johnston wrote:
> >>> Having --no-comments seems generally useful to me, in any case.
> 
> >> It smacks of being excessive to me.
> 
> > It sounds perfectly sensible to me.  It's not exactly an elegant
> > solution to the original problem, but it's a reasonable switch on
> > its own merits.
> 
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

The one I run into frequently is in a proprietary fork, RDS Postgres.
It'll happily dump out COMMENT ON EXTENSION plpgsql IS ...
which is great as far as it goes, but errors out when you try to
reload it.

While bending over backwards to support proprietary forks strikes me
as a terrible idea, I'd like to enable pg_dump to produce and consume
ToCs just as pg_restore does with its -l/-L options.  This would
provide the finest possible grain.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Mark Dilger

> On Jul 17, 2017, at 3:12 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>>> On Jul 17, 2017, at 2:14 PM, Tom Lane  wrote:
>>> Well, if you or somebody is willing to do the legwork, I'd be on board
>>> with a plan that says that every 68 years we redefine the origin of
>>> abstime.  I imagine it could be done so that currently-stored abstime
>>> values retain their present meaning as long as they're not too old.
>>> For example the initial change would toss abstimes before 1970 overboard,
>>> repurposing that range of values as being 2038-2106, but values between
>>> 1970 and 2038 still mean the same as they do today.  If anybody still
>>> cares in circa 2085, we toss 1970-2038 overboard and move the origin
>>> again, lather rinse repeat.
> 
>> Assuming other members of the community would not object to such
>> a plan, I'd be willing to step up to that plate.
> 
> So, thinking about how that would actually work ... the thing to do in
> order to preserve existing on-disk values is to alternate between signed
> and unsigned interpretations of abstimes.  That is, right now, abstime
> is signed with origin 1970.  The conversion I'm arguing we should make
> real soon now is to unsigned with origin 1970.  If the project lives
> so long, in circa 70 years we'd switch it to signed with origin 2106.
> Yadda yadda.
> 
> Now, this should mostly work conveniently, except that we have
> +/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with.  Those
> are nicely positioned at the ends of the signed range so that
> abstime_cmp_internal() doesn't need to treat them specially.
> In an unsigned world they'd be smack in the middle of the range.
> We could certainly teach abstime_cmp_internal to special-case them
> and deliver logically correct results, but there's a bigger problem:
> those will correspond to two seconds in January 2038 that will need
> to be representable when the time comes.  Maybe we can make that
> work by defining the values past 2038 as being two seconds less than
> you might otherwise expect, but I bet it's gonna be messy.  It might
> be saner to just desupport +/-infinity for abstime.
> 
> The same goes for INVALID_ABSTIME, which doesn't have any clear
> use-case that I know of, and is certainly not documented.

I use the abstime type in some catalog tables, and if I allowed those
fields to have something equivalent to NULL, which I do not, I would need
INVALID_ABSTIME, since NULL is not allowed in the constant header of a
catalog table.

I wonder if old versions of postgres had catalog tables with abstime used
in this way?  Other than that, I can't think of a reason to use INVALID_ABSTIME.

mark






Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-17 Thread Enrique Meneses
There is a generic definition for any array added as part of
https://commitfest.postgresql.org/10/708/ (it may be the reason for the
duplicate error). I am not sure what your change is but I would review the
above just in case. There is also a defect with a misleading error that is
still being triggered for UUID arrays.

Enrique

On Mon, Jul 17, 2017 at 4:25 PM Mark Rofail  wrote:

> On Wed, Jul 12, 2017 at 12:53 AM, Alvaro Herrera wrote:
>>
>> We have one opclass for each type combination -- int4 to int2, int4 to
>> int4, int4 to int8, etc.  You just need to add the new strategy to all
>> the opclasses.
>
>
>  I tried this approach by manually declaring the operator multiple
> times in pg_amop.h (src/include/catalog/pg_amop.h)
>
> so instead of the polymorphic declaration
> DATA(insert ( 2745   2277 2283 5 s 6108 2742 0 )); /* anyarray @>>
> anyelem */
>
> multiple declarations were used, for example for int4[] :
> DATA(insert ( 2745   1007 20 5 s 6108 2742 0 )); /* int4[] @>> int8 */
> DATA(insert ( 2745   1007 23 5 s 6108 2742 0 )); /* int4[] @>> int4 */
> DATA(insert ( 2745   1007 21 5 s 6108 2742 0 )); /* int4[] @>> int2 */
> DATA(insert ( 2745   1007 1700 5 s 6108 2742 0 ));/* int4[] @>> numeric */
>
> However, make check produced:
> could not create unique index "pg_amop_opr_fam_index"
> Key (amopopr, amoppurpose, amopfamily)=(6108, s, 2745) is duplicated.
>
> Am I implementing this the wrong way or do we need to look for another
> approach?
>
>


Re: [HACKERS] Pluggable storage

2017-07-17 Thread Peter Geoghegan
On Mon, Jul 17, 2017 at 1:24 PM, Alexander Korotkov wrote:
> It probably depends on the particular storage (once we have pluggable
> storages).  Some storages would have an additional level of indirection
> while others wouldn't.

Agreed. Like kill_prior_tuple, it's an optional capability, and where
implemented is implemented in a fairly consistent way.

> But even if a unique index contains no true duplicates, it's
> still possible that a true delete happens.  Then we still have to delete
> the tuple even from the unique index.

I think I agree. I've been looking over the ARIES paper [1] again
today. They say this:

"For index updates, in the interest of increasing concurrency, we do
not want to prevent the space released by one transaction from being
consumed by another before the commit of the first transaction."

You can literally reclaim space from an index tuple deletion
*immediately* with their design, which matters because you want to
reclaim space as early as possible, before a page split is needed.
Obviously they understand how important this is.

This might not work so well with an MVCC system, where there are no
2PL predicate locks. You need to keep a "ghost record", even for
non-unique indexes, and the deletion can only happen when the xact
commits. Remember, the block number cannot be used to see if there were
changes against the page, unlike the heap, because you have to worry
about page splits and page merges/deletion. UNDO is entirely logical
for indexes for this reason. (This is why UNDO does not actually undo
page splits, relation extension, etc. Only REDO/WAL always works at
the level of individual pages in all cases. UNDO for MVCC is not as
different from our design as I once thought.)

The reason I want to at least start with unique indexes is because you
need a TID to make non-unique/secondary indexes have unique keys
(unique keys are always needed if retail index tuple insertion is
always supported). For unique indexes, you really can do an update in
the index (see my design below for one example of how that can work),
but I think you need something more like a deletion followed by an
insertion for non-unique indexes, because there the physical/heap TID
changed, and that's part of the key, and that might belong on a
different page. You therefore haven't really fixed the problem with
secondary indexes sometimes needing new index tuples even though user
visible attributes weren't updated.

You haven't fixed the problem with secondary indexes unless, of course,
all secondary indexes have logical pointers to begin with, such as the
PK value. Then you only need to "insert and delete, not update" when
the PK value is updated or when a secondary index needs a new index
tuple with user-visible attribute values distinct from the previous
version's -- and then you fix the secondary index problem. And, while your
"version chain overflow indirection" structure is basically something
that lives outside the heap, it is still only needed for one index,
and not all of them.

This new indirection structure is a really nice target for pruning,
because you can prune physical TIDs that no possible snapshot could
use, unlike with the heap, where EvalPlanQual() could make any heap
tuple visible to snapshots at or after the minimal snapshot horizon
implied by RecentGlobalXmin. And index scans on any index can prune
for everyone.

You could also do "true index deletes", as you suggest, but you'd need
to have ghost records there too, and you'd need an asynchronous
cleanup process to do the cleanup when the deleting xact committed.
I'm not sure if it's worth doing that eagerly. It may or may not be
better to hope for kill_prior_tuple to do the job for us. Not sure
where this leaves index-only scans on secondary indexes..."true index
deletes" might be justified by making index only scans work more often
in general, especially for secondary indexes with logical pointers.

I'm starting to think that you were right all along about indirect
indexes needing to store PK values. Perhaps we should just bite the
bullet...it's not like places like the bufpage.c index routines
actually know or care about whether or not the index tuple has a TID,
what a TID is, etc. They care about stuff like the header values of
index tuples, and the page ItemId array, but TID is, as you put it,
merely payload.

> It's possible to add an indirection layer "on demand".  Thus, initially
> index tuples point directly to the heap tuple.  If a tuple gets updated
> and doesn't fit on the page anymore, then it's moved to another place with
> a redirect in the old place.  I think that if carefully designed, it's
> possible to guarantee there is at most one redirect.

This is actually what I was thinking. Here is a sketch:

When you start out, index tuples in nbtree are the same as today --
one physical pointer (TID). But, on the first update to a PK index,
they grow a new pointer, but this is not a physical/heap TID. It's a
pointer to some kind of 

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-17 Thread Mark Rofail
On Wed, Jul 12, 2017 at 12:53 AM, Alvaro Herrera wrote:
>
> We have one opclass for each type combination -- int4 to int2, int4 to
> int4, int4 to int8, etc.  You just need to add the new strategy to all
> the opclasses.


 I tried this approach by manually declaring the operator multiple times
in pg_amop.h (src/include/catalog/pg_amop.h)

so instead of the polymorphic declaration
DATA(insert ( 2745   2277 2283 5 s 6108 2742 0 )); /* anyarray @>> anyelem
*/

multiple declarations were used, for example for int4[] :
DATA(insert ( 2745   1007 20 5 s 6108 2742 0 )); /* int4[] @>> int8 */
DATA(insert ( 2745   1007 23 5 s 6108 2742 0 )); /* int4[] @>> int4 */
DATA(insert ( 2745   1007 21 5 s 6108 2742 0 )); /* int4[] @>> int2 */
DATA(insert ( 2745   1007 1700 5 s 6108 2742 0 ));/* int4[] @>> numeric */

However, make check produced:
could not create unique index "pg_amop_opr_fam_index"
Key (amopopr, amoppurpose, amopfamily)=(6108, s, 2745) is duplicated.

Am I implementing this the wrong way or do we need to look for another
approach?
diff --git a/src/backend/access/gin/ginarrayproc.c b/src/backend/access/gin/ginarrayproc.c
index a5238c3af5..9d6447923d 100644
--- a/src/backend/access/gin/ginarrayproc.c
+++ b/src/backend/access/gin/ginarrayproc.c
@@ -24,6 +24,7 @@
 #define GinContainsStrategy		2
 #define GinContainedStrategy	3
 #define GinEqualStrategy		4
+#define GinContainsElemStrategy	5
 
 
 /*
@@ -43,7 +44,7 @@ ginarrayextract(PG_FUNCTION_ARGS)
 	bool	   *nulls;
 	int			nelems;
 
-	get_typlenbyvalalign(ARR_ELEMTYPE(array),
+	get_typlenbyvalalign(ARR_ELEMTYPE(array),	
 		 &elmlen, &elmbyval, &elmalign);
 
 	deconstruct_array(array,
@@ -110,6 +111,11 @@ ginqueryarrayextract(PG_FUNCTION_ARGS)
 		case GinOverlapStrategy:
 			*searchMode = GIN_SEARCH_MODE_DEFAULT;
 			break;
+		case GinContainsElemStrategy:
+			/* only items that match the queried element
+			   are considered candidates */
+			*searchMode = GIN_SEARCH_MODE_DEFAULT;
+			break;
 		case GinContainsStrategy:
 			if (nelems > 0)
 *searchMode = GIN_SEARCH_MODE_DEFAULT;
@@ -171,6 +177,7 @@ ginarrayconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* result is not lossy */
 			*recheck = false;
@@ -258,7 +265,8 @@ ginarraytriconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
-		case GinContainsStrategy:
+			case GinContainsElemStrategy:
+			case GinContainsStrategy:
 			/* must have all elements in check[] true, and no nulls */
 			res = GIN_TRUE;
 			for (i = 0; i < nkeys; i++)
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 34dadd6e19..8c9eb0c676 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4232,6 +4232,117 @@ arraycontained(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(result);
 }
 
+/*
+ * array_contains_elem : checks an array for a specific element
+ */
+static bool
+array_contains_elem(AnyArrayType *array, Datum elem, Oid element_type,
+bool element_isnull, Oid collation,	void **fn_extra)
+{
+	Oid 		arr_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+	int 		nelems;
+	int			typlen;
+	bool		typbyval;
+	char		typalign;
+	int			i;
+	array_iter 	it1;
+	FunctionCallInfoData locfcinfo;
+
+	if (arr_type != element_type)
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("cannot compare different element types")));
+	
+	if (element_isnull)
+		return false;
+		
+	/*
+	 * We arrange to look up the equality function only once per series of
+	 * calls, assuming the element type doesn't change underneath us.  The
+	 * typcache is used so that we have no memory leakage when being used as
+	 * an index support function.
+	 */
+	typentry = (TypeCacheEntry *)*fn_extra;
+	if (typentry == NULL ||
+		typentry->type_id != arr_type)
+	{
+		typentry = lookup_type_cache(arr_type,
+	 TYPECACHE_EQ_OPR_FINFO);
+		if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
+			ereport(ERROR,
+	(errcode(ERRCODE_UNDEFINED_FUNCTION),
+	 errmsg("could not identify an equality operator for type %s",
+			format_type_be(arr_type))));
+		*fn_extra = (void *)typentry;
+	}
+	typlen = typentry->typlen;
+	typbyval = typentry->typbyval;
+	typalign = typentry->typalign;
+
+	/*
+	 * Apply the comparison operator to each pair of array elements.
+	 */
+	InitFunctionCallInfoData(locfcinfo, &typentry->eq_opr_finfo, 2,
+			 collation, NULL, NULL);
+
+	/* Loop over source data */
+	nelems = ArrayGetNItems(AARR_NDIM(array), AARR_DIMS(array));
+	array_iter_setup(&it1, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum elt1;
+		bool isnull;
+		bool oprresult;
+
+		/* Get element, checking for NULL */
+		elt1 = array_iter_next(&it1, &isnull, i, typlen, typbyval, typalign);
+
+		/*
+		 * We assume that the comparison operator is strict, so a NULL can't
+		 * match anything.  XXX this diverges from the "NULL=NULL" behavior of
+		 * array_eq, should we act like that?
+		 */
+		if (isnull)
+			continue;
+
+		/*
+			* 

Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Mark Dilger

> On Jul 17, 2017, at 3:56 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> On Jul 17, 2017, at 3:12 PM, Tom Lane  wrote:
>>> Now, this should mostly work conveniently, except that we have
>>> +/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with ... It might
>>> be saner to just desupport +/-infinity for abstime.
> 
>> I don't use those values, so it is no matter to me if we desupport them.  It
>> seems a bit pointless, though, because we still have to handle legacy
>> values that we encounter.  I assume some folks will have those values in
>> their tables when they upgrade.
> 
> Well, some folks will also have pre-1970 dates in their tables, no?
> We're just blowing those off.  They'll print out as some post-2038
> date or other, and too bad.
> 
> Basically, the direction this is going in is that abstime will become
> an officially supported type, but its range of supported values is "not
> too many decades either way from now".  If you are using it to store
> very old dates then You're Doing It Wrong, and eventually you'll get
> bitten.  Given that contract, I don't see a place for +/-infinity.

Works for me.

mark





Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Tom Lane
Mark Dilger  writes:
> On Jul 17, 2017, at 3:12 PM, Tom Lane  wrote:
>> Now, this should mostly work conveniently, except that we have
>> +/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with ... It might
>> be saner to just desupport +/-infinity for abstime.

> I don't use those values, so it is no matter to me if we desupport them.  It
> seems a bit pointless, though, because we still have to handle legacy
> values that we encounter.  I assume some folks will have those values in
> their tables when they upgrade.

Well, some folks will also have pre-1970 dates in their tables, no?
We're just blowing those off.  They'll print out as some post-2038
date or other, and too bad.

Basically, the direction this is going in is that abstime will become
an officially supported type, but its range of supported values is "not
too many decades either way from now".  If you are using it to store
very old dates then You're Doing It Wrong, and eventually you'll get
bitten.  Given that contract, I don't see a place for +/-infinity.

regards, tom lane




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Mark Dilger

> On Jul 17, 2017, at 3:12 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>>> On Jul 17, 2017, at 2:14 PM, Tom Lane  wrote:
>>> Well, if you or somebody is willing to do the legwork, I'd be on board
>>> with a plan that says that every 68 years we redefine the origin of
>>> abstime.  I imagine it could be done so that currently-stored abstime
>>> values retain their present meaning as long as they're not too old.
>>> For example the initial change would toss abstimes before 1970 overboard,
>>> repurposing that range of values as being 2038-2106, but values between
>>> 1970 and 2038 still mean the same as they do today.  If anybody still
>>> cares in circa 2085, we toss 1970-2038 overboard and move the origin
>>> again, lather rinse repeat.
> 
>> Assuming other members of the community would not object to such
>> a plan, I'd be willing to step up to that plate.
> 
> So, thinking about how that would actually work ... the thing to do in
> order to preserve existing on-disk values is to alternate between signed
> and unsigned interpretations of abstimes.  That is, right now, abstime
> is signed with origin 1970.  The conversion I'm arguing we should make
> real soon now is to unsigned with origin 1970.  If the project lives
> so long, in circa 70 years we'd switch it to signed with origin 2106.
> Yadda yadda.

Right, I already had the same idea.  That's not how I am doing it in my
fork currently, but that's what you would want to do to support any values
already stored somewhere.

> Now, this should mostly work conveniently, except that we have
> +/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with.  Those
> are nicely positioned at the ends of the signed range so that
> abstime_cmp_internal() doesn't need to treat them specially.
> In an unsigned world they'd be smack in the middle of the range.
> We could certainly teach abstime_cmp_internal to special-case them
> and deliver logically correct results, but there's a bigger problem:
> those will correspond to two seconds in January 2038 that will need
> to be representable when the time comes.  Maybe we can make that
> work by defining the values past 2038 as being two seconds less than
> you might otherwise expect, but I bet it's gonna be messy.  It might
> be saner to just desupport +/-infinity for abstime.

I don't use those values, so it is no matter to me if we desupport them.  It
seems a bit pointless, though, because we still have to handle legacy
values that we encounter.  I assume some folks will have those values in
their tables when they upgrade.

> The same goes for INVALID_ABSTIME, which doesn't have any clear
> use-case that I know of, and is certainly not documented.

Likewise, I don't know how to desupport this while still supporting legacy
tables.

mark





Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-07-17 Thread Fabrízio Mello
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

It's a very simple change and I have nothing to complain about regarding
the source and documentation changes.

But I wonder about the lack of TAP tests for this new pg_dump behavior.
Shouldn't we add TAP tests?

The new status of this patch is: Waiting on Author



Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Tom Lane
Mark Dilger  writes:
>> On Jul 17, 2017, at 2:14 PM, Tom Lane  wrote:
>> Well, if you or somebody is willing to do the legwork, I'd be on board
>> with a plan that says that every 68 years we redefine the origin of
>> abstime.  I imagine it could be done so that currently-stored abstime
>> values retain their present meaning as long as they're not too old.
>> For example the initial change would toss abstimes before 1970 overboard,
>> repurposing that range of values as being 2038-2106, but values between
>> 1970 and 2038 still mean the same as they do today.  If anybody still
>> cares in circa 2085, we toss 1970-2038 overboard and move the origin
>> again, lather rinse repeat.

> Assuming other members of the community would not object to such
> a plan, I'd be willing to step up to that plate.

So, thinking about how that would actually work ... the thing to do in
order to preserve existing on-disk values is to alternate between signed
and unsigned interpretations of abstimes.  That is, right now, abstime
is signed with origin 1970.  The conversion I'm arguing we should make
real soon now is to unsigned with origin 1970.  If the project lives
so long, in circa 70 years we'd switch it to signed with origin 2106.
Yadda yadda.
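
Concretely, the switch is mostly just a matter of reinterpreting the same
stored 32-bit value as unsigned when comparing and converting.  A rough
sketch, not the exact code I'd propose:

static inline uint32
abstime_as_unsigned(int32 stored)
{
	/*
	 * Same bits, different origin: values that read as pre-1970 under the
	 * current signed interpretation come out as 2038..2106 here, while
	 * values between 1970 and 2038 are unchanged.
	 */
	return (uint32) stored;
}

static int
abstime_cmp_unsigned(int32 a, int32 b)
{
	uint32		ua = abstime_as_unsigned(a);
	uint32		ub = abstime_as_unsigned(b);

	if (ua < ub)
		return -1;
	if (ua > ub)
		return 1;
	return 0;
}

Output conversion would just widen the unsigned value to a 64-bit time
before formatting.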

Now, this should mostly work conveniently, except that we have
+/-infinity (NOEND_ABSTIME/NOSTART_ABSTIME) to deal with.  Those
are nicely positioned at the ends of the signed range so that
abstime_cmp_internal() doesn't need to treat them specially.
In an unsigned world they'd be smack in the middle of the range.
We could certainly teach abstime_cmp_internal to special-case them
and deliver logically correct results, but there's a bigger problem:
those will correspond to two seconds in January 2038 that will need
to be representable when the time comes.  Maybe we can make that
work by defining the values past 2038 as being two seconds less than
you might otherwise expect, but I bet it's gonna be messy.  It might
be saner to just desupport +/-infinity for abstime.

The same goes for INVALID_ABSTIME, which doesn't have any clear
use-case that I know of, and is certainly not documented.

regards, tom lane




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Mark Dilger

> On Jul 17, 2017, at 2:14 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>>> On Jul 17, 2017, at 12:54 PM, Mark Dilger  wrote:
>> On Jul 15, 2017, at 3:00 PM, Tom Lane  wrote:
 The types abstime, reltime, and tinterval need to go away, or be
 reimplemented, sometime well before 2038 when they will overflow.
> 
>>> These types provide a 4-byte datatype for storing real-world second
>>> precision timestamps, as occur in many log files.  Forcing people to
>>> switch to timestamp or timestamptz will incur a 4 byte per row
>>> penalty.  In my own builds, I have changed the epoch on these so
>>> they won't wrap until sometime after 2100 C.E.  I see little point in
>>> switching to an 8-byte millisecond precision datatype when a perfectly
>>> good 4-byte second precision datatype already serves the purpose.
> 
> Well, if you or somebody is willing to do the legwork, I'd be on board
> with a plan that says that every 68 years we redefine the origin of
> abstime.  I imagine it could be done so that currently-stored abstime
> values retain their present meaning as long as they're not too old.
> For example the initial change would toss abstimes before 1970 overboard,
> repurposing that range of values as being 2038-2106, but values between
> 1970 and 2038 still mean the same as they do today.  If anybody still
> cares in circa 2085, we toss 1970-2038 overboard and move the origin
> again, lather rinse repeat.
> 
> But we're already past the point where it would be time to make the
> first such switch, if we're gonna do it.  So I'd like to see somebody
> step up to the plate sooner not later.

Assuming other members of the community would not object to such
a plan, I'd be willing to step up to that plate.  I'll wait a respectable time,
maybe until tomorrow, to allow others to speak up.

> I'd be inclined to toss reltime and tinterval overboard in any case.

Yeah, I don't use those either, preferring to cast abstime to timestamp
(or timestamptz) prior to doing any math on them.

> 
>> Oh, and if you could be so kind, please remove them in a commit that
>> does nothing else.  It's much easier to skip the commit that way.
> 
> We doubtless would do that, but you're fooling yourself if you imagine
> that you can maintain such a fork for long while still tracking the
> community version otherwise.  Once those hand-assigned OIDs are free
> they'll soon be absorbed by future feature patches, and then you'll
> have enormous merge problems.

Oh, not so much.  Knowing that they were going to be deprecated, I
already moved the Oids for these to something higher than 1.  In
my own builds, the Oid generator does not assign Oids lower than 65536,
which leaves room for me to assign in the 1-65535 range without
merge headaches.  It would, however, be simpler if the types did not
go away, as that would cause minor merge headaches elsewhere, such
as in the regression tests.

mark







Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Tom Lane
Mark Dilger  writes:
>> On Jul 17, 2017, at 12:54 PM, Mark Dilger  wrote:
> On Jul 15, 2017, at 3:00 PM, Tom Lane  wrote:
>>> The types abstime, reltime, and tinterval need to go away, or be
>>> reimplemented, sometime well before 2038 when they will overflow.

>> These types provide a 4-byte datatype for storing real-world second
>> precision timestamps, as occur in many log files.  Forcing people to
>> switch to timestamp or timestamptz will incur a 4 byte per row
>> penalty.  In my own builds, I have changed the epoch on these so
>> they won't wrap until sometime after 2100 C.E.  I see little point in
>> switching to an 8-byte millisecond precision datatype when a perfectly
>> good 4-byte second precision datatype already serves the purpose.

Well, if you or somebody is willing to do the legwork, I'd be on board
with a plan that says that every 68 years we redefine the origin of
abstime.  I imagine it could be done so that currently-stored abstime
values retain their present meaning as long as they're not too old.
For example the initial change would toss abstimes before 1970 overboard,
repurposing that range of values as being 2038-2106, but values between
1970 and 2038 still mean the same as they do today.  If anybody still
cares in circa 2085, we toss 1970-2038 overboard and move the origin
again, lather rinse repeat.

But we're already past the point where it would be time to make the
first such switch, if we're gonna do it.  So I'd like to see somebody
step up to the plate sooner not later.

I'd be inclined to toss reltime and tinterval overboard in any case.

> Oh, and if you could be so kind, please remove them in a commit that
> does nothing else.  It's much easier to skip the commit that way.

We doubtless would do that, but you're fooling yourself if you imagine
that you can maintain such a fork for long while still tracking the
community version otherwise.  Once those hand-assigned OIDs are free
they'll soon be absorbed by future feature patches, and then you'll
have enormous merge problems.

regards, tom lane




Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-17 Thread Julien Rouhaud
On 17/07/2017 16:57, Andres Freund wrote:
> The latter obviously isn't ready, but might make clearer what I'm
> thinking about.

It did for me, and FWIW I like this approach.

> If we were to go this route we'd have to probably move
> the callback assignment into the ExecInit* routines, and possibly
> replace the switch in ExecInitNode() with another callback, assigned in
> make_*, and implement callbacks for ExecShutdown, MultiExecProcNode etc.
> 
> This results in a good speedup in tpc-h btw:
> master total min: 46434.897 cb min: 44778.228 [diff -3.70]

Is it v11 material or is there any chance to make it in v10?
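
For anyone following along, the shape I understand is being proposed is
roughly the following -- an illustration with made-up names, not Andres'
actual patch:

/*
 * Each PlanState would carry a function pointer that its ExecInit*
 * routine assigns once, e.g. in ExecInitSeqScan():
 *     scanstate->ss.ps.ExecProcNode = ExecSeqScan;
 */
typedef struct TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);

static inline struct TupleTableSlot *
ExecProcNode(struct PlanState *node)
{
	/* one indirect call instead of the big switch on nodeTag(node) */
	return node->ExecProcNode(node);
}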

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Pluggable storage

2017-07-17 Thread Alexander Korotkov
On Mon, Jul 17, 2017 at 7:51 PM, Peter Geoghegan  wrote:

> On Mon, Jul 17, 2017 at 3:22 AM, Alexander Korotkov wrote:
> > I think that "retail index tuple deletion" is the feature which could
> give
> > us some advantages even independently from pluggable storages.  For
> example,
> > imagine very large table with only small amount of dead tuples.  In this
> > case, it would be cheaper to delete index links to those dead tuples one
> by
> > one using "retail index tuple deletion", rather than do full scan of
> every
> > index to perform "bulk delete" of index tuples.  One may argue that you
> > shouldn't do vacuum of large table when only small amount of tuples are
> > dead.  But in terms of index bloat mitigation, very aggressive vacuum
> > strategy could be justified.
>
> Yes, definitely. Especially with the visibility map. Even still, I
> tend to think that for unique indexes, true duplicates should be
> disallowed, and dealt with with an additional layer of indirection. So
> this would be for secondary indexes.
>

It probably depends on the particular storage (once we have pluggable
storages).  Some storages would have an additional level of indirection
while others wouldn't.  But even if a unique index contains no true
duplicates, it's still possible that a true delete happens.  Then we still
have to delete the tuple even from the unique index.

>> I agree with Robert that being able to store an arbitrary payload as a
> >> TID is probably not going to ever work very well.
> >
> >
> > Support of arbitrary payload as a TID doesn't sound easy.  However, that
> > doesn't mean it's unachievable. For me, it's more like long way which
> could
> > be traveled step by step.
>
> To be fair, it probably is achievable. Where there is a will, there is
> a way. I just think that it will be easier to find a different way of
> realizing similar benefits. I'm mostly talking about benefits around
> making it cheap to have many secondary indexes by having logical
> indirection instead of physical pointers (doesn't *have* to be
> user-visible primary key values).


It's possible to add an indirection layer "on demand".  Thus, initially
index tuples point directly to the heap tuple.  If a tuple gets updated
and doesn't fit on the page anymore, then it's moved to another place with
a redirect in the old place.  I think that if carefully designed, it's
possible to guarantee there is at most one redirect.

But I still think that avoiding arbitrary payloads for indexes is delaying
the inevitable, if we want pluggable storages and want them to reuse
existing index AMs.  So, for example, an arbitrary payload together with the
ability to update this payload allows us to make indexes separately
versioned (have a separate garbage collection process more or less unrelated
to the heap).  Despite the overhead caused by MVCC attributes, I think such
indexes could give significant advantages in various workloads.


> HOT simply isn't effective enough at
> preventing UPDATE index tuple insertions for indexes on unchanged
> attributes, often just because pruning can fail to happen in time,
> which WARM will not fix.
>

Right.  I think HOT and WARM depend on factors which are hard to control:
distribution of UPDATEs between heap pages, the oldest snapshot, and so on.
It's quite hard for a DBA to understand why a table starts getting bloated
when it didn't before.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Mark Dilger

> On Jul 17, 2017, at 12:54 PM, Mark Dilger  wrote:
> 
> 
>> On Jul 15, 2017, at 3:00 PM, Tom Lane  wrote:
>> 
>> The types abstime, reltime, and tinterval need to go away, or be
>> reimplemented, sometime well before 2038 when they will overflow.
>> It's not too soon to start having a plan for that, especially seeing
>> that it seems to take a decade or more for us to actually get rid
>> of anything we've deprecated.
>> 
>> Right offhand, I don't think there is any functionality in these
>> types that isn't handled as well or better by timestamptz, interval,
>> and tstzrange respectively.  
> 
> These types provide a 4-byte datatype for storing real-world second
> precision timestamps, as occur in many log files.  Forcing people to
> switch to timestamp or timestamptz will incur a 4 byte per row
> penalty.  In my own builds, I have changed the epoch on these so
> they won't wrap until sometime after 2100 C.E.  I see little point in
> switching to an 8-byte millisecond precision datatype when a perfectly
> good 4-byte second precision datatype already serves the purpose.
> 
> That said, I am fully aware that these are deprecated and expect you
> will remove them, at which time I'll have to keep them in my tree
> and politely refuse to merge in your change which removes them.

Oh, and if you could be so kind, please remove them in a commit that
does nothing else.  It's much easier to skip the commit that way.

Thanks!

mark





Re: [HACKERS] PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING

2017-07-17 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> This is the rebased and revised version of the previous patch.

A few more comments:

* Per the spec for CacheRegisterSyscacheCallback, a zero hash value means
to flush all associated state.  This isn't handling that case correctly.
Even when you are given a specific hash value, I think exiting after
finding one match is incorrect: there could be multiple connections
to the same server with different user mappings, or vice versa.  (A rough
sketch of what I have in mind is at the end of this message.)

* I'm not really sure that it's worth the trouble to pay attention
to the hash value at all.  Very few other syscache callbacks do,
and the pg_server/pg_user_mapping catalogs don't seem likely to
have higher than average traffic.

* Personally I'd be inclined to register the callbacks at the same
time the hashtables are created, which is a pattern we use elsewhere
... in fact, postgres_fdw itself does it that way for the transaction
related callbacks, so it seems a bit weird that you are going in a
different direction for these callbacks.  That way avoids the need to
depend on a _PG_init function and means that the callbacks don't have to
worry about the hashtables not being valid.  Also, it seems a bit
pointless to have separate layers of postgresMappingSysCallback and
InvalidateConnectionForMapping functions.

* How about some regression test cases?  You couldn't really exercise
cross-session invalidation easily, but I don't think you need to.
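
To illustrate the first and third points above, what I have in mind is
roughly the following -- just a sketch with invented entry fields
(server_hashvalue, mapping_hashvalue, invalidated), not the submitted patch:

/*
 * Registered once, where the connection hash is first created:
 *   CacheRegisterSyscacheCallback(FOREIGNSERVEROID, pgfdw_inval_callback,
 *                                 (Datum) 0);
 *   CacheRegisterSyscacheCallback(USERMAPPINGOID, pgfdw_inval_callback,
 *                                 (Datum) 0);
 */
static void
pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
{
	HASH_SEQ_STATUS scan;
	ConnCacheEntry *entry;

	if (ConnectionHash == NULL)
		return;

	hash_seq_init(&scan, ConnectionHash);
	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)) != NULL)
	{
		if (entry->conn == NULL)
			continue;

		/*
		 * hashvalue == 0 means "flush everything", and even for a specific
		 * hash value we keep scanning, since several cached connections can
		 * depend on the same server or user mapping.
		 */
		if (hashvalue == 0 ||
			(cacheid == FOREIGNSERVEROID &&
			 entry->server_hashvalue == hashvalue) ||
			(cacheid == USERMAPPINGOID &&
			 entry->mapping_hashvalue == hashvalue))
			entry->invalidated = true;	/* discard at next use */
	}
}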

regards, tom lane




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Mark Dilger

> On Jul 15, 2017, at 3:00 PM, Tom Lane  wrote:
> 
> The types abstime, reltime, and tinterval need to go away, or be
> reimplemented, sometime well before 2038 when they will overflow.
> It's not too soon to start having a plan for that, especially seeing
> that it seems to take a decade or more for us to actually get rid
> of anything we've deprecated.
> 
> Right offhand, I don't think there is any functionality in these
> types that isn't handled as well or better by timestamptz, interval,
> and tstzrange respectively.  

These types provide a 4-byte datatype for storing real-world second
precision timestamps, as occur in many log files.  Forcing people to
switch to timestamp or timestamptz will incur a 4 byte per row
penalty.  In my own builds, I have changed the epoch on these so
they won't wrap until sometime after 2100 C.E.  I see little point in
switching to an 8-byte millisecond precision datatype when a perfectly
good 4-byte second precision datatype already serves the purpose.

That said, I am fully aware that these are deprecated and expect you
will remove them, at which time I'll have to keep them in my tree
and politely refuse to merge in your change which removes them.

mark





Re: [HACKERS] Why have we got both largeobject and large_object test files?

2017-07-17 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I happened to notice that the regression tests contain both
>> largeobject.sql and large_object.sql.  This seems at best confusing and at
>> worst a source of mistakes.  The second file was added in March by commit
>> ff992c074, has never been touched by any other commit, and is only 8 lines
>> long.  Was there a really good reason not to incorporate that test into
>> largeobject.sql?

> Just to be clear that we're talking about the same thing- there is no
> 'largeobject.sql' in a clean source tree.  There is a 'largeobject.source'
> in src/test/regress/input which is converted to largeobject.sql.

Right, sorry for the imprecision.

> As for the general question of if the two could be merged, I can't think
> of any reason off-hand why that wouldn't work, nor do I have any
> particular recollection as to why I created a new file instead of using
> the existing.  My shell history tells me that I found largeobject.source
> while crafting the test case but not why I didn't use it.

You've got shell history back to March?  Wow.

> The main thing is that the large_object.sql was specifically added to
> test pg_upgrade/pg_dump, so the created object needs to be kept around
> in the regression database with the comment after the tests run for that
> to happen.

Right.  I was thinking of just appending large_object.sql to the end of
the other file.

It's possible that there's some issue involving race conditions against
some concurrent test, but I'll look around.

regards, tom lane




Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-17 Thread Magnus Hagander
On Mon, Jul 17, 2017 at 6:47 PM, Mark Cave-Ayland <mark.cave-ayl...@ilande.co.uk> wrote:

> On 17/07/17 13:09, Magnus Hagander wrote:
>
> Hi Magnus,
>
> Great to hear from you! It has definitely been a while...
>

Indeed. You should spend more time on these lists :P



>
> > Generally you find that you will be given the option to set the
> > attribute for the default search filter of the form
> > "(attribute=username)" which defaults to uid for UNIX-type systems
> and
> > sAMAccountName for AD. However there is always the ability to
> specify a
> > custom filter where the user is substituted via e.g. %u to cover all
> the
> > other use-cases.
> >
> > Right, but that's something we do already today. It just defaults to
> > uid, but it's easy to change.
>
> Yes, I think that bit is fine as long as the default can be overridden.
> There's always a choice as to whether the defaults favour a POSIX-based
> LDAP or AD environment but that happens with all installations so it's
> not a big deal.
>

It's easy enough to change.



> > As an example, I don't know if anyone would actually do this with
> > PostgreSQL but I've been asked on multiple occasions to configure
> > software so that users should be allowed to log in with either their
> > email address or username which is easily done with a custom LDAP filter
> > like "(|(mail=%u)(uid=%u))".
> >
> >
> > How would that actually work, though? Given the same user in ldap could
> > now potentially match multiple different users in PostgreSQL. Would you
> > then create two accounts for the user, one with the uid as name and one
> > with email as name? Wouldn't that actually cause more issues than it
> > solves?
>
> Normally what happens for search+bind is that you execute the custom
> filter with substitutions in order to find the entry for your bind DN,
> but you also request the value of ldapsearchattribute (or equivalent) at
> the same time. Say for example you had an entry like this:
>
> dn: uid=mha,dc=users,dc=hagander,dc=net
> uid: mha
> mail: mag...@hagander.net
>
> Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN
> "uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or
> email address.
>
> If the bind is successful then the current user identity should be set
> to the value of the ldapsearchattribute retrieved from the bind DN
> entry, which with the default of "uid" would be "mha". Hence you end up
> with the same user role "mha" regardless of whether a uid or email
> address was entered.
>

Right. This is the part that doesn't work for PostgreSQL. Because we have
already specified the username -- it goes in the startup packet in order to
pick the correct row from pg_hba.conf.

I guess in theory we could treat it like Kerberos or another one of the
systems where we get the username from an external entity. But then you'd
still have to specify the mapping again in pg_ident.conf, and it would be a
pretty strong break of backwards compatibility.



> In terms of matching multiple users, all LDAP authentication routines
> I've seen will fail if more than one DN matches the search filter, so
> this nicely handles the cases where either the custom filter is
> incorrect or more than one entry is accidentally matched in the directory.
>

So do we, in the current implementation. But it's a lot less likely to
happen in the current implementation, since we do a single equals lookup.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Why have we got both largeobject and large_object test files?

2017-07-17 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I happened to notice that the regression tests contain both
> largeobject.sql and large_object.sql.  This seems at best confusing and at
> worst a source of mistakes.  The second file was added in March by commit
> ff992c074, has never been touched by any other commit, and is only 8 lines
> long.  Was there a really good reason not to incorporate that test into
> largeobject.sql?

Just to be clear that we're talking about the same thing- there is no
'largeobject.sql' in a clean source tree.  There is a 'largeobject.source'
in src/test/regress/input which is converted to largeobject.sql.

As for the general question of if the two could be merged, I can't think
of any reason off-hand why that wouldn't work, nor do I have any
particular recollection as to why I created a new file instead of using
the existing.  My shell history tells me that I found largeobject.source
while crafting the test case but not why I didn't use it.

The main thing is that the large_object.sql was specifically added to
test pg_upgrade/pg_dump, so the created object needs to be kept around
in the regression database with the comment after the tests run for that
to happen.

Thanks!

Stephen




Re: [HACKERS] Pluggable storage

2017-07-17 Thread Peter Geoghegan
On Mon, Jul 17, 2017 at 3:22 AM, Alexander Korotkov wrote:
> I think that "retail index tuple deletion" is the feature which could give
> us some advantages even independently from pluggable storages.  For example,
> imagine very large table with only small amount of dead tuples.  In this
> case, it would be cheaper to delete index links to those dead tuples one by
> one using "retail index tuple deletion", rather than do full scan of every
> index to perform "bulk delete" of index tuples.  One may argue that you
> shouldn't do vacuum of large table when only small amount of tuples are
> dead.  But in terms of index bloat mitigation, very aggressive vacuum
> strategy could be justified.

Yes, definitely. Especially with the visibility map. Even still, I
tend to think that for unique indexes, true duplicates should be
disallowed, and dealt with with an additional layer of indirection. So
this would be for secondary indexes.

>> I agree with Robert that being able to store an arbitrary payload as a
>> TID is probably not going to ever work very well.
>
>
> Support of arbitrary payload as a TID doesn't sound easy.  However, that
> doesn't mean it's unachievable. For me, it's more like long way which could
> be traveled step by step.

To be fair, it probably is achievable. Where there is a will, there is
a way. I just think that it will be easier to find a different way of
realizing similar benefits. I'm mostly talking about benefits around
making it cheap to have many secondary indexes by having logical
indirection instead of physical pointers (doesn't *have* to be
user-visible primary key values). HOT simply isn't effective enough at
preventing UPDATE index tuple insertions for indexes on unchanged
attributes, often just because pruning can fail to happen in time,
which WARM will not fix.

-- 
Peter Geoghegan




Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-07-17 Thread Michael Paquier
On Mon, Jul 17, 2017 at 6:02 PM, Robert Haas  wrote:
> It seems to me that this area might benefit from a broader rethink.
> It's not very nice to impose a restriction that init forks can only be
> constructed using log_newpage(); on the other hand, it is also not
> very nice that Amit's patch needs to recapitulate the mysterious
> incantation used by XLogReadBufferForRedoExtended in several more
> places.  The basic problem here is that it would be really easy for
> the next person who adds an index AM to make the exact same mistake
> that Amit made here and that I failed to spot during code review.  It
> would be nice to come up with some more generic solution to the
> problem rather than relying on each AM to insert this
> FlushOneBuffer() incantation in each place where it's needed.  I think
> there are probably several ways to do that; one idea might be to flush
> all init-fork buffers in bulk at the end of recovery.

Things are centralized in XLogReadBufferForRedoExtended() for init
fork flushes, which is not a bad thing in itself as it is normally the
only place doing this work for other AMs. And I have to admit that
the current coding for hash index WAL has the advantage of generating
less WAL, as the bitmap and metadata pages get initialized using
the data of the records themselves. One idea I can think of would be
to create a README in src/backend/access for index AMs that summarizes
a basic set of recommendations for each routine used.
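
For the archives, the incantation in question boils down to something like
this in a redo routine that rebuilds an init-fork page without going through
log_newpage() -- a rough illustration, not Amit's actual patch:

	/* ... the redo routine has just rebuilt the init-fork page in buf ... */
	MarkBufferDirty(buf);

	/*
	 * Init-fork pages are not guaranteed to be covered by a checkpoint
	 * before recovery ends, so force this one to disk now, the same way
	 * XLogReadBufferForRedoExtended() does for pages it restores itself.
	 */
	if (forknum == INIT_FORKNUM)
		FlushOneBuffer(buf);
	UnlockReleaseBuffer(buf);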

> However, it doesn't really seem urgent to tinker with that; while I
> think the fact that existing AMs use log_newpage() to populate the
> init fork is mostly just luck, it might well be 10 or 20 years before
> somebody adds another AM that happens to use any other technique.
> Moreover, at the moment, we're trying to get a release out the door,
> and to me that argues for keeping structural changes to a minimum.
> Amit's patch seems like a pretty surgical fix to this problem with
> minimal chance of breaking anything new, and that seems like the right
> kind of fix for July.  So I plan to commit what Amit proposed (with a
> few grammar corrections).

No objections to that.
-- 
Michael




Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-17 Thread Mark Cave-Ayland
On 17/07/17 13:09, Magnus Hagander wrote:

Hi Magnus,

Great to hear from you! It has definitely been a while...

> Generally you find that you will be given the option to set the
> attribute for the default search filter of the form
> "(attribute=username)" which defaults to uid for UNIX-type systems and
> sAMAccountName for AD. However there is always the ability to specify a
> custom filter where the user is substituted via e.g. %u to cover all the
> other use-cases.
> 
> Right, but that's something we do already today. It just defaults to
> uid, but it's easy to change. 

Yes, I think that bit is fine as long as the default can be overridden.
There's always a choice as to whether the defaults favour a POSIX-based
LDAP or AD environment but that happens with all installations so it's
not a big deal.

> As an example, I don't know if anyone would actually do this with
> PostgreSQL but I've been asked on multiple occasions to configure
> software so that users should be allowed to log in with either their
> email address or username which is easily done with a custom LDAP filter
> like "(|(mail=%u)(uid=%u))".
> 
> 
> How would that actually work, though? Given the same user in ldap could
> now potentially match multiple different users in PostgreSQL. Would you
> then create two accounts for the user, one with the uid as name and one
> with email as name? Wouldn't that actually cause more issues than it solves?

Normally what happens for search+bind is that you execute the custom
filter with substitutions in order to find the entry for your bind DN,
but you also request the value of ldapsearchattribute (or equivalent) at
the same time. Say for example you had an entry like this:

dn: uid=mha,dc=users,dc=hagander,dc=net
uid: mha
mail: mag...@hagander.net

Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN
"uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or
email address.

If the bind is successful then the current user identity should be set
to the value of the ldapsearchattribute retrieved from the bind DN
entry, which with the default of "uid" would be "mha". Hence you end up
with the same user role "mha" regardless of whether a uid or email
address was entered.
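
To make that concrete: for a user typing "mha" at login, the search the
server performs would be roughly equivalent to the following (illustrative
ldapsearch invocation, with the base DN taken from the entry above):

    ldapsearch -x -b "dc=users,dc=hagander,dc=net" \
        "(|(mail=mha)(uid=mha))" uid

It returns the single matching entry together with its uid, the server then
binds as "uid=mha,dc=users,dc=hagander,dc=net" with the supplied password,
and "mha" becomes the PostgreSQL role name.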

In terms of matching multiple users, all LDAP authentication routines
I've seen will fail if more than one DN matches the search filter, so
this nicely handles the cases where either the custom filter is
incorrect or more than one entry is accidentally matched in the directory.


ATB,

Mark.




Re: [HACKERS] Multi column range partition table

2017-07-17 Thread Dean Rasheed
On 17 July 2017 at 16:34, Robert Haas  wrote:
> On Sun, Jul 16, 2017 at 6:40 AM, Dean Rasheed  
> wrote:
>> Technically, anything that can be done using INCLUSIVE/EXCLUSIVE can
>> also be done using MINVALUE/MAXVALUE, by artificially adding
>> another partitioning column and making it unbounded above/below, but
>> that would really just be a hack, and it (artificially adding an extra
>> column) would be unnecessary if we added INCLUSIVE/EXCLUSIVE support
>> in a later release. Thus, I think the 2 features would complement each
>> other quite nicely.
>
> OK, works for me.  I'm not really keen about the MINVALUE/MAXVALUE
> syntax -- it's really +/- infinity, not a value at all -- but I
> haven't got a better proposal and yours at least has the virtue of
> perhaps being familiar to those who know about Oracle.
>

Cool. Sounds like we've reached a consensus, albeit with some
reservations around the fact that MINVALUE/MAXVALUE aren't actually
values, despite their names.

+/- infinity *are* values for some datatypes such as timestamps, so it
had to be something different from that, and MINVALUE/MAXVALUE are
quite short and simple, and match the syntax used by 3 other
databases.
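
For illustration, with that syntax a two-column range partition can leave
the second column unbounded (sketch only; table and column names invented):

    CREATE TABLE measurements (logdate date, peaktemp int)
        PARTITION BY RANGE (logdate, peaktemp);

    CREATE TABLE measurements_y2017 PARTITION OF measurements
        FOR VALUES FROM ('2017-01-01', MINVALUE) TO ('2018-01-01', MINVALUE);

which accepts exactly the rows with logdate in [2017-01-01, 2018-01-01),
whatever their peaktemp.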


> Do you want to own this open item, then?
>

OK.

I need to give the patch another read-through, and then I'll aim to
push it sometime in the next few days.

Regards,
Dean




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Robert Haas
On Mon, Jul 17, 2017 at 11:37 AM, Michael Paquier
 wrote:
> Exactly, having an exact deprecation policy would be nice. Of course
> this depends on the feature we are talking about. pg_dump for example
> supports servers older than what community still supports. But in most
> cases, like in core data types, it would be rather safe to say that it
> gets deprecated once all the versions supported by community share the
> same state (remember for example 17d436d2 that was kept around for a
> rather long time, or the handling of opaque function types for
> triggers and PLs still present for 15 years).

Yeah, but historically it never works out.  A relatively famous
example is contrib/xml2, whose documentation says:

==
From PostgreSQL 8.3 on, there is XML-related functionality based on
the SQL/XML standard in the core server. That functionality covers XML
syntax checking and XPath queries, which is what this module does, and
more, but the API is not at all compatible. It is planned that this
module will be removed in a future version of PostgreSQL in favor of
the newer standard API, so you are encouraged to try converting your
applications. If you find that some of the functionality of this
module is not available in an adequate form with the newer API, please
explain your issue to  so that the
deficiency can be addressed.
==

But until 3163baa6d2d12c28f45fec60ab313537ea9a43a4 (February 24,
2013), it said that it would be removed in PostgreSQL 8.4 (July 1,
2009), a promise that could not be fulfilled without the use of a
serviceable time machine.  I proposed removing contrib/xml2 sometime
during the 8.4 or 9.0 cycle, IIRC, and got told "no".  While the
updated deprecation notice is less-obviously wrong, saying that
removal is "planned" is a pretty generous assessment.  It's unclear
that we've made any progress in addressing whatever the problems were
that caused the previous attempt at removal to get shot down, so it
might still be wrong: maybe xml2 is going to stick around until the
heat death of the universe.

Now, I agree that a date type which cannot represent dates after 2038
is of marginal and decreasing utility, and therefore the chances that
it will be removed are good.  Probably most other people will agree as
well.  But we could easily overlook the need to deliver on a promise
to remove it in a specific version, and the possibility that someone
will argue for keeping it as a matter of historical interest cannot be
ruled out.  In a community where anything at all can be relitigated at
the drop of a hat, making promises about what will happen in the
future is mostly wishful thinking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-07-17 Thread Robert Haas
On Sat, Jul 15, 2017 at 4:25 AM, Michael Paquier
 wrote:
> On Sat, Jul 15, 2017 at 6:27 AM, Ashutosh Sharma  
> wrote:
>> I do agree with Amit. I think hash index is slightly trickier (in
>> terms of how it is organised) than other indexes and that could be the
>> reason for maintaining common code for hashbuild and hashbuildempty.
>
> Well, you both and Robert worked more on this code for PG10 than I
> did, so I am fine to rely on your judgement for the final result.
> Still I find this special handling quite surprising. All other AMs
> just always log FPWs for the init fork pages so I'd rather not break
> this treaty, but that's one against the world as things stand
> currently on this thread ;)

It seems to me that this area might benefit from a broader rethink.
It's not very nice to impose a restriction that init forks can only be
constructed using log_newpage(); on the other hand, it is also not
very nice that Amit's patch needs to recapitulate the mysterious
incantation used by XLogReadBufferForRedoExtended in several more
places.  The basic problem here is that it would be really easy for
the next person who adds an index AM to make the exact same mistake
that Amit made here and that I failed to spot during code review.  It
would be nice to come up with some more generic solution to the
problem rather than relying on each AM to insert this
FlushOneBuffer() incantation in each place where it's needed.  I think
there are probably several ways to do that; one idea might be to flush
all init-fork buffers in bulk at the end of recovery.
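
For reference, a redo routine of the kind at issue looks roughly like this
(a simplified, hypothetical sketch modeled on the hash AM's init-fork
records, with the metapage handling elided; not the committed code):

static void
hash_xlog_init_bitmap_page(XLogReaderState *record)
{
	XLogRecPtr	lsn = record->EndRecPtr;
	xl_hash_init_bitmap_page *xlrec =
		(xl_hash_init_bitmap_page *) XLogRecGetData(record);
	Buffer		bitmapbuf;
	RelFileNode	rnode;
	ForkNumber	forknum;
	BlockNumber	blkno;

	/* Rebuild the page from the record's data, not from a full-page image. */
	bitmapbuf = XLogInitBufferForRedo(record, 0);
	_hash_initbitmapbuffer(bitmapbuf, xlrec->bmsize, true);
	PageSetLSN(BufferGetPage(bitmapbuf), lsn);
	MarkBufferDirty(bitmapbuf);

	/*
	 * Init forks of unlogged relations are copied to the main fork at the
	 * end of crash recovery without going through shared buffers, so the
	 * on-disk state has to be kept in sync with shared buffers here -- the
	 * same reason XLogReadBufferForRedoExtended() flushes init-fork blocks
	 * restored from full-page images.
	 */
	XLogRecGetBlockTag(record, 0, &rnode, &forknum, &blkno);
	if (forknum == INIT_FORKNUM)
		FlushOneBuffer(bitmapbuf);

	UnlockReleaseBuffer(bitmapbuf);
}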

However, it doesn't really seem urgent to tinker with that; while I
think the fact that existing AMs use log_newpage() to populate the
init fork is mostly just luck, it might well be 10 or 20 years before
somebody adds another AM that happens to use any other technique.
Moreover, at the moment, we're trying to get a release out the door,
and to me that argues for keeping structural changes to a minimum.
Amit's patch seems like a pretty surgical fix to this problem with
minimal chance of breaking anything new, and that seems like the right
kind of fix for July.  So I plan to commit what Amit proposed (with a
few grammar corrections).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Why have we got both largeobject and large_object test files?

2017-07-17 Thread Tom Lane
I happened to notice that the regression tests contain both
largeobject.sql and large_object.sql.  This seems at best confusing and at
worst a source of mistakes.  The second file was added in March by commit
ff992c074, has never been touched by any other commit, and is only 8 lines
long.  Was there a really good reason not to incorporate that test into
largeobject.sql?

regards, tom lane




Re: [HACKERS] Unportable use of select for timeouts in PostgresNode.pm

2017-07-17 Thread Tom Lane
Andrew Dunstan  writes:
> I've been trying to get to the bottom of a nasty hang in buildfarm
> member jacana when running the pg_ctl TAP test. This test used to work,
> and was last known to work on June 22nd.

> My attention has become focussed on this change in commit de3de0afd:

> -   # Wait a second before retrying.
> -   sleep 1;
> +   # Wait 0.1 second before retrying.
> +   select undef, undef, undef, 0.1;

> This is a usage that is known not to work in Windows - IIRC we
> eliminated such calls from our C programs at the time of the Windows
> port - and it seems to me very likely to be the cause of the hang.

Ugh.

> Instead I think we should use the usleep() function from the standard
> (from 5.8) Perl module Time::HiRes, as recommended in the Perl docs for
> the sleep() function for situations where you need finer grained
> timeouts. I have verified that this works on jacana and friends.

> Unless I hear objections I'll prepare a patch along those lines.

WFM.  Thanks for taking care of it.

regards, tom lane




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Michael Paquier
On Sun, Jul 16, 2017 at 10:27 PM, Greg Stark  wrote:
> On 15 July 2017 at 23:00, Tom Lane  wrote:
>
>> While it's too late in the v10 cycle to do anything very meaningful
>> about this now, I am tempted to strengthen the deprecation notice's
>> wording from "might disappear" to "will disappear".
>
> "Will certainly disappear at some unspecified date" is a lot less
> convincing than "will disappear in 2021 in Postgres 14". The specific
> year and version number is irrelevant
> but picking and naming a specific one makes it a lot easier to follow
> through come that date.

Exactly, having an exact deprecation policy would be nice. Of course
this depends on the feature we are talking about. pg_dump for example
supports servers older than what community still supports. But in most
cases, like in core data types, it would be rather safe to say that it
gets deprecated once all the versions supported by community share the
same state (remember for example 17d436d2 that was kept around for a
rather long time, or the handling of opaque function types for
triggers and PLs still present for 15 years).

Coming back to this thread... Removing abstime and reltime sounds like
the right way to conclude it once the deprecation period expires.
-- 
Michael




Re: [HACKERS] Multi column range partition table

2017-07-17 Thread Robert Haas
On Sun, Jul 16, 2017 at 6:40 AM, Dean Rasheed  wrote:
> Technically, anything that can be done using INCLUSIVE/EXCLUSIVE can
> also be done using MINVALUE/MAXVALUE, by artificially adding
> another partitioning column and making it unbounded above/below, but
> that would really just be a hack, and it (artificially adding an extra
> column) would be unnecessary if we added INCLUSIVE/EXCLUSIVE support
> in a later release. Thus, I think the 2 features would complement each
> other quite nicely.

OK, works for me.  I'm not really keen about the MINVALUE/MAXVALUE
syntax -- it's really +/- infinity, not a value at all -- but I
haven't got a better proposal and yours at least has the virtue of
perhaps being familiar to those who know about Oracle.

Do you want to own this open item, then?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-07-17 Thread Michael Paquier
On Thu, Jul 6, 2017 at 3:48 PM, Heikki Linnakangas  wrote:
> On 07/03/2017 06:30 PM, Chapman Flack wrote:
>> Although it's moot in the straightforward approach of re-zeroing in
>> the loop, it would still help my understanding of the system to know
>> if there is some subtlety that would have broken what I proposed
>> earlier, which was an extra flag to AdvanceXLInsertBuffer() that
>> would tell it not only to skip initializing headers, but also to
>> skip the WaitXLogInsertionsToFinish() check ... because I have
>> the entire region reserved and I hold all the writer slots
>> at that moment, it seems safe to assure AdvanceXLInsertBuffer()
>> that there are no outstanding writes to wait for.
>
> Yeah, I suppose that would work, too.

FWIW, I would rather see any optimization done in
AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
the WAL page header after its data has been initialized by
AdvanceXLInsertBuffer() once. That's too late for 10, but you still
have time for a patch to be integrated in 11.
-- 
Michael




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-17 Thread Robert Haas
On Sun, Jul 16, 2017 at 1:03 PM, Tom Lane  wrote:
> Or in other words, I don't want to go from
> "might disappear" in vN to gone in vN+1 with no intermediate state.

I see no problem with that.  First, we remove things all the time with
no deprecation warning at all when we judge that they are dead enough,
or just unsalvageable.  Second, if we have said that something might
disappear and then it disappears, anyone who is unhappy about that is
being unreasonable.

In other words, I don't want to have a project policy that we will not
only put a deprecation notice on everything we remove, but it will
always be worded in a certain way.  If you're trying to streamline the
process of deprecating features, that's going in the exact wrong
direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Unportable use of select for timeouts in PostgresNode.pm

2017-07-17 Thread Michael Paquier
On Mon, Jul 17, 2017 at 4:48 PM, Andrew Dunstan
 wrote:
> This is a usage that is known not to work in Windows - IIRC we
> eliminated such calls from our C programs at the time of the Windows
> port - and it seems to me very likely to be the cause of the hang.
> Instead I think we should use the usleep() function from the standard
> (from 5.8) Perl module Time::HiRes, as recommended in the Perl docs for
> the sleep() function for situations where you need finer grained
> timeouts. I have verified that this works on jacana and friends.

Looking at my boxes (Arch, Mac, Windows), Time::HiRes looks to be part
of the core set of packages, so there is clearly no need to
incorporate a check in configure.in. So +1 for doing as you suggest.
-- 
Michael




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-07-17 Thread Michael Paquier
On Fri, Jun 23, 2017 at 6:08 AM, Chapman Flack  wrote:
> Well, gzip was doing pretty well; it could get a 16 MB segment file down
> to under 27 kB, or less than 14 bytes for each of 2000 pages, when a page
> header is what, 20 bytes, it looks like? I'm not sure how much better
> I'd expect a (non-custom) compression scheme to do. The real difference
> comes between compressing (even well) a large unchanged area, versus being
> able to recognize (again with a non-custom tool) that the whole area is
> unchanged.

Have you also tried lz4 for your cases? It is faster than gzip at
minimum compression levels but compresses less well; I am really wondering
whether it actually does better on pages that are almost entirely zeros.
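
For a quick comparison on one of those mostly-zero segments, something like
this would do (illustrative commands; the segment name is just an example):

    lz4 -1 000000010000000000000001 000000010000000000000001.lz4
    gzip -c -6 000000010000000000000001 > 000000010000000000000001.gz
    ls -l 000000010000000000000001*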
-- 
Michael




Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-17 Thread Andres Freund
Hi,

On 2017-07-17 00:26:29 -0700, Andres Freund wrote:
> I'm less convinced of that, due to the overhead argument.  I think
> there's a couple ways around that however:
> 
> 1) We could move ExecProcNode() to be callback based. The first time a
>node is executed a "wrapper" function is called that does the stack
>and potentially other checks. That also makes ExecProcNode() small
>enough to be inlined, which ends up being good for jump target
>prediction.   I've done something similar for v11 for expression
>evaluation, getting rid of EEOP_*_FIRST duplication etc, and it seems
>to work well.  The big disadvantage to that is that it's a bit
>invasive for v10, and very likely too invasive to backpatch.
> 2) I think there's some fair argument to be made that ExecInitNode()'s
>stack-space needs are similar enough to ExecProcNode()'s allowing us
>to put a check_stack_depth() into the former.  That seems like it's
>required anyway, since in many cases that's going to trigger
>stack-depth exhaustion first anyway (unless we hit it in parse
>analysis, which also seems quite common).

Attached is a trivial patch implementing 1) and a proof-of-concept hack
for 2).

The latter obviously isn't ready, but might make clearer what I'm
thinking about. If we were to go this route we'd have to probably move
the callback assignment into the ExecInit* routines, and possibly
replace the switch in ExecInitNode() with another callback, assigned in
make_*, and implement callbacks for ExecShutdown, MultiExecProcNode etc.

This results in a good speedup in tpc-h btw:
master total min: 46434.897 cb min: 44778.228 [diff -3.70]

- Andres
From 06b88ddf63427e1f6aeb49feb48916c3351e3380 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 17 Jul 2017 00:33:49 -0700
Subject: [PATCH 1/2] Check stack-depth when initializing executor nodes.

Previously we only checked stack-depth during parse-analysis, but
that's not necessarily sufficient. In v10, where targetlist SRF
evaluation now is executor node based, this can indeed cause
problems. It's quite possible that earlier versions are also affected
in a bit different manner, or might become vulnerable due to future
changes.

To shore this up, add a stack-depth check to ExecInitNode(). As that's
called in the same recursive manner as ExecProcNode() /
MultiExecProcNode(), it should have similar stack usage as those,
without incurring noticeable overhead in queries processing many rows.

Author: Andres Freund
Reported-By: Julien Rouhaud
Discussion:
https://postgr.es/m/22833.1490390...@sss.pgh.pa.us
https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0...@dalibo.com
Backpatch: 9.2-, issue has existed for a long while
---
 src/backend/executor/execProcnode.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 294ad2cff9..360479d081 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -143,6 +143,15 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 	List	   *subps;
 	ListCell   *l;
 
+	/*
+	 * Make sure there's enough stack available. Check that here, rather than
+	 * ExecProcNode() / MultiExecProcNode(), to avoid incurring overhead for
+	 * every single tuple. The assumption making this valid is that the
+	 * difference in stack use between node initialization and execution
+	 * should be far less than the safety margin from check_stack_depth().
+	 */
+	check_stack_depth();
+
 	/*
 	 * do nothing when we get to the end of a leaf on tree.
 	 */
-- 
2.13.1.392.g8d1b10321b.dirty

From 2321fb83a8973c970ec934d3c1f121e739a7a20a Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 17 Jul 2017 07:50:40 -0700
Subject: [PATCH 2/2] WIP: Move to callback based executor nodes.

---
 src/backend/executor/execProcnode.c | 436 +++-
 src/include/executor/executor.h |  30 ++-
 src/include/nodes/execnodes.h   |  13 ++
 3 files changed, 118 insertions(+), 361 deletions(-)

diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 360479d081..9d8727f41c 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -122,6 +122,15 @@
 #include "miscadmin.h"
 
 
+
+static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+
+#define INIT_CB(node, name) \
+	(node)->ExecProcNode = ExecProcNodeFirst; \
+	(node)->ExecProcNodeReal = (ExecProcNodeCB) Exec##name; \
+	(node)->ExecEndNode = (ExecEndNodeCB) ExecEnd##name
+
 /* 
  *		ExecInitNode
  *
@@ -166,41 +175,49 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 		case T_Result:
 			result = (PlanState *) ExecInitResult((Result *) node,
   estate, eflags);
+			INIT_CB(result, Result);

[HACKERS] Unportable use of select for timeouts in PostgresNode.pm

2017-07-17 Thread Andrew Dunstan

I've been trying to get to the bottom of a nasty hang in buildfarm
member jacana when running the pg_ctl TAP test. This test used to work,
and was last known to work on June 22nd.

My attention has become focussed on this change in commit de3de0afd:

-   # Wait a second before retrying.
-   sleep 1;
+   # Wait 0.1 second before retrying.
+   select undef, undef, undef, 0.1;

This is a usage that is known not to work in Windows - IIRC we
eliminated such calls from our C programs at the time of the Windows
port - and it seems to me very likely to be the cause of the hang.
Instead I think we should use the usleep() function from the standard
(from 5.8) Perl module Time::HiRes, as recommended in the Perl docs for
the sleep() function for situations where you need finer grained
timeouts. I have verified that this works on jacana and friends.

Unless I hear objections I'll prepare a patch along those lines.
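
Roughly, i.e. something like this (sketch only, not the final patch):

    use Time::HiRes qw(usleep);

    # Wait 0.1 second (100_000 microseconds) before retrying.
    usleep(100_000);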


cheers


andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] pg_restore failed for foreign key constraint

2017-07-17 Thread Tom Lane
Neha Sharma  writes:
> I am getting the below error while restoring data having foreign key
> constraint.

> [edb@localhost bin]$ ./createdb test1
> [edb@localhost bin]$ ./pg_restore -d test1 -c -e a.dump
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2903; 2606 16399 FK
> CONSTRAINT test_tbl test_tbl_fk_c1 edb
> pg_restore: [archiver (db)] could not execute query: ERROR:  schema "test"
> does not exist
> Command was: ALTER TABLE ONLY test.test_tbl DROP CONSTRAINT
> test_tbl_fk_c1;

> Is this an expected behaviour?

Yes, it is.  Either don't use -c (it's entirely useless when restoring
into an empty database), or specify --if-exists, or ignore the errors.
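
With the invocation from your example, that means either of:

    ./pg_restore -d test1 -e a.dump
    ./pg_restore -d test1 -c --if-exists -e a.dump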

regards, tom lane




Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2017-07-17 Thread Shubham Barai
Hi,

I am attaching a patch for predicate locking in gin index.


Regards,
Shubham





On 11 July 2017 at 19:10, Shubham Barai  wrote:

> Project: Explicitly support predicate locks in index AMs besides b-tree
>
>
> I have done following tasks during this week.
>
> 1) worked on how to detect rw conflicts when fast update is enabled
>
> 2) created tests for different gin operators
>
> 3) went through some patches on commitfest to review
>
> 4) solved some issues that came up while testing
>
> link to the code: https://github.com/shubhambaraiss/postgres/commit/1365d75db36a4e398406dd266c3d4fe8e1ec30ff
>
>


0001-Predicate-locking-in-gin-index.patch
Description: Binary data



Re: [HACKERS] parallelize queries containing initplans

2017-07-17 Thread Amit Kapila
On Tue, Mar 28, 2017 at 7:25 AM, Amit Kapila  wrote:
> On Thu, Mar 16, 2017 at 2:34 AM, Kuntal Ghosh
>  wrote:
>> On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila  wrote:
>>> Based on that idea, I have modified the patch such that it will
>>> compute the set of initplans Params that are required below gather
>>> node and store them as bitmap of initplan params at gather node.
>>> During set_plan_references, we can find the intersection of external
>>> parameters that are required at Gather or nodes below it with the
>>> initplans that are passed from same or above query level. Once the set
>>> of initplan params are established, we evaluate those (if they are not
>>> already evaluated) before execution of gather node and then pass the
>>> computed value to each of the workers.   To identify whether a
>>> particular param is parallel safe or not, we check if the paramid of
>>> the param exists in initplans at same or above query level.  We don't
>>> allow to generate gather path if there are initplans at some query
>>> level below the current query level as those plans could be
>>> parallel-unsafe or undirect correlated plans.
>>
>> I would like to mention different test scenarios with InitPlans that
>> we've considered while developing and testing of the patch.
>>
>
> Thanks a lot Kuntal for sharing different test scenarios.
> Unfortunately, this patch hasn't received any review till now, so
> there is no chance of making it into PostgreSQL-10.  I have moved
> this to the next CF.
>

Attached is a rebased version of the patch with the following changes:
a. SubplanState now directly stores the Subplan rather than an ExprState,
so the patch needed some adjustment in that regard.
b. While rejecting paths for parallelism (based on whether there are
initplans at a level below the current query level), the rejected paths
were not being marked as parallel unsafe.  Because of this, with
force_parallel_mode=regress, we were able to add a gather node above
parallel-unsafe paths.  The modified patch marks such paths as parallel
unsafe.
c. Added a regression test.
d. Improved comments in the code.
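
For illustration, the kind of query this allows to run in parallel is one
where an uncorrelated subquery (an InitPlan) is referenced below a Gather
node, e.g. (sketch only, table names invented):

    SELECT count(*)
    FROM big_table
    WHERE a = (SELECT max(b) FROM small_table);

The InitPlan's value is computed once before the Gather node starts and is
then passed down to each worker.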

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_initplan_v4.patch
Description: Binary data



Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-17 Thread Magnus Hagander
On Mon, Jul 17, 2017 at 1:23 AM, Stephen Frost  wrote:

>
> * Magnus Hagander (mag...@hagander.net) wrote:
> > On Sun, Jul 16, 2017 at 11:05 PM, Stephen Frost 
> wrote:
> > > I'd suggest that we try to understand why Kerberos couldn't be used in
> > > that environment.  I suspect in at least some cases what users would
> > > like is the ability to do Kerberos auth but then have LDAP checked to
> > > see if a given user (who has now auth'd through Kerberos) is allowed to
> > > connect.  We don't currently have any way to do that, but if we were
> > > looking for things to do, that's what I'd suggest working on rather
> than
> > > adding more to our LDAP auth system and implying by doing so that it's
> > > reasonable to use.
> > >
> > > I find it particularly disappointing to see recommendations for using
> > > LDAP auth, particularly in AD environments, that don't even mention
> > > Kerberos or bother to explain how using LDAP sends the user's PW to the
> > > server in cleartext.
> >
> > You do realize, I'm sure, that there are many LDAP servers out there that
> > are not AD, and that do not come with a Kerberos server attached to
> them...
>
> I am aware that some exist, I've even contributed to their development
> and packaging, but that doesn't make it a good idea to use them for
> authentication.
>

Pretty sure that doesn't include any of the ones I'm talking about, but
sure :)



> > I agree that Kerberos is usually the better choice *if it's available*.
>
> Which is the case in an AD environment..
>

Yes. But we shouldn't force everybody to use AD :P



> > It's several orders of magnitude more complicated to set up though, and
> > there are many environments that have ldap but don't have Kerberos.
>
> Frankly, I simply don't agree with this.
>

Really?

For LDAP auth I don't need to do *anything* in preparation. For Kerberos
auth I need to create an account, set encryption type, export keys, etc.
You can't possibly claim this is the same level of complexity?




> > Refusing to improve LDAP for the users who have no choice seems like a
> very
> > unfriendly thing to do.
>
> I'm fine with improving LDAP in general, but, as I tried to point out,
> having a way to make it easier to integrate PG into an AD environment
> would be better.  It's not my intent to stop this patch but rather to
> point out the issues with LDAP auth that far too frequently are not
> properly understood.
>

A documentation patch for that would certainly be a good place to start.
Perhaps with up to date instructions for how to actually set up Kerberos in
an AD environment including all steps required?



> > (And you can actually reasonably solve the case of
> > kerberos-for-auth-ldap-for-priv by syncing the groups into postgres
> roles)
>
> Yes, but sync'ing roles is a bit of a pain and it'd be nice if we could
> avoid it, or perhaps make it easier.
>

Certainly.

//Magnus


Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-17 Thread Magnus Hagander
On Sun, Jul 16, 2017 at 7:58 PM, Mark Cave-Ayland <
mark.cave-ayl...@ilande.co.uk> wrote:

> On 16/07/17 00:08, Thomas Munro wrote:
>
> > On Fri, Jul 14, 2017 at 11:04 PM, Magnus Hagander 
> wrote:
> >> On Thu, Jul 13, 2017 at 9:31 AM, Thomas Munro
> >>  wrote:
> >>> A post on planet.postgresql.org today reminded me that a colleague had
> >>> asked me to post this POC patch here for discussion.  It allows custom
> >>> filters with ldapsearchprefix and ldapsearchsuffix.  Another approach
> >>> might be to take a filter pattern with "%USERNAME%" or whatever in it.
> >>> There's an existing precedent for the prefix and suffix approach, but
> >>> on the other hand a pattern approach would allow filters where the
> >>> username is inserted more than once.
> >>
> >>
> >> Do we even need prefix/suffix? If we just make it "ldapsearchpattern",
> then
> >> you could have something like:
> >>
> >> ldapsearchattribute="uid"
> >> ldapsearchfilter="|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA
> Team)"
> >>
> >> We could then always do substitution of the kind:
> >> (&(attribute=username)(filter))
> >>
> >> which would in this case give:
> >> (&(uid=mha)(|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA Team)))
> >>
> >>
> >> Basically we'd always AND together the username lookup with the
> additional
> >> filter.
> >
> > Ok, so we have 3 ideas put forward:
> >
> > 1.  Wrap username with ldapsearchprefix ldapsearchsuffix to build
> > filter (as implemented by POC patch)
> > 2.  Optionally AND ldapsearchfilter with the existing
> > ldapsearchattribute-based filter (Magnus's proposal)
> > 3.  Pattern-based ldapsearchfilter so that %USER% is replaced with
> > username (my other suggestion)
> >
> > The main argument for approach 1 is that it follows the style of the
> > bind-only mode.
> >
> > With idea 2, I wonder if there are some more general kinds of things
> > that people might want to do that that wouldn't be possible because it
> > has to include (attribute=user)... perhaps something involving a
> > substring or other transformation functions (but I'm no LDAP expert,
> > that may not make sense).
> >
> > With idea 3 it would allow "(|(foo=%USER%)(bar=%USER%))", though I
> > don't know if any such multiple-mention filters would ever make sense
> > in a sane LDAP configuration.
> >
> > Any other views from LDAP-users?
>
> I've spent quite a bit of time integrating various bits of
> non-PostgreSQL software to LDAP and in my experience option 3 tends to
> be the standard.
>
> Generally you find that you will be given the option to set the
> attribute for the default search filter of the form
> "(attribute=username)" which defaults to uid for UNIX-type systems and
> sAMAccountName for AD. However there is always the ability to specify a
> custom filter where the user is substituted via e.g. %u to cover all the
> other use-cases.


Right, but that's something we do already today. It just defaults to uid,
but it's easy to change.



>
> As an example, I don't know if anyone would actually do this with
> PostgreSQL but I've been asked on multiple occasions to configure
> software so that users should be allowed to log in with either their
> email address or username which is easily done with a custom LDAP filter
> like "(|(mail=%u)(uid=%u))".
>

How would that actually work, though? Given the same user in ldap could now
potentially match multiple different users in PostgreSQL. Would you then
create two accounts for the user, one with the uid as name and one with
email as name? Wouldn't that actually cause more issues than it solves?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Fabien COELHO


Hello,

>> Is this bias expected from the drawing method, say because it is
>> approximated and the approximation is weak at some points, or is there
>> an issue with its implementation, says some shift which gets smoothed
>> down for higher indexes?

> I have checked paper where such implementation was proposed and there
> theta allowed only on range between 0 and 1. It seems like it is not
> guaranteed that it should work well when theta is more than 1.


Ok.

I see a significant issue with having a random_zipfian function which does 
not really return a zipfian distribution under some parameter values. If 
there is no better alternative, I would suggest to restrict the parameter 
for values between 0 and 1, or to find a better approximation for theta >= 
0.



> I am attaching paper, see page 23.


Thanks for the paper. It reminds me that I intended to propose a 
parametric pseudo-random permutation for pgbench, some day.


--
Fabien.




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Alik Khilazhev

> On 17 Jul 2017, at 13:51, Fabien COELHO  wrote:
> 
> 
> Is this bias expected from the drawing method, say because it is approximated 
> and the approximation is weak at some points, or is there an issue with its 
> implementation, says some shift which gets smoothed down for higher indexes?
> 

I have checked paper where such implementation was proposed and there theta 
allowed only on range between 0 and 1. It seems like it is not guaranteed that 
it should work well when theta is more than 1.

I am attaching paper, see page 23.


syntheticdatagen.pdf
Description: Adobe PDF document

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Fabien COELHO



>> Ok, so you did not get the large bias for i=3. Strange.


> I got large bias for i=3 and theta > 1 even with a million outcomes,


Ok. So this is similar to what I got.

Is this bias expected from the drawing method, say because it is 
approximated and the approximation is weak at some points, or is there an 
issue with its implementation, says some shift which gets smoothed down 
for higher indexes?


> I am attaching patch v3. Among other things I fixed small typo in
> description of random_exponential function in pgbench.sgml file.


I'll look into that.

--
Fabien.




Re: [HACKERS] 10beta1/m68k: static assertion failed: "MAXALIGN too small to fit int32"

2017-07-17 Thread Christoph Berg
Re: Heikki Linnakangas 2017-05-18 <5b9085c2-2c18-e5e3-c340-c07d11a9c...@iki.fi>
> > Please go ahead, I don't think I have online access to a m68k machine.
> > (It got demoted to an unofficial port some time ago and the old Debian
> > porter machines got taken down).
> 
> Ok, pushed, let's see if the port machine likes it.

The build works now, thanks!

https://people.debian.org/~glaubitz/postgresql-10_10~beta2-1+b2_m68k.build

Christoph




[HACKERS] pg_restore failed for foreign key constraint

2017-07-17 Thread Neha Sharma
Hi,

I am getting the below error while restoring data having foreign key
constraint.

./psql postgres
psql (10beta2)
Type "help" for help.

postgres=# create user test;
CREATE ROLE
postgres=# create schema test;
CREATE SCHEMA
postgres=# grant all on SCHEMA test to test;
GRANT
postgres=# set search_path to 'test';
SET
postgres=# CREATE TABLE test_tbl_fk (c1 INTEGER PRIMARY KEY);
CREATE TABLE
postgres=# CREATE TABLE test_tbl (c1 INTEGER PRIMARY KEY, c2 INTEGER, c3
VARCHAR,
postgres(#   CONSTRAINT test_tbl_fk_c1 FOREIGN KEY (c1) REFERENCES
test_tbl_fk(c1));
CREATE TABLE
postgres=# insert into test_tbl_fk values (1),(2),(3);
INSERT 0 3
postgres=# INSERT INTO test_tbl VALUES (1,1,'p11');
INSERT 0 1
postgres=# INSERT INTO test_tbl VALUES (2,2,'p11');
INSERT 0 1
postgres=# INSERT INTO test_tbl VALUES (3,3,'p11');
INSERT 0 1
postgres=# show  search_path ;
 search_path
-
 test
(1 row)

postgres=# \q
[edb@localhost bin]$ ./pg_dump -f a.dump -Fc postgres
[edb@localhost bin]$ ./createdb test1
[edb@localhost bin]$ ./pg_restore -d test1 -c -e a.dump
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2903; 2606 16399 FK
CONSTRAINT test_tbl test_tbl_fk_c1 edb
pg_restore: [archiver (db)] could not execute query: ERROR:  schema "test"
does not exist
Command was: ALTER TABLE ONLY test.test_tbl DROP CONSTRAINT
test_tbl_fk_c1;

Is this an expected behaviour?

-- 
Regards,
Neha Sharma


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Alik Khilazhev
Hello Fabien,

> On 14 Jul 2017, at 17:51, Fabien COELHO  wrote:
>
> Ok, so you did not get the large bias for i=3. Strange.

I got large bias for i=3 and theta > 1 even with a million outcomes, but
for theta < 1 (I have tested on theta = 0.1 and 0.3) it showed quite good
results.

I am attaching patch v3. Among other things I fixed small typo in
description of random_exponential function in pgbench.sgml file.

pgbench-zipf-03v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Pluggable storage

2017-07-17 Thread Alexander Korotkov
On Sun, Jul 16, 2017 at 3:58 AM, Peter Geoghegan  wrote:

> I strongly agree. I simply don't understand how you can adopt UNDO for
> MVCC, and yet expect to get a benefit commensurate with the effort
> without also implementing "retail index tuple deletion" first.
> Pursuing UNDO this way has the same problem that WARM likely has -- it
> doesn't really help with the worst case, where users get big,
> unpleasant surprises. Postgres is probably the only major database
> system that doesn't support retail index tuple deletion. It's a basic
> thing, that has nothing to do with MVCC. Really, what do we have to
> lose?
>

I think that "retail index tuple deletion" is a feature which could give
us some advantages even independently of pluggable storage.  For
example, imagine a very large table with only a small number of dead tuples.
In this case, it would be cheaper to delete the index links to those dead
tuples one by one using "retail index tuple deletion", rather than do a full
scan of every index to perform a "bulk delete" of index tuples.  One may
argue that you shouldn't vacuum a large table when only a small fraction of
its tuples are dead.  But in terms of index bloat mitigation, a very
aggressive vacuum strategy could be justified.

> I agree with Robert that being able to store an arbitrary payload as a
> TID is probably not going to ever work very well.


Support of an arbitrary payload as a TID doesn't sound easy.  However, that
doesn't mean it's unachievable.  For me, it's more of a long road which could
be traveled step by step.  Some of our existing index access methods
(B-tree, hash, GiST, SP-GiST) could support an arbitrary payload relatively
easily, because they don't rely on its internal structure.  For others
(GIN, BRIN) an arbitrary payload is much harder to support, but I wouldn't say
it's impossible.  However, if we make arbitrary payload support an option
of the index AM and implement it for the first group of index AMs, that
would already be a great step forward.  So, for example, it would be possible
to use indirect indexes when the primary key is not 6 bytes, if the index AM
supports an arbitrary payload.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-17 Thread Andres Freund
Hi,

On 2017-07-15 11:22:37 -0400, Tom Lane wrote:
> The thread drifted off without any resolution, but clearly we need to
> do something before 10.0 final.

We need to do something, I'm less convinced that it's really v10
specific :/


> "SELECT so()" has a nontrivial targetlist so we end up running
> ExecBuildProjectionInfo on that, meaning that a fresh expression
> compilation happens at every nesting depth, and there are
> check_stack_depth calls in expression compilation.  Surely that's
> something we'd try to improve someday.  Or it could accidentally get
> broken by unrelated changes in the way plpgsql sets up queries to be
> executed.

Independent of $subject: What are you thinking of here? You want to
avoid the ExecBuildProjectionInfo() in more cases - I'm unconvinced
that's actually helpful. In my WIP JIT compilation patch the
ExecBuildProjectionInfo() ends up being a good bit faster than paths
avoiding it.


> I still think that we really need to add a check in ExecProcNode().
> Even if there's an argument to be made that every recursion would
> somewhere go through ExecMakeTableFunctionResult, very large/complex
> queries could result in substantial stack getting chewed up before
> we get to that --- and we don't have an infinite amount of stack slop.

I'm less convinced of that, due to the overhead argument.  I think
there's a couple ways around that however:

1) We could move ExecProcNode() to be callback based. The first time a
   node is executed a "wrapper" function is called that does the stack
   and potentially other checks. That also makes ExecProcNode() small
   enough to be inlined, which ends up being good for jump target
   prediction.   I've done something similar for v11 for expression
   evaluation, getting rid of EEOP_*_FIRST duplication etc, and it seems
   to work well.  The big disadvantage to that is that it's a bit
   invasive for v10, and very likely too invasive to backpatch.
2) I think there's some fair argument to be made that ExecInitNode()'s
   stack-space needs are similar enough to ExecProcNode()'s allowing us
   to put a check_stack_depth() into the former.  That seems like it's
   required anyway, since in many cases that's going to trigger
   stack-depth exhaustion first anyway (unless we hit it in parse
   analysis, which also seems quite common).
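
For context, the kind of reproducer at issue is unbounded recursion through
the executor, something of roughly this shape (illustrative only, not
necessarily the exact case from the report):

    CREATE FUNCTION so() RETURNS int LANGUAGE plpgsql
        AS $$ BEGIN RETURN so(); END $$;
    SELECT so();

Whether that fails with the intended "stack depth limit exceeded" error or
crashes depends on where the check_stack_depth() calls end up, which is what
the two options above are about.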

Greetings,

Andres Freund

