Re: [HACKERS] 10.0

2016-06-16 Thread Craig Ringer
On 15 June 2016 at 06:48, David G. Johnston  wrote:


>
> ​We could stand to be more explicit here.  The docs for version()
> indicated the server_version_num should be used for "machine processing".
>
> The implied correct way to access the canonical server version is thus:
>
> SELECT current_setting('server_version_num');
>
>
Or get server_version from the GUC_REPORT params sent at connect-time,
avoiding a round-trip. That's how drivers do it.

Client applications should just ask their driver; they shouldn't need to be
poking around to get the version directly.

It'd be better if server_version_num was also GUC_REPORT, but it isn't. I
still think it should be.
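
For illustration, a minimal libpq sketch of "ask the driver" (a sketch only,
error handling trimmed): PQparameterStatus() exposes the GUC_REPORT value
sent at connect time and PQserverVersion() returns it pre-parsed, so neither
needs a query.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* textual form, e.g. "9.6.0", straight from the GUC_REPORT message */
	printf("server_version = %s\n",
		   PQparameterStatus(conn, "server_version"));
	/* integer form, e.g. 90600 -- no round-trip needed for either */
	printf("server version number = %d\n", PQserverVersion(conn));

	PQfinish(conn);
	return 0;
}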



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] New design for FK-based join selectivity estimation

2016-06-16 Thread Tomas Vondra

Hi,

On 06/16/2016 01:00 AM, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a reworked patch, mostly following the new design proposal
from this thread.


I've whacked this around quite a bit and am now reasonably happy with it.
The issue of planning speed hit should be pretty much gone, although I've
not done testing to prove that.  I've rearranged the actual selectivity
calculation some too, because tests I did here did not look very good for
anything but the plain-inner-join case.  It may be that more work is
needed there, but at least there's reasonable commentary about what we're
doing and why.


Thanks for getting the patch into a much better shape. I've quickly 
reviewed the patch this morning before leaving for the airport - I do 
plan to do additional review/testing once in the air or perhaps over the 
weekend. So at the moment I only have a few minor comments:


1) Shouldn't we define a new macro in copyfuncs.c to do the memcpy for 
us? It seems a bit strange that we have macros for everything else.


2) I'm wondering whether removing the restrict infos when

if (fkinfo->eclass[i] == rinfo->parent_ec)

is actually correct. Can't the EC include conditions that do not match 
the FK? I mean something like this:


  CREATE TABLE a (id1 INT PRIMARY KEY, id2 INT);
  CREATE TABLE b (id1 INT REFERENCES a (id1), id2 INT);

and then something like

  SELECT * FROM a JOIN b ON (a.id1 = b.id1 AND a.id1 = b.id2)

I've been unable to trigger this issue with your patch, but I do 
remember running into it with my version, which is why I explicitly 
checked the rinfo again before removing it. Maybe that was incorrect, or 
perhaps your patch does something smart. Or maybe it's just masked by 
the fact that we 'push' the EC conditions to the base relations (which, 
however, means we're stuck with the default equality estimate).


3) I think this comment in get_foreign_key_join_selectivity is wrong and 
should instead say 'to FK':


  /* Otherwise, see if rinfo was previously matched to EC */



I have not adopted the plan of ignoring single-column FKs.  While it would
only take a couple more lines to do so, I think the argument that there
will be a material planning speed hit no longer has much force.  And I see
a couple of arguments in favor of allowing this code to trigger on
single-column FKs.  First, it should work well even when pg_statistic data
is missing or out of date, which would be an improvement; and second, we
are likely not going to get much beta testing of the behavior if we
restrict it to multi-col FKs.  So I think we should ship it like this for
beta even if we end up adding a filter against single-column FKs for
release.


OK

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-16 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Andrew Gierth wrote:

> > Why is the correct rule not "check for and ignore pre-upgrade mxids
> > before even trying to fetch members"?
> 
> I propose something like the attached patch, which implements that idea.

Hm, this doesn't apply cleanly to 9.4.  I'll need to come up with a
merge tomorrow.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Amit Kapila
On Fri, Jun 17, 2016 at 3:20 AM, Robert Haas  wrote:
>
> On Thu, Jun 16, 2016 at 12:16 PM, Robert Haas  wrote:
> > On Thu, Jun 16, 2016 at 8:00 AM, Amit Kapila  wrote:
> >>> 1. The case originally reported by Thomas Munro still fails.  To fix
> >>> that, we probably need to apply scanjoin_target to each partial path.
> >>> But we can only do that if it's parallel-safe.  It seems like what we
> >>> want is something like this: (1) During scan/join planning, somehow
> >>> skip calling generate_gather_paths for the topmost scan/join rel as we
> >>> do to all the others.  (2) If scanjoin_target is not parallel-safe,
> >>> build a path for the scan/join phase that applies a Gather node to the
> >>> cheapest path and does projection at the Gather node.  Then forget all
> >>> the partial paths so we can't do any bogus upper planning.  (3) If
> >>> scanjoin_target is parallel-safe, replace the list of partial paths
> >>> for the topmost scan/join rel with a new list where scanjoin_target
> >>> has been applied to each one.  I haven't tested this so I might be
> >>> totally off-base about what's actually required here...
> >>
> >> I think we can achieve it just by doing something like what you have
> >> mentioned in (2) and (3).  I am not sure if there is a need to skip
> >> generation of gather paths for top scan/join node.  Please see the
> >> patch attached.  I have just done some minimal testing to ensure that
> >> problem reported by Thomas Munro in this thread is fixed and verified
> >> that the fix is sane for problems [1][2] reported by sqlsmith.  If you
> >> think this is on right lines, I can try to do more verification and
> >> try to add tests.
> >
> > You can't do it this way because of the issue Tom discussed here:
> >
> > https://www.postgresql.org/message-id/16421.1465828...@sss.pgh.pa.us
>

I think that although we always add a projection path in
apply_projection_to_path() to the subpath of a Gather path when the target
is parallel-safe, that subpath can still be used directly if
create_projection_plan() doesn't add a Result node.  So changing subpaths
directly after they have been pushed below a Gather is not a safe bet.

> Something like what you have there might work if you use
> create_projection_path instead of apply_projection_to_path.
>

Okay, I think you mean to say use create_projection_path() while applying
scanjoin_target to the partial path lists in the code below:

foreach(lc, current_rel->partial_pathlist)
{
	Path	   *subpath = (Path *) lfirst(lc);

	Assert(subpath->param_info == NULL);
	lfirst(lc) = apply_projection_to_path(root, current_rel,
										  subpath, scanjoin_target,
										  scanjoin_target_parallel_safe);
}
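
For illustration, a rough, untested sketch of that loop with the suggestion
applied - each partial path wrapped in a new ProjectionPath via
create_projection_path() rather than being modified in place:

/* untested sketch of the suggested variant, not a committed fix */
foreach(lc, current_rel->partial_pathlist)
{
	Path	   *subpath = (Path *) lfirst(lc);

	Assert(subpath->param_info == NULL);
	lfirst(lc) = (Path *) create_projection_path(root, current_rel,
												 subpath, scanjoin_target);
}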

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-16 Thread Noah Misch
On Wed, Jun 15, 2016 at 08:56:52AM -0400, Robert Haas wrote:
> On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
>  wrote:
> > I spent some time chasing down the exact circumstances.  I suspect
> > that there may be an interlocking problem in heap_update.  Using the
> > line numbers from cae1c788 [1], I see the following interaction
> > between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all
> > in reference to the same block number:
> >
> >   [VACUUM] sets all visible bit
> >
> >   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 
> > xmax_old_tuple);
> >   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> >
> >   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
> >   [SELECT] observes VM_ALL_VISIBLE as true
> >   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
> >   [SELECT] barfs
> >
> >   [UPDATE] heapam.c:4116 visibilitymap_clear(...)
> 
> Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> and CTID without logging anything or clearing the all-visible flag and
> then releases the lock on the heap page to go do some more work that
> might even ERROR out.  Only if that other work all goes OK do we
> relock the page and perform the WAL-logged actions.
> 
> That doesn't seem like a good idea even in existing releases, because
> you've taken a tuple on an all-visible page and made it not
> all-visible, and you've made a page modification that is not
> necessarily atomic without logging it.  This is particularly bad in
> 9.6, because if that page is also all-frozen then XMAX will eventually
> be pointing into space and VACUUM will never visit the page to
> re-freeze it the way it would have done in earlier releases.  However,
> even in older releases, I think there's a remote possibility of data
> corruption.  Backend #1 makes these changes to the page, releases the
> lock, and errors out.  Backend #2 writes the page to the OS.  DBA
> takes a hot backup, tearing the page in the middle of XMAX.  Oops.

I agree the non-atomic, unlogged change is a problem.  A related threat
doesn't require a torn page:

  AssignTransactionId() xid=123
  heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 123);
  some ERROR before heap_update() finishes
  rollback;  -- xid=123
  some backend flushes the modified page
  immediate shutdown
  AssignTransactionId() xid=123
  commit;  -- xid=123

If nothing wrote an xlog record that witnesses xid 123, the cluster can
reassign it after recovery.  The failed update is now considered a successful
update, and the row improperly becomes dead.  That's important.

I don't know whether the 9.6 all-frozen mechanism materially amplifies the
consequences of this bug.  The interaction with visibility map and freeze map
is not all bad; indeed, it can reduce the risk of experiencing consequences
from the non-atomic, unlogged change bug.  If the row is all-visible when
heap_update() starts, every transaction should continue to consider the row
visible until heap_update() finishes successfully.  If an ERROR interrupts
heap_update(), visibility verdicts should be as though the heap_update() never
happened.  If one of the previously-described mechanisms would make an xmax
visibility test give the wrong answer, an all-visible bit could mask the
problem for awhile.  Having said that, freeze map hurts in scenarios involving
toast_insert_or_update() failures and no crash recovery.  Instead of VACUUM
cleaning up the aborted xmax, that xmax could persist long enough for its xid
to be reused in a successful transaction.  When some other modification
finally clears all-frozen and all-visible, the row improperly becomes dead.
Both scenarios are fairly rare; I don't know which is more rare.  [Disclaimer:
I have not built test cases to verify those alleged failure mechanisms.]

If we made this pre-9.6 bug a 9.6 open item, would anyone volunteer to own it?
Then we wouldn't need to guess whether 9.6 will be safer with the freeze map
or safer without the freeze map.

Thanks,
nm




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Tom Lane
Amit Kapila  writes:
> On Thu, Jun 16, 2016 at 11:49 PM, Tom Lane  wrote:
>> But you need to include <limits.h> to use INT_MAX ...

> I wonder why it has not given me any compilation error/warning.  I have
> tried it on both Windows and CentOS, before sending the patch.

Some platforms seem to make it available from other headers too.
But AFAIK, POSIX only requires <limits.h> to provide it.

regards, tom lane




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-16 Thread Alvaro Herrera
Andrew Gierth wrote:

> But that leaves an obvious third issue: it's all very well to ignore the
> pre-upgrade (pre-9.3) mxid if it's older than the cutoff or it's in the
> future, but what if it's actually inside the currently valid range?
> Looking it up as though it were a valid mxid in that case seems to be
> completely wrong and could introduce more subtle errors.

You're right, we should not try to resolve a multixact coming from the
old install in any case.

> (It can, AFAICT, be inside the currently valid range due to wraparound,
> i.e. without there being a valid pg_multixact entry for it, because
> AFAICT in 9.2, once the mxid is hinted dead it is never again either
> looked up or cleared, so it can sit in the tuple xmax forever, even
> through multiple wraparounds.)

HeapTupleSatisfiesVacuum removes very old multixacts; see the
HEAP_IS_LOCKED block starting in line 1162 where we set
HEAP_XMAX_INVALID for multixacts that are not running by falling
through.  It's not a strict guarantee but this probably explains why
this problem is not more common.

> Why is the correct rule not "check for and ignore pre-upgrade mxids
> before even trying to fetch members"?

I propose something like the attached patch, which implements that idea.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 88f7137..e20e7f8 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -158,8 +158,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 
 values[Atnum_ismulti] = pstrdup("true");
 
-allow_old = !(infomask & HEAP_LOCK_MASK) &&
-	(infomask & HEAP_XMAX_LOCK_ONLY);
+allow_old = HEAP_LOCKED_UPGRADED(infomask);
nmembers = GetMultiXactIdMembers(xmax, &members, allow_old,
  false);
 if (nmembers == -1)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 22b3f5f..9d2de7f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5225,8 +5225,7 @@ l5:
 		 * pg_upgrade; both MultiXactIdIsRunning and MultiXactIdExpand assume
 		 * that such multis are never passed.
 		 */
-		if (!(old_infomask & HEAP_LOCK_MASK) &&
-			HEAP_XMAX_IS_LOCKED_ONLY(old_infomask))
+		if (HEAP_LOCKED_UPGRADED(old_infomask))
 		{
 			old_infomask &= ~HEAP_XMAX_IS_MULTI;
 			old_infomask |= HEAP_XMAX_INVALID;
@@ -5586,6 +5585,7 @@ l4:
 int			i;
 MultiXactMember *members;
 
+/* XXX do we need a HEAP_LOCKED_UPGRADED test? */
nmembers = GetMultiXactIdMembers(rawxmax, &members, false,
 	 HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));
 for (i = 0; i < nmembers; i++)
@@ -6144,14 +6144,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
-	bool		allow_old;
 
 	*flags = 0;
 
 	/* We should only be called in Multis */
 	Assert(t_infomask & HEAP_XMAX_IS_MULTI);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) ||
+		HEAP_LOCKED_UPGRADED(t_infomask))
 	{
 		/* Ensure infomask bits are appropriately set/reset */
 		*flags |= FRM_INVALIDATE_XMAX;
@@ -6164,14 +6164,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * was a locker only, it can be removed without any further
 		 * consideration; but if it contained an update, we might need to
 		 * preserve it.
-		 *
-		 * Don't assert MultiXactIdIsRunning if the multi came from a
-		 * pg_upgrade'd share-locked tuple, though, as doing that causes an
-		 * error to be raised unnecessarily.
 		 */
-		Assert((!(t_infomask & HEAP_LOCK_MASK) &&
-HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) ||
-			   !MultiXactIdIsRunning(multi,
+		Assert(!MultiXactIdIsRunning(multi,
 	 HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)));
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
@@ -6213,10 +6207,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * anything.
 	 */
 
-	allow_old = !(t_infomask & HEAP_LOCK_MASK) &&
-		HEAP_XMAX_IS_LOCKED_ONLY(t_infomask);
 	nmembers =
-		GetMultiXactIdMembers(multi, &members, allow_old,
+		GetMultiXactIdMembers(multi, &members, false,
 			  HEAP_XMAX_IS_LOCKED_ONLY(t_infomask));
 	if (nmembers <= 0)
 	{
@@ -6777,14 +6769,15 @@ static bool
 DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
 		LockTupleMode lockmode)
 {
-	bool		allow_old;
 	int			nmembers;
 	MultiXactMember *members;
 	bool		result = false;
 	LOCKMODE	wanted = tupleLockExtraInfo[lockmode].hwlock;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old,
+	if (HEAP_LOCKED_UPGRADED(infomask))
+		return false;
+
+	nmembers = GetMultiXactIdMembers(multi, &members, false,
 	 HEAP_XMAX_IS_LOCKED_ONLY(infomask));
 	if (nmembers >= 0)
 	{
@@ -6867,15 +6860,15 @@ Do_MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
    Relation rel, ItemPointer ctid, 
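
The patch excerpt is cut off here and does not show the definition of the
new HEAP_LOCKED_UPGRADED() macro it relies on.  A plausible sketch, inferred
only from the open-coded tests the patch replaces above (not taken from the
patch itself):

/*
 * Sketch inferred from the replaced expressions: a locked-only xmax with
 * no lock-strength bits set can only have come from a pre-9.3 pg_upgrade.
 */
#define HEAP_LOCKED_UPGRADED(infomask) \
	(((infomask) & HEAP_LOCK_MASK) == 0 && \
	 HEAP_XMAX_IS_LOCKED_ONLY(infomask))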

Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Amit Kapila
On Thu, Jun 16, 2016 at 11:49 PM, Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane  wrote:
> >> min_parallel_relation_size, or min_parallelizable_relation_size, or
> >> something like that?
>
> > You are right that such a variable will make it simpler to write tests
> > for parallel query.  I have implemented such a guc and chose to keep
> > the name as min_parallel_relation_size.
>
> Pushed with minor adjustments.


Thanks.

> > One thing to note is that in function
> > create_plain_partial_paths(), currently it is using PG_INT32_MAX/3 for
> > parallel_threshold to check for overflow; I have changed it to
> > INT_MAX/3 so as to be consistent with guc.c.  I am not sure if it is
> > advisable to use PG_INT32_MAX in guc.c as other similar parameters use
> > INT_MAX.
>
> I agree that using INT_MAX is more consistent with the code elsewhere in
> guc.c, and more correct given that we declare the variable in question
> as int not int32.  But you need to include <limits.h> to use INT_MAX ...
>

I wonder why it has not given me any compilation error/warning.  I have
tried it on both Windows and CentOS, before sending the patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [sqlsmith] Failed assertion in postgres_fdw/deparse.c:1116

2016-06-16 Thread Etsuro Fujita

On 2016/06/16 22:00, Robert Haas wrote:

On Thu, Jun 16, 2016 at 7:05 AM, Etsuro Fujita
 wrote:

ISTM that a more robust solution to this is to push down the ft1-ft2-ft3 join
with the PHV by extending deparseExplicitTargetList() and/or something else,
so that we can send a remote query like:

select ft1.a, (ft3.a IS NOT NULL) from (ft1 inner join ft2 on ft1.a = ft2.a)
left join ft3 on ft1.a = ft3.a



I completely agree we should have that, but not for 9.6.


OK, I'll work on that for 10.0.

Best regards,
Etsuro Fujita






Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-16 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 16, 2016 at 6:40 PM, Tom Lane  wrote:
>> Meh.  I think this is probably telling us that trying to repurpose Aggref
>> as the representation of a partial aggregate may have been mistaken.

> I don't mind if you feel the need to refactor this.

I'm not sure yet.  What I think we need to work out first is exactly
how polymorphic parallel aggregates ought to identify which concrete
data types they are using.  There were well-defined rules before,
but how do we adapt those to two levels of aggregate evaluation?
Are we hard-wiring an assumption that the aggregate transtype is the
same at both levels?  What assumptions do we want to make about the
relationship of the data transfer type (the lower level's output type)
to the polymorphic input and trans types?

A key point here is that the way that polymorphic aggregate support
functions used to find out what they were doing is that we'd set up
dummy FuncExprs that would satisfy get_call_expr_argtype().  That
relied on the fact that the transfn and finalfn calls were specified
in a way that corresponded to legal SQL function calls.  It's not
clear to me whether that statement still holds for parallelized
aggregates.
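
For reference, the usual shape of that lookup in a polymorphic support
function is roughly the following (a generic sketch with an invented
function name, not any particular aggregate's code):

#include "postgres.h"
#include "fmgr.h"

/*
 * Generic sketch: a polymorphic transition function resolving its concrete
 * input type by inspecting the (possibly dummy) expression tree hung on
 * flinfo->fn_expr.  get_fn_expr_argtype() is a thin wrapper around
 * get_call_expr_argtype(); argument 0 is the transition state, argument 1
 * the first aggregated input.  If that expression no longer reflects the
 * original input types, this lookup returns the wrong answer.
 */
Datum
my_poly_transfn(PG_FUNCTION_ARGS)
{
	Oid			argtype = get_fn_expr_argtype(fcinfo->flinfo, 1);

	if (!OidIsValid(argtype))
		elog(ERROR, "could not determine input data type");

	/* ... look up comparators, build transition state for "argtype" ... */

	PG_RETURN_DATUM(PG_GETARG_DATUM(0));	/* placeholder: state unchanged */
}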

regards, tom lane




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 6:40 PM, Tom Lane  wrote:
> Meh.  I think this is probably telling us that trying to repurpose Aggref
> as the representation of a partial aggregate may have been mistaken.
> Or maybe just that fix_combine_agg_expr was a bad idea.  It seems likely
> to me that that could have been dispensed with altogether in return for
> slightly more work in create_grouping_paths, for instance transforming
> the upper-level Aggrefs into something that looks more like
> Aggref(PartialAggref(original-arguments))
> whereupon the pre-existing match logic should be sufficient to replace
> the "PartialAggref(original-arguments)" subtree with a suitable Var
> in the upper aggregation plan node.

I don't mind if you feel the need to refactor this.  In David's
original patch, he had a PartialAggref node that basically acted as a
wrapper for an Aggref node, effectively behaving like a flag on the
underlying Aggref.  I thought that was ugly and argued for including
the required information in the Aggref itself.  Now what you are
proposing here is a bit different and it may work better, but there
are tricky things like how it all deparses in the EXPLAIN output - you
may recall that I fixed a problem in that area a while back.  We're
trying to represent a concept that doesn't exist at the SQL level, and
that makes things a bit complicated: if the Aggref's input is a
PartialAggref, will the deparse code be able to match that up to the
correct catalog entry?  But if you've got a good idea, have at it.

>> aggtype is one of the fields
>> that gets compared as part of that process, and it won't be the same
>> if you make aggtype mean the result of that particular node rather
>> than the result of the aggregate.  nodeAgg.c also uses aggtype for
>> matching purposes; not sure if there is a similar problem there or
>> not.
>
> AFAICS, nodeAgg.c's check on that is not logically necessary: if you have
> identical inputs and the same aggregate function then you should certainly
> expect the same output type.  The only reason we're making it is that the
> code originally used equal() to detect identical aggregates, and somebody
> slavishly copied all the comparisons when splitting up that test so as to
> distinguish "identical inputs" from "identical aggregates".  I'll reserve
> judgement on whether search_indexed_tlist_for_partial_aggref has any idea
> what it's doing in this regard.

Sure, it's all cargo-cult programming.  I don't want to presume to
read David's mind, but I think we were both thinking that minimizing
the amount of tinkering that we did would cut down on the likelihood
of bugs.  Now, you've found a bug that got through, so we can either
double down on the design we've already picked, or we can change it.
I'm not set on one approach or the other; I thought what we ended up with
was fairly reasonable given the bloated mess that is struct Aggref,
but I've got no deep attachment to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 10.0

2016-06-16 Thread Greg Stark
On 15 Jun 2016 2:59 pm, "David G. Johnston"  wrote:
>
> IIRC the plan is to have the machine version behave as if the middle
number is present and always zero.  It would be (the?) one place that the
historical behavior remains visible but it is impossible to have a totally
clean break.

I haven't been keeping up with hackers, sorry if this has been suggested
already but...

Why don't we just *actually* keep the middle digit and *actually* have it
always be zero?

So we would release 10.0.0 and 10.0.1 and the next major release would be
11.0.0.

This would have two benefits:

1) It emphasises that minor releases continue to be safe minor updates that
offer the same stability guarantees. Users would be less likely to be
intimidated by 10.0.1 than they would be by 10.1. And it gives users a
consistent story they can apply to any version, whether 9.x or 10.0+.

2) If we ever do release incompatible feature releases on older branches --
or more likely some fork does -- it gives them a natural way to number
their release.


Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-16 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 16, 2016 at 4:31 PM, Tom Lane  wrote:
>> However, before trying to fix any of that, I'd like to demand an
>> explanation as to why aggoutputtype exists at all.  It seems incredibly
>> confusing to have both that and aggtype, and certainly the code comments
>> give no hint of what the semantics are supposed to be when those fields
>> are different.  I think aggoutputtype needs to go away again.

> fix_combine_agg_expr, or more specifically search_indexed_tlist_for_partial_aggref,
> needs to be able to match a finalize-aggregate node type to a
> partial-aggregate node type beneath it.

Meh.  I think this is probably telling us that trying to repurpose Aggref
as the representation of a partial aggregate may have been mistaken.
Or maybe just that fix_combine_agg_expr was a bad idea.  It seems likely
to me that that could have been dispensed with altogether in return for
slightly more work in create_grouping_paths, for instance transforming
the upper-level Aggrefs into something that looks more like
Aggref(PartialAggref(original-arguments))
whereupon the pre-existing match logic should be sufficient to replace
the "PartialAggref(original-arguments)" subtree with a suitable Var
in the upper aggregation plan node.

> aggtype is one of the fields
> that gets compared as part of that process, and it won't be the same
> if you make aggtype mean the result of that particular node rather
> than the result of the aggregate.  nodeAgg.c also uses aggtype for
> matching purposes; not sure if there is a similar problem there or
> not.

AFAICS, nodeAgg.c's check on that is not logically necessary: if you have
identical inputs and the same aggregate function then you should certainly
expect the same output type.  The only reason we're making it is that the
code originally used equal() to detect identical aggregates, and somebody
slavishly copied all the comparisons when splitting up that test so as to
distinguish "identical inputs" from "identical aggregates".  I'll reserve
judgement on whether search_indexed_tlist_for_partial_aggref has any idea
what it's doing in this regard.

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Update pg_stat_statements extension for parallel query.

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 4:51 PM, Vik Fearing  wrote:
> On 10/06/16 17:01, Robert Haas wrote:
>> Update pg_stat_statements extension for parallel query.
>
> I couldn't readily find a review for this patch, and I am unsatisfied
> with it.  I think it's very strange that a 1.4 version would call a
> function labeled 1.3, and when we make a 1.5 the code will look really
> weird because it'll be missing a version.
>
> Attached is my attempt to fix this.  It might not be the best way to do
> it, but I feel that *something* should be done.

Hmm.  I don't think this is solving any real problem, is it?  You're
just adding backward compatibility code to the C files that doesn't
really need to be there.  I don't think it's particularly confusing
that the extension version might sometimes get bumped without changing
the SRF columns.

Another problem with this change is that dropping and redefining the
view will prevent anyone who has a dependency on the view from being
able to update to the latest extension.  It doesn't seem like a wise
idea to force that on users unnecessarily.

(I am sorry you are unsatisfied, though.  I didn't feel a need to post
a detailed review of each of these many patches on the relevant
thread, because they are mostly pretty boilerplate.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 4:31 PM, Tom Lane  wrote:
> However, before trying to fix any of that, I'd like to demand an
> explanation as to why aggoutputtype exists at all.  It seems incredibly
> confusing to have both that and aggtype, and certainly the code comments
> give no hint of what the semantics are supposed to be when those fields
> are different.  I think aggoutputtype needs to go away again.

fix_combine_agg_expr, or more specifically search_indexed_tlist_for_partial_aggref,
needs to be able to match a finalize-aggregate node type to a
partial-aggregate node type beneath it.  aggtype is one of the fields
that gets compared as part of that process, and it won't be the same
if you make aggtype mean the result of that particular node rather
than the result of the aggregate.  nodeAgg.c also uses aggtype for
matching purposes; not sure if there is a similar problem there or
not.

On the other hand, exprType() needs to return the actual output type
of that particular node, which is given by aggoutputtype.  I think
that requirement is non-negotiable, so I suspect the only way to get
by without two fields is if all the stuff for which aggtype is being
used is inessential; in that case, we could remove it and then rename
aggoutputtype back to aggtype.

Thanks for looking at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 12:16 PM, Robert Haas  wrote:
> On Thu, Jun 16, 2016 at 8:00 AM, Amit Kapila  wrote:
>>> 1. The case originally reported by Thomas Munro still fails.  To fix
>>> that, we probably need to apply scanjoin_target to each partial path.
>>> But we can only do that if it's parallel-safe.  It seems like what we
>>> want is something like this: (1) During scan/join planning, somehow
>>> skip calling generate_gather_paths for the topmost scan/join rel as we
>>> do to all the others.  (2) If scanjoin_target is not parallel-safe,
>>> build a path for the scan/join phase that applies a Gather node to the
>>> cheapest path and does projection at the Gather node.  Then forget all
>>> the partial paths so we can't do any bogus upper planning.  (3) If
>>> scanjoin_target is parallel-safe, replace the list of partial paths
>>> for the topmost scan/join rel with a new list where scanjoin_target
>>> has been applied to each one.  I haven't tested this so I might be
>>> totally off-base about what's actually required here...
>>
>> I think we can achieve it just by doing something like what you have
>> mentioned in (2) and (3).  I am not sure if there is a need to skip
>> generation of gather paths for top scan/join node.  Please see the patch
>> attached.  I have just done some minimal testing to ensure that problem
>> reported by Thomas Munro in this thread is fixed and verified that the fix
>> is sane for problems [1][2] reported by sqlsmith. If you think this is on
>> right lines, I can try to do more verification and try to add tests.
>
> You can't do it this way because of the issue Tom discussed here:
>
> https://www.postgresql.org/message-id/16421.1465828...@sss.pgh.pa.us

Something like what you have there might work if you use
create_projection_path instead of apply_projection_to_path.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: integration bloat tables (indexes) to core

2016-06-16 Thread Thomas Kellerer
Tom Lane wrote:
> The problem with an extension is: when we make a core change that breaks
> one of these views, which we will, how can you pg_upgrade a database
> with the extension installed?  There's no provision for upgrading an
> extension concurrently with the core upgrade.  Maybe there should be,
> but I'm unclear how we could make that work.
> 
> At the same time, I'm pretty suspicious of putting stuff like this in
> core, because the expectations for cross-version compatibility go up
> by orders of magnitude as soon as we do that.

Why not provide a "SQL" or "Admin Scripts" directory as part of the
installation that contains community-"recommended" scripts for things like
that? As those aren't extensions or part of the data directory, they don't
need to be migrated, and pg_upgrade does not need to take care of them.

When installing a new version, the new scripts that work with the new
version are installed automatically, but they will not overwrite the old
version's scripts, as the new version is typically stored in a different
directory.

 








[HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-16 Thread Tom Lane
I believe I've narrowed down the cause of this failure [1]:

regression=# explain SELECT min(col) FROM enumtest;
ERROR:  type matched to anyenum is not an enum type: anyenum

The core reason for this seems to be that apply_partialaggref_adjustment
fails to consider the possibility that it's dealing with a polymorphic
transtype.  It will happily label the partial aggregate with a polymorphic
aggoutputtype, which is broken in itself (since we won't know the
properties of the output type for purposes such as datum-copying).  But
the above error comes out before we have any opportunity to crash from
that, because we then end up with the upper-level combining aggregate
having an input that is shown as being of polymorphic type, which means
it can't resolve what its transtype is supposed to be.

The business about having a (newly added) separate aggserialtype makes
this doubly dangerous, because if that applies we would be showing the
input to the combining aggregate as being of the serial type, which might
have *nothing* to do with the original input type, thereby breaking
polymorphic transtype resolution even further.  But whether we're using
that or just the transtype, showing the input to the combining aggregate
as being of some type other than the original input is likely to break any
agg implementation function that uses get_call_expr_argtype or similar to
figure out what type it's supposed to be working on.

We might need to fix this by adding the resolved transtype to Aggref
nodes, rather than trying to re-resolve it at executor start.  Some
tracing I did on the way to identifying this bug suggests that that
would save a fair amount of planner and executor-start overhead,
because we are calling get_aggregate_argtypes and thereby
enforce_generic_type_consistency half a dozen times per aggregate,
not to mention resolve_aggregate_transtype itself.

However, before trying to fix any of that, I'd like to demand an
explanation as to why aggoutputtype exists at all.  It seems incredibly
confusing to have both that and aggtype, and certainly the code comments
give no hint of what the semantics are supposed to be when those fields
are different.  I think aggoutputtype needs to go away again.

regards, tom lane

[1] https://www.postgresql.org/message-id/22782.1466100...@sss.pgh.pa.us




Re: [HACKERS] proposal: integration bloat tables (indexes) to core

2016-06-16 Thread Vik Fearing
On 16/06/16 20:31, Jim Nasby wrote:
> On 6/13/16 12:16 PM, Tom Lane wrote:
>> At the same time, I'm pretty suspicious of putting stuff like this in
>> core, because the expectations for cross-version compatibility go up
>> by orders of magnitude as soon as we do that.
> 
> On a first go-round, I don't think we should add entire views, but
> rather functions that serve specific purposes. For table bloat that
> means a function that returns what the heap size should be based on
> pg_stats. For locking, it means providing information about which PID is
> blocking which PID. After that, most everything else is just window
> dressing.

We already have that second one: pg_blocking_pids(integer)
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] proposal: integration bloat tables (indexes) to core

2016-06-16 Thread Pavel Stehule
2016-06-16 20:31 GMT+02:00 Jim Nasby :

> On 6/13/16 12:16 PM, Tom Lane wrote:
>
>> At the same time, I'm pretty suspicious of putting stuff like this in
>> core, because the expectations for cross-version compatibility go up
>> by orders of magnitude as soon as we do that.
>>
>
> On a first go-round, I don't think we should add entire views, but rather
> functions that serve specific purposes. For table bloat that means a
> function that returns what the heap size should be based on pg_stats. For
> locking, it means providing information about which PID is blocking which
> PID. After that, most everything else is just window dressing.
>

Could be.

If you look at the current bloat queries, you can see they are pretty complex
because they are implemented at a high level; a C implementation should be
faster. There have been a lot of changes in core, but these queries have
worked from PostgreSQL 8.2 to today, so they are relatively stable.

Regards

Pavel


> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 11:57:48 -0700, Andres Freund wrote:
> See 
> https://www.postgresql.org/message-id/20160616183207.wygoktoplycdz...@alap3.anar

For posterity's sake, that was supposed to be
https://www.postgresql.org/message-id/20160616183207.wygoktoplycdz...@alap3.anarazel.de




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 13:53:01 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 1:19 PM, Kevin Grittner  wrote:
> > On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
> >> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> 
> >>> Maybe it would help if you lay out the whole sequence of events, like:
> >>>
> >>> S1: Does this.
> >>> S2: Does that.
> >>> S1: Now does something else.
> >>
> >> I presume it'd be something like:
> >>
> >> Assuming a 'toasted' table, which contains one row, with a 1GB field.
> >>
> >> S1: BEGIN REPEATABLE READ;
> >> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> >> S2: DELETE FROM toasted;
> >> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
> >> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> >> 
> >
> > I'll put together a test like that and post in a bit.
> 
> old_snapshot_threshold = '1min'
> autovacuum_vacuum_threshold = 0
> autovacuum_vacuum_scale_factor = 0.01
> 
> test=# CREATE TABLE gb (rec bytea not null);
> CREATE TABLE
> test=# ALTER TABLE gb ALTER COLUMN rec SET STORAGE external;
> ALTER TABLE
> test=# INSERT INTO gb SELECT t FROM (SELECT repeat('x',
> 10)::bytea) x(t);
> INSERT 0 1
> test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> BEGIN
> test=# SELECT SUM(length(rec)) FROM gb;
> sum
> 
>  10
> (1 row)
> 
> [wait for autovacuum to run]
> 
> test=# SELECT SUM(length(rec)) FROM gb;
> ERROR:  snapshot too old

See 
https://www.postgresql.org/message-id/20160616183207.wygoktoplycdz...@alap3.anar
for a recipe that reproduce the issue. I presume your example also
vacuums the main table due to the threshold and scale factor you set
(which will pretty much alwasy vacuum a table, no?).

Andres Freund




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 1:32 PM, Andres Freund  wrote:

> With old_snapshot_threshold=1 I indeed can reproduce the issue. I
> disabled autovacuum, to make the scheduling more predictable. But it
> should "work" just as well with autovacuum.
>
> S1: CREATE TABLE toasted(largecol text);
> INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM 
> generate_series(1, 1000);
> BEGIN;
> DELETE FROM toasted;
> S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
>> ...
> S1: COMMIT;
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
>> ...
> S1: /* wait for snapshot threshold to be passed */
> S1: VACUUM pg_toast.pg_toast_16437;
>> INFO:  0: "pg_toast_16437": found 61942 removable, 0 nonremovable row 
>> versions in 15486 out of 15486 pages
>> DETAIL:  0 dead row versions cannot be removed yet.
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ERROR:  XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437
> LOCATION:  toast_fetch_datum, tuptoaster.c:1945

Thanks!  That's something I should be able to work with.
Unfortunately, I am going to be on vacation, so I won't have any
results until sometime after 28 June.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 1:19 PM, Kevin Grittner  wrote:
> On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
>> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:

>>> Maybe it would help if you lay out the whole sequence of events, like:
>>>
>>> S1: Does this.
>>> S2: Does that.
>>> S1: Now does something else.
>>
>> I presume it'd be something like:
>>
>> Assuming a 'toasted' table, which contains one row, with a 1GB field.
>>
>> S1: BEGIN REPEATABLE READ;
>> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
>> S2: DELETE FROM toasted;
>> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
>> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
>> 
>
> I'll put together a test like that and post in a bit.

old_snapshot_threshold = '1min'
autovacuum_vacuum_threshold = 0
autovacuum_vacuum_scale_factor = 0.01

test=# CREATE TABLE gb (rec bytea not null);
CREATE TABLE
test=# ALTER TABLE gb ALTER COLUMN rec SET STORAGE external;
ALTER TABLE
test=# INSERT INTO gb SELECT t FROM (SELECT repeat('x',
10)::bytea) x(t);
INSERT 0 1
test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
test=# SELECT SUM(length(rec)) FROM gb;
sum

 10
(1 row)

[wait for autovacuum to run]

test=# SELECT SUM(length(rec)) FROM gb;
ERROR:  snapshot too old

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
Hi,

On 2016-06-16 13:19:13 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
> > On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> 
> >> The root of my confusion is: if we prune a tuple, we'll bump the page
> >> LSN, so any session that is still referencing that tuple will error
> >> out as soon as it touches the page on which that tuple used to exist.
> >
> > Right. On the main table. But we don't peform that check on the toast
> > table/pages. So if we prune toast tuples, which are still referenced by
> > (unvacuumed) main relation, we can get into trouble.
> 
> I thought that we should never be using visibility information from
> the toast table; that the visibility information in the heap should
> control.

We use visibility information for vacuuming: toast vacuum puts toast
chunk tuples through HeapTupleSatisfiesVacuum(), just like normal
tuples.  Otherwise we'd have to collect dead toast tuples during the
normal vacuum, and then do explicit vacuums for those. That'd end up
being pretty expensive.


> If that's the case, how would we prune toast rows without
> pruning the heap?

I'm not sure what you mean? We prune toast tuples by checking xmin/xmax,
and then comparing with OldestXmin. Without STO that's safe, because we
know nobody could look up those toast tuples.


> You pointed out that the *reverse* case has an
> option bit -- if that is ever set there could be toasted values
> which would not have a row.

We vacuum toast tables without the main table, by simply calling
vacuum() on the toast relation.  So you can get the case that only the
normal relation is vacuumed *or* that only the toast relation is vacuumed.


> Do they still have a line pointer in the heap, like "dead" index
> entries?

You can have non-pruned toast tuples, where any evidence of the
referencing main-heap tuple is gone.


> How are they cleaned up in current production versions?

There's simply no interlock except OldestXmin preventing referenced toast
tuples from being vacuumed while a live snapshot can still read them.


Greetings,

Andres Freund




Re: [HACKERS] proposal: integration bloat tables (indexes) to core

2016-06-16 Thread Jim Nasby

On 6/13/16 12:16 PM, Tom Lane wrote:

At the same time, I'm pretty suspicious of putting stuff like this in
core, because the expectations for cross-version compatibility go up
by orders of magnitude as soon as we do that.


On a first go-round, I don't think we should add entire views, but 
rather functions that serve specific purposes. For table bloat that 
means a function that returns what the heap size should be based on 
pg_stats. For locking, it means providing information about which PID is 
blocking which PID. After that, most everything else is just window 
dressing.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 13:16:35 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 1:01 PM, Andres Freund  wrote:
> 
> > The relevant part is the HeapTupleSatisfiesMVCC check, we're using
> > SatisfiesToast for toast lookups.
> >
> > FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
> > but ran into the problem that I couldn't get it to vacuum anything away
> > (neither main nor toast rel).   That appears to be
> > if (old_snapshot_threshold == 0)
> > {
> >     if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
> >         && TransactionIdFollows(latest_xmin, xlimit))
> >         xlimit = latest_xmin;
> > because latest_xmin always is equal to MyPgXact->xmin, which is actually
> > kinda unsurprising?
> 
> Sure -- the STO feature *never* advances the point for early
> pruning past the earliest still-active transaction ID.  If it did
> we would have all sorts of weird problems.

Both latest_xmin and MyPgXact->xmin are equivalent to txid_current() here.
Note that a threshold of 1 actually vacuums in this case (after waiting
obviously), but 0 never does.  Afaics that's because before
TransactionIdLimitedForOldSnapshots() is reached,
MaintainOldSnapshotTimeMapping will have updated latest_xmin to the
current value.


With old_snapshot_threshold=1 I indeed can reproduce the issue. I
disabled autovacuum, to make the scheduling more predictable. But it
should "work" just as well with autovacuum.

S1: CREATE TABLE toasted(largecol text);
INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM 
generate_series(1, 1000);
BEGIN;
DELETE FROM toasted;
S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ...
S1: COMMIT;
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ...
S1: /* wait for snapshot threshold to be passed */
S1: VACUUM pg_toast.pg_toast_16437;
> INFO:  0: "pg_toast_16437": found 61942 removable, 0 nonremovable row 
> versions in 15486 out of 15486 pages
> DETAIL:  0 dead row versions cannot be removed yet.
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
ERROR:  XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437
LOCATION:  toast_fetch_datum, tuptoaster.c:1945

Andres




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane  wrote:
>> min_parallel_relation_size, or min_parallelizable_relation_size, or
>> something like that?

> You are right that such a variable will make it simpler to write tests for
> parallel query.  I have implemented such a guc and chose to keep the name
> as min_parallel_relation_size.

Pushed with minor adjustments.  My first experiments with this say that
we should have done this long ago:
https://www.postgresql.org/message-id/22782.1466100...@sss.pgh.pa.us

> One thing to note is that in function
> create_plain_partial_paths(), currently it is using PG_INT32_MAX/3 for
> parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so
> as to be consistent with guc.c.  I am not sure if it is advisable to use
> PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.

I agree that using INT_MAX is more consistent with the code elsewhere in
guc.c, and more correct given that we declare the variable in question
as int not int32.  But you need to include <limits.h> to use INT_MAX ...
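
For context, the overflow guard being discussed is roughly of this shape
(a paraphrased sketch of the create_plain_partial_paths() logic, not the
exact committed code); INT_MAX is what drags in the missing header:

#include <limits.h>				/* for INT_MAX */

	/*
	 * Escalate the worker count as the relation grows, tripling the
	 * threshold each time, and bail out before the next tripling of the
	 * int threshold could overflow.
	 */
	while (rel->pages >= (BlockNumber) parallel_threshold)
	{
		parallel_workers++;
		parallel_threshold *= 3;
		if (parallel_threshold > INT_MAX / 3)
			break;				/* avoid overflow */
	}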

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:

>> The root of my confusion is: if we prune a tuple, we'll bump the page
>> LSN, so any session that is still referencing that tuple will error
>> out as soon as it touches the page on which that tuple used to exist.
>
> Right. On the main table. But we don't peform that check on the toast
> table/pages. So if we prune toast tuples, which are still referenced by
> (unvacuumed) main relation, we can get into trouble.

I thought that we should never be using visibility information from
the toast table; that the visibility information in the heap should
control.  If that's the case, how would we prune toast rows without
pruning the heap?  You pointed out that the *reverse* case has an
option bit -- if that is ever set there could be toasted values
which would not have a row.  Do they still have a line pointer in
the heap, like "dead" index entries?  How are they cleaned up in
current production versions?  (Note the question mark -- I'm not
big on using that with assertions and rarely fall back on
rhetorical questions.)

>> It won't even survive long enough to care that the tuple isn't there
>> any more.
>>
>> Maybe it would help if you lay out the whole sequence of events, like:
>>
>> S1: Does this.
>> S2: Does that.
>> S1: Now does something else.
>
> I presume it'd be something like:
>
> Assuming a 'toasted' table, which contains one row, with a 1GB field.
>
> S1: BEGIN REPEATABLE READ;
> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> S2: DELETE FROM toasted;
> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> 

I'll put together a test like that and post in a bit.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 1:01 PM, Andres Freund  wrote:

> The relevant part is the HeapTupleSatisfiesMVCC check, we're using
> SatisfiesToast for toast lookups.
>
> FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
> but ran into the problem that I couldn't get it to vacuum anything away
> (neither main nor toast rel).   That appears to be
> if (old_snapshot_threshold == 0)
> {
>     if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
>         && TransactionIdFollows(latest_xmin, xlimit))
>         xlimit = latest_xmin;
> because latest_xmin always is equal to MyPgXact->xmin, which is actually
> kinda unsurprising?

Sure -- the STO feature *never* advances the point for early
pruning past the earliest still-active transaction ID.  If it did
we would have all sorts of weird problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] More parallel-query fun

2016-06-16 Thread Tom Lane
As of HEAD you can exercise quite a lot of parallel query behavior
by running the regression tests with these settings applied:

force_parallel_mode = regress
max_parallel_workers_per_gather = 2    -- this is default at the moment
min_parallel_relation_size = 0
parallel_setup_cost = 0
parallel_tuple_cost = 0

This results in multiple interesting failures, including a core dump
here:

Program terminated with signal 11, Segmentation fault.
#0  shm_mq_set_handle (mqh=0x0, handle=0x1ac3090) at shm_mq.c:312
312 Assert(mqh->mqh_handle == NULL);
(gdb) bt
#0  shm_mq_set_handle (mqh=0x0, handle=0x1ac3090) at shm_mq.c:312
#1  0x004e0fd9 in LaunchParallelWorkers (pcxt=0x1ac2dd8)
at parallel.c:479
#2  0x005f40fd in ExecGather (node=0x1b05508) at nodeGather.c:168
#3  0x005e3011 in ExecProcNode (node=0x1b05508) at execProcnode.c:515
#4  0x005fe795 in ExecNestLoop (node=0x1afe7f0) at nodeNestloop.c:174
#5  0x005e2f87 in ExecProcNode (node=0x1afe7f0) at execProcnode.c:476
#6  0x0060135b in ExecSort (node=0x1afe520) at nodeSort.c:103
#7  0x005e2fc7 in ExecProcNode (node=0x1afe520) at execProcnode.c:495
#8  0x005e15c8 in ExecutePlan (queryDesc=0x1a44f98, 
direction=NoMovementScanDirection, count=0) at execMain.c:1567
#9  standard_ExecutorRun (queryDesc=0x1a44f98, 
direction=NoMovementScanDirection, count=0) at execMain.c:338
#10 0x005e16b6 in ExecutorRun (queryDesc=, 
direction=, count=)
at execMain.c:286

(gdb) p debug_query_string 
$1 = 0x1a965e8 "SELECT n.nspname as \"Schema\",\n  p.proname AS \"Name\",\n  
pg_catalog.format_type(p.prorettype, NULL) AS \"Result data type\",\n  CASE 
WHEN p.pronargs = 0\nTHEN CAST('*' AS pg_catalog.text)\nELSE pg_ca"...

The statement that triggers this varies from run to run, but the proximate
cause, namely error_mqh being null at parallel.c:479, seems consistent.
It looks to me like parallel.c's handling of insufficiently-many-workers
is a few bricks shy of a load.


I saw another previously-unreported problem before getting to the crash:

*** /home/postgres/pgsql/src/test/regress/expected/enum.out Mon Oct 20 
10:50:24 2014
--- /home/postgres/pgsql/src/test/regress/results/enum.out  Thu Jun 16 
14:00:58 2016
***
*** 284,306 
  -- Aggregates
  --
  SELECT min(col) FROM enumtest;
!  min 
! -
!  red
! (1 row)
! 
  SELECT max(col) FROM enumtest;
!   max   
! 
!  purple
! (1 row)
! 
  SELECT max(col) FROM enumtest WHERE col < 'green';
!   max   
! 
!  yellow
! (1 row)
! 
  --
  -- Index tests, force use of index
  --
--- 284,294 
  -- Aggregates
  --
  SELECT min(col) FROM enumtest;
! ERROR:  type matched to anyenum is not an enum type: anyenum
  SELECT max(col) FROM enumtest;
! ERROR:  type matched to anyenum is not an enum type: anyenum
  SELECT max(col) FROM enumtest WHERE col < 'green';
! ERROR:  type matched to anyenum is not an enum type: anyenum
  --
  -- Index tests, force use of index
  --

Haven't tried to trace that one down yet.

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 13:44:34 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 12:54 PM, Andres Freund  wrote:
> >> The root of my confusion is: if we prune a tuple, we'll bump the page
> >> LSN, so any session that is still referencing that tuple will error
> >> out as soon as it touches the page on which that tuple used to exist.
> >
> > Right. On the main table. But we don't perform that check on the toast
> > table/pages. So if we prune toast tuples, which are still referenced by
> > (unvacuumed) main relation, we can get into trouble.
> 
> OK, if it's true that we don't perform that check on the TOAST table,
> then I agree there's a potential problem there.  I don't immediately
> know where in the code to look to check that.

static inline void
TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
{
    Assert(relation != NULL);

    if (old_snapshot_threshold >= 0
        && (snapshot) != NULL
        && (snapshot)->satisfies == HeapTupleSatisfiesMVCC
        && !XLogRecPtrIsInvalid((snapshot)->lsn)
        && PageGetLSN(page) > (snapshot)->lsn)
        TestForOldSnapshot_impl(snapshot, relation);
}

The relevant part is the HeapTupleSatisfiesMVCC check, we're using
SatisfiesToast for toast lookups.


FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
but ran into the problem that I couldn't get it to vacuum anything away
(neither main nor toast rel).   That appears to be
    if (old_snapshot_threshold == 0)
    {
        if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
            && TransactionIdFollows(latest_xmin, xlimit))
            xlimit = latest_xmin;
because latest_xmin always is equal to MyPgXact->xmin, which is actually
kinda unsurprising?

Regards,

Andres




Re: [HACKERS] [GENERAL] PgQ and pg_dump

2016-06-16 Thread Martín Marqués
Hi,

2016-06-16 9:48 GMT-03:00 Michael Paquier :
> On Thu, Jun 16, 2016 at 8:37 PM, Martín Marqués  
> wrote:
>> El 16/06/16 a las 00:08, Michael Paquier escribió:
>>> On Wed, Jun 15, 2016 at 7:19 PM, Martín Marqués  
>>> wrote:

 How would the recovery process work? We expect the schema to be there
 when restoring the tables?
>>>
>>> pg_dump creates the schema first via the CREATE EXTENSION command,
>>> then tables dependent on this schema that are not created by the
>>> extension are dumped individually.
>>
>> That's not the behavior I'm seeing here:
>> [long test]
>
> Yes, that's why I completely agree that this is a bug :)
> I am seeing the same behavior as you do.

That's nice, we agree to agree! :)

So, after reading back and forth, the reason why the tables are not
being dumped is noted here in the code:

    /*
     * If specific tables are being dumped, dump just those tables; else,
     * dump according to the parent namespace's dump flag.
     */
    if (table_include_oids.head != NULL)
        tbinfo->dobj.dump = simple_oid_list_member(&table_include_oids,
                                                   tbinfo->dobj.catId.oid) ?
            DUMP_COMPONENT_ALL : DUMP_COMPONENT_NONE;
    else
        tbinfo->dobj.dump = tbinfo->dobj.namespace->dobj.dump_contains;


The comment accurately describes what is and is not going to be dumpable.
In our case, because the pgq schema is not dumpable (it comes from an
extension), the other objects it contains will not be dumpable either.

That's the reason why the PgQ event tables created by
pgq.create_queue() are not dumped.
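
(For reference, the only extension tables whose contents pg_dump will emit
are the ones the extension registered via pg_extension_config_dump(); an
illustrative query, which may well come back empty for pgq:)

    SELECT extname, extconfig::regclass[] AS config_tables
    FROM pg_extension
    WHERE extname = 'pgq';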

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 12:54 PM, Andres Freund  wrote:
>> The root of my confusion is: if we prune a tuple, we'll bump the page
>> LSN, so any session that is still referencing that tuple will error
>> out as soon as it touches the page on which that tuple used to exist.
>
> Right. On the main table. But we don't perform that check on the toast
> table/pages. So if we prune toast tuples, which are still referenced by
> (unvacuumed) main relation, we can get into trouble.

OK, if it's true that we don't perform that check on the TOAST table,
then I agree there's a potential problem there.  I don't immediately
know where in the code to look to check that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 12:50 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Thu, Jun 16, 2016 at 9:26 AM, Robert Haas  wrote:
>>> 3. In https://www.postgresql.org/message-id/25521.1465837...@sss.pgh.pa.us
>>> Tom proposed adding a GUC to set the minimum relation size required
>>> for consideration of parallelism.
>
>> I have posted a patch for this upthread [3].  The main thing to verify is
>> the default value of guc.
>
> I'll review and push this patch, unless Robert or somebody else has
> already started on it.

Great, it's all yours.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 12:14 PM, Andres Freund  wrote:
> >> > The issue isn't there without the feature, because we (should) never
> >> > access a tuple/detoast a column when it's invisible enough for the
> >> > corresponding toast tuple to be vacuumed away. But with
> >> > old_snapshot_timeout that's obviously (intentionally) not the case
> >> > anymore.  Due to old_snapshot_threshold we'll prune tuples which,
> >> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
> >>
> >> Is there really an assumption that the heap and the TOAST heap are
> >> only ever vacuumed with the same OldestXmin value?  Because that seems
> >> like it would be massively flaky.
> >
> > There's not. They can be vacuumed days apart. But if we vacuum the toast
> > table with an OldestXmin, and encounter a dead toast tuple, by the
> > definition of OldestXmin (excluding STO), there cannot be a session
> > reading the referencing tuple anymore - so that shouldn't matter.
> 
> I don't understand how STO changes that.  I'm not saying it doesn't
> change it, but I don't understand why it would.

Because we advance OldestXmin more aggressively, while allowing
snapshots that are *older* than OldestXmin to access old tuples on pages
which haven't been touched.


> The root of my confusion is: if we prune a tuple, we'll bump the page
> LSN, so any session that is still referencing that tuple will error
> out as soon as it touches the page on which that tuple used to exist.

Right. On the main table. But we don't perform that check on the toast
table/pages. So if we prune toast tuples, which are still referenced by
(unvacuumed) main relation, we can get into trouble.


> It won't even survive long enough to care that the tuple isn't there
> any more.
> 
> Maybe it would help if you lay out the whole sequence of events, like:
> 
> S1: Does this.
> S2: Does that.
> S1: Now does something else.

I presume it'd be something like:

Assuming a 'toasted' table, which contains one row, with a 1GB field.

S1: BEGIN REPEATABLE READ;
S1: SELECT SUM(length(one_gb_record)) FROM toasted;
S2: DELETE FROM toasted;
AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
S1: SELECT SUM(length(one_gb_record)) FROM toasted;





Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Tom Lane
Amit Kapila  writes:
> On Thu, Jun 16, 2016 at 9:26 AM, Robert Haas  wrote:
>> 3. In https://www.postgresql.org/message-id/25521.1465837...@sss.pgh.pa.us
>> Tom proposed adding a GUC to set the minimum relation size required
>> for consideration of parallelism.

> I have posted a patch for this upthread [3].  The main thing to verify is
> the default value of guc.

I'll review and push this patch, unless Robert or somebody else has
already started on it.  I concur with your choice of setting the default
value to 1024 blocks; that's close to the existing behavior but as a power
of 2 will look cleaner in GUC size units.
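
(Assuming the GUC lands under the name used in the patch,
min_parallel_relation_size, 1024 blocks at the default 8 kB block size would
be set and displayed as 8MB, e.g.:)

    SET min_parallel_relation_size = '8MB';
    SHOW min_parallel_relation_size;   -- expected to report 8MB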

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 12:14 PM, Andres Freund  wrote:
>> > The issue isn't there without the feature, because we (should) never
>> > access a tuple/detoast a column when it's invisible enough for the
>> > corresponding toast tuple to be vacuumed away. But with
>> > old_snapshot_timeout that's obviously (intentionally) not the case
>> > anymore.  Due to old_snapshot_threshold we'll prune tuples which,
>> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
>>
>> Is there really an assumption that the heap and the TOAST heap are
>> only ever vacuumed with the same OldestXmin value?  Because that seems
>> like it would be massively flaky.
>
> There's not. They can be vacuumed days apart. But if we vacuum the toast
> table with an OldestXmin, and encounter a dead toast tuple, by the
> definition of OldestXmin (excluding STO), there cannot be a session
> reading the referencing tuple anymore - so that shouldn't matter.

I don't understand how STO changes that.  I'm not saying it doesn't
change it, but I don't understand why it would.

The root of my confusion is: if we prune a tuple, we'll bump the page
LSN, so any session that is still referencing that tuple will error
out as soon as it touches the page on which that tuple used to exist.
It won't even survive long enough to care that the tuple isn't there
any more.

Maybe it would help if you lay out the whole sequence of events, like:

S1: Does this.
S2: Does that.
S1: Now does something else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-16 Thread Tom Lane
Teodor Sigaev  writes:
>> So I think there's a reasonable case for decreeing that <N> should only
>> match lexemes *exactly* N apart.  If we did that, we would no longer have
>> the misbehavior that Jean-Pierre is complaining about, and we'd not need
>> to argue about whether <0> needs to be treated specially.

> Agree, seems that's easy to change.
> ...
> Patch is attached

Hmm, couldn't the loop logic be simplified a great deal if this is the
definition?  Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
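
(For reference, the case from the subject line; with <N> redefined to match
lexemes exactly N apart, this query would be expected to return false:)

    SELECT phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');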

regards, tom lane




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 8:00 AM, Amit Kapila  wrote:
>> 1. The case originally reported by Thomas Munro still fails.  To fix
>> that, we probably need to apply scanjoin_target to each partial path.
>> But we can only do that if it's parallel-safe.  It seems like what we
>> want is something like this: (1) During scan/join planning, somehow
>> skip calling generate_gather_paths for the topmost scan/join rel as we
>> do to all the others.  (2) If scanjoin_target is not parallel-safe,
>> build a path for the scan/join phase that applies a Gather node to the
>> cheapest path and does projection at the Gather node.  Then forget all
>> the partial paths so we can't do any bogus upper planning.  (3) If
>> scanjoin_target is parallel-safe, replace the list of partial paths
>> for the topmost scan/join rel with a new list where scanjoin_target
>> has been applied to each one.  I haven't tested this so I might be
>> totally off-base about what's actually required here...
>
> I think we can achieve it just by doing something like what you have
> mentioned in (2) and (3).  I am not sure if there is a need to skip
> generation of gather paths for top scan/join node.  Please see the patch
> attached.  I have just done some minimal testing to ensure that problem
> reported by Thomas Munro in this thread is fixed and verified that the fix
> is sane for problems [1][2] reported by sqlsmith. If you think this is on
> right lines, I can try to do more verification and try to add tests.

You can't do it this way because of the issue Tom discussed here:

https://www.postgresql.org/message-id/16421.1465828...@sss.pgh.pa.us

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 12:02:39 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 11:37 AM, Andres Freund  wrote:
> >> If that were really true, why would we not have the problem in
> >> current production versions that the toast table could be vacuumed
> >> before the heap, leading to exactly the issue you are talking
> >> about?
> >
> > The issue isn't there without the feature, because we (should) never
> > access a tuple/detoast a column when it's invisible enough for the
> > corresponding toast tuple to be vacuumed away. But with
> > old_snapshot_timeout that's obviously (intentionally) not the case
> > anymore.  Due to old_snapshot_threshold we'll prune tuples which,
> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
> 
> Is there really an assumption that the heap and the TOAST heap are
> only ever vacuumed with the same OldestXmin value?  Because that seems
> like it would be massively flaky.

There's not. They can be vacuumed days apart. But if we vacuum the toast
table with an OldestXmin, and encounter a dead toast tuple, by the
definition of OldestXmin (excluding STO), there cannot be a session
reading the referencing tuple anymore - so that shouldn't matter.

IIRC we actually reverted a patch that caused significant problems
around this. I think there's a small race condition around
ProcessStandbyHSFeedbackMessage(), and you can restart with a different
vacuum_defer_cleanup_age (we should just remove that), but other than
that we shouldn't run into any issues without STO.




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 11:37 AM, Andres Freund  wrote:
>> If that were really true, why would we not have the problem in
>> current production versions that the toast table could be vacuumed
>> before the heap, leading to exactly the issue you are talking
>> about?
>
> The issue isn't there without the feature, because we (should) never
> access a tuple/detoast a column when it's invisible enough for the
> corresponding toast tuple to be vacuumed away. But with
> old_snapshot_timeout that's obviously (intentionally) not the case
> anymore.  Due to old_snapshot_threshold we'll prune tuples which,
> without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.

Is there really an assumption that the heap and the TOAST heap are
only ever vacuumed with the same OldestXmin value?  Because that seems
like it would be massively flaky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-16 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jun 15, 2016 at 10:34 PM, Tom Lane  wrote:
>> My feeling is that we'd keep
>> the pg_attribute.attnotnull field and continue to drive actual enforcement
>> off that, but it would just reflect a summary of the pg_constraint state.

> OK, I see. Hm, by storing this information I would actually think that
> we want to drop this attnotnull so as we don't need to bother about
> updating pg_attribute through the whole tree when dropping a NOT NULL
> constraint on the parent, and we do not actually need to store this
> information in two different places..

There are a couple of reasons for not removing attnotnull: one is to not
need to touch the executor for this, and another is to not break
client-side code that is accustomed to looking at attnotnull.
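
(The kind of client-side lookup that relies on attnotnull staying put is,
illustratively, something like the following, with "some_table" a placeholder:)

    SELECT a.attname, a.attnotnull
    FROM pg_catalog.pg_attribute a
    WHERE a.attrelid = 'some_table'::regclass
      AND a.attnum > 0
      AND NOT a.attisdropped;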

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 09:50:09 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 8:57 PM, Andres Freund  wrote:
> 
> > The simplest version of the scenario I'm concerned about is that a tuple
> > in a table is *not* vacuumed, even though it's eligible to be removed
> > due to STO. If that tuple has toasted columns, it could be that the
> > toast table was independently vacuumed (autovac considers main/toast
> > tables separately,
> 
> If that were really true, why would we not have the problem in
> current production versions that the toast table could be vacuumed
> before the heap, leading to exactly the issue you are talking
> about?

The issue isn't there without the feature, because we (should) never
access a tuple/detoast a column when it's invisible enough for the
corresponding toast tuple to be vacuumed away. But with
old_snapshot_timeout that's obviously (intentionally) not the case
anymore.  Due to old_snapshot_threshold we'll prune tuples which,
without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.


> It seems to me that a simple test shows that it is not the
> case that the heap is vacuumed without considering toast:

That's why I mentioned autovacuum:
    /*
     * Scan pg_class to determine which tables to vacuum.
     *
     * We do this in two passes: on the first one we collect the list of plain
     * relations and materialized views, and on the second one we collect
     * TOAST tables. The reason for doing the second pass is that during it we
     * want to use the main relation's pg_class.reloptions entry if the TOAST
     * table does not have any, and we cannot obtain it unless we know
     * beforehand what's the main table OID.
     *
     * We need to check TOAST tables separately because in cases with short,
     * wide tables there might be proportionally much more activity in the
     * TOAST table than in its parent.
     */
...
    tab->at_vacoptions = VACOPT_SKIPTOAST |
        (dovacuum ? VACOPT_VACUUM : 0) |
        (doanalyze ? VACOPT_ANALYZE : 0) |
        (!wraparound ? VACOPT_NOWAIT : 0);
(note the skiptoast)
...
    /*
     * Remember the relation's TOAST relation for later, if the caller asked
     * us to process it.  In VACUUM FULL, though, the toast table is
     * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
     */
    if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
        toast_relid = onerel->rd_rel->reltoastrelid;
    else
        toast_relid = InvalidOid;
...
    if (toast_relid != InvalidOid)
        vacuum_rel(toast_relid, relation, options, params);
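
(For reference, the main/TOAST pairing that autovacuum ends up scheduling
independently is visible from the catalogs; an illustrative query, using the
"toasted" table from the scenario in this thread:)

    SELECT c.oid::regclass           AS main_rel,
           c.reltoastrelid::regclass AS toast_rel
    FROM pg_class c
    WHERE c.relname = 'toasted' AND c.relkind = 'r';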


> > or the horizon could change between the two heap scans,
> 
> Not a problem in current production why?

Because the horizon will never advance to a value which allows "surely dead"
tuples to be read, we never detoast columns from a tuple for which we'd
removed the toast data. That's why we're performing visibility tests
(hopefully) everywhere, before accessing tuple contents (as opposed to
merely inspecting the header).


> > or pins could prevent vacuuming of one page, or ...).
> 
> Not a problem in current production why?

Same reason.


> So far I am not seeing any way for TOAST tuples to be pruned in
> advance of the referencing heap tuples, nor any way for the problem
> you describe to happen in the absence of that.

Didn't I just list three different ways, only one of which you doubted
the veracity of? Saying "Not a problem in current production why"
doesn't change it being a problem.


> > It seems the easiest way to fix this would be to make
> > TestForOldSnapshot() "accept" SnapshotToast as well.
> 
> I don't think that would be appropriate without testing the
> characteristics of the underlying table to see whether it should be
> excluded.

You mean checking whether it's a toast table? We could check that, but
since we never use a toast scan outside of toast, it doesn't seem
necessary.


> But is the TOAST data ever updated without an update to
> the referencing heap tuple?

It shouldn't.


> If not, I don't see any benefit.

Huh?


Greetings,

Andres Freund




Re: [HACKERS] forcing a rebuild of the visibility map

2016-06-16 Thread Masahiko Sawada
On Thu, Jun 16, 2016 at 10:03 PM, Robert Haas  wrote:
> On Thu, Jun 16, 2016 at 2:33 AM, Masahiko Sawada  
> wrote:
>> Option name DISABLE_PAGE_SKIPPING is good to me.
>> I'm still working on this, but here are some review comments.
>>
>> ---
>> +CREATE FUNCTION pg_truncate_visibility_map(regclass)
>> +RETURNS void
>> +AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
>> +LANGUAGE C STRICT
>> +PARALLEL UNSAFE;  -- let's not make this any more dangerous
>> +
>>
>> "REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM
>> PUBLIC;" is missing in pg_visibility/pg_visibility--1.0--1.1.sql.
>
> OK, thanks.  I'll fix that.
>
>> I think that VACUUM with VERBOSE option can show the information for
>> how many frozen pages we skipped like autovacuum does. Thought?
>> Patch attached.
>
> I'm not sure.  The messages VACUUM emits are already quite long and
> hard to read, and adding more lines to them might make that problem
> worse.  On the other hand, having more details can be helpful, too.

Because this information is emitted only when the VERBOSE option is
specified, I think it would be better to output it rather than nothing,
even though it makes the messages more complicated.
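
(A sketch of how the pieces under review here would fit together; the
function and option names are taken from the patches quoted in this thread
and were not committed syntax at the time:)

    CREATE EXTENSION pg_visibility;
    SELECT pg_truncate_visibility_map('some_table'::regclass);
    VACUUM (VERBOSE, DISABLE_PAGE_SKIPPING) some_table;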

Regards,

--
Masahiko Sawada




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 8:57 PM, Andres Freund  wrote:

> The simplest version of the scenario I'm concerned about is that a tuple
> in a table is *not* vacuumed, even though it's eligible to be removed
> due to STO. If that tuple has toasted columns, it could be that the
> toast table was independently vacuumed (autovac considers main/toast
> tables separately,

If that were really true, why would we not have the problem in
current production versions that the toast table could be vacuumed
before the heap, leading to exactly the issue you are talking
about?  It seems to me that a simple test shows that it is not the
case that the heap is vacuumed without considering toast:

test=# create table tt (c1 text not null);
CREATE TABLE
test=# insert into tt select repeat(md5(n::text),10) from (select
generate_series(1,100)) x(n);
INSERT 0 100
test=# delete from tt;
DELETE 100
test=# vacuum verbose tt;
INFO:  vacuuming "public.tt"
INFO:  "tt": removed 100 row versions in 1 pages
INFO:  "tt": found 100 removable, 0 nonremovable row versions in 1 out
of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16552"
INFO:  scanned index "pg_toast_16552_index" to remove 1900 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16552": removed 1900 row versions in 467 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_16552_index" now contains 0 row versions in 8 pages
DETAIL:  1900 index row versions were removed.
5 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16552": found 1900 removable, 0 nonremovable row
versions in 467 out of 467 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

> or the horizon could change between the two heap scans,

Not a problem in current production why?

> or pins could prevent vacuuming of one page, or ...).

Not a problem in current production why?

So far I am not seeing any way for TOAST tuples to be pruned in
advance of the referencing heap tuples, nor any way for the problem
you describe to happen in the absence of that.  If you see such,
could you provide a more detailed description or a reproducible
test case?

> Toast accesses via tuptoaster.c currently don't perform
> TestForOldSnapshot_impl(), because they use SnapshotToastData, not
> SnapshotMVCC.
>
> That seems to means two things:
>
> 1) You might get 'missing chunk number ...' errors on access to toasted
>columns. Misleading error, but ok.
>
> 2) Because the tuple has been pruned from the toast table, it's possible
>that the toast oid/va_valueid is reused, because now there's no
>conflict with GetNewOidWithIndex() anymore. In that case the
>toast_fetch_datum() might return a tuple from another column & type
>(all columns in a table share the same toast table), which could lead
>to almost arbitrary problems.  That's not super likely to happen, but
>could have quite severe consequences once it starts.
>
> It seems the easiest way to fix this would be to make
> TestForOldSnapshot() "accept" SnapshotToast as well.

I don't think that would be appropriate without testing the
characteristics of the underlying table to see whether it should be
excluded.  But is the TOAST data ever updated without an update to
the referencing heap tuple?  If not, I don't see any benefit.  And
we certainly don't want to add some new way to prune TOAST tuples
which might still have referencing heap tuples; that could provide
a route to *create* the problem you describe.

I am on vacation tomorrow (Friday the 17th) through Monday the
27th, so I will be unable to respond to further issues during that
time.  Hopefully I can get this particular issue sorted out before
I go.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_upgrade vs vacuum_cost_delay

2016-06-16 Thread Magnus Hagander
On Thu, Jun 16, 2016 at 4:35 PM, Euler Taveira  wrote:

> On 16-06-2016 09:05, Magnus Hagander wrote:
> > Shouldn't pg_upgrade turn off vacuum cost delay when it vacuums the new
> > cluster? Not talking about the post-analyze script, but when it runs
> > vacuumdb to analyze and freeze before loading the new schema, in
> > prepare_new_cluster()? Those run during downtime, so it seems like you'd
> > want those to run as fast as possible.
> >
> Doesn't --new-options do the job?
>

You could, but it seems like it should do it by default.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] pg_upgrade vs vacuum_cost_delay

2016-06-16 Thread Euler Taveira
On 16-06-2016 09:05, Magnus Hagander wrote:
> Shouldn't pg_upgrade turn off vacuum cost delay when it vacuums the new
> cluster? Not talking about the post-analyze script, but when it runs
> vacuumdb to analyze and freeze before loading the new schema, in
> prepare_new_cluster()? Those run during downtime, so it seems like you'd
> want those to run as fast as possible.
> 
Doesn't --new-options do the job?


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-16 Thread Euler Taveira
On 01-06-2016 20:52, Andreas Karlsson wrote:
> I think at least three of the four aggregate functions are little used,
> so I do not think many users would be affected. And only min(citext) and
> max(citext) can make use of the parallel aggregation.
> 
> The functions are:
> 
> min(citext)
> max(citext)
> int_array_aggregate(int4)
> rewrite(tsquery[])
> 
I don't think those functions are used enough (as you said) to justify
adding more code in 9.6. Let's not be in a hurry on something that can
wait some months.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-16 Thread Andreas Karlsson

On 06/14/2016 09:55 PM, Robert Haas wrote:
> On Tue, Jun 14, 2016 at 1:51 PM, Robert Haas  wrote:
>> On Tue, Jun 14, 2016 at 6:37 AM, Andreas Karlsson  wrote:
>>> I have rebased all my patches on the current master now (and skipped the
>>> extensions I previously listed).
>>
>> Thanks, this is really helpful.  It was starting to get hard to keep
>> track of what hadn't been applied yet.  I decided to prioritize
>> getting committed the patches where the extension version had already
>> been bumped by 749a787c5b25ae33b3d4da0ef12aa05214aa73c7, so I've now
>> committed the patches for cube, hstore, intarray, ltree, pg_trgm, and
>> seg.
>
> I've now also committed the patches for sslinfo, unaccent, uuid-ossp, and xml2.

Thanks!

> I took a look at the patch for tsearch2, but I think token_type() is
> mismarked.  You have it marked PARALLEL SAFE but it seems to depend on
> the result of GetCurrentParser(), which returns a backend-private
> state variable.

Hm, as far as I can tell that applies only to token_type(), which I made
RESTRICTED, while token_type(int4) and token_type(text) do not call
GetCurrentParser().

> That was the only clear mistake I found, but I tend
> to think that changing the markings on anything defined by
> UNSUPPORTED_FUNCTION() is pretty silly, because there's no point in
> going to extra planner effort to generate a parallel plan only to
> error out as soon as we try to execute it.  I think you should leave
> all of those out of the patch.

I will fix this.

> I also took a look at the patch for tablefunc.  I think that you've
> got the markings right here, but I think that it would be good to add
> PARALLEL UNSAFE explicitly to the 1.1 version of the file for the
> functions that are unsafe, and add a comment like "-- query might do
> anything" or some other indication as to why they are so marked, for
> the benefit of future readers.

Good suggestion.
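
(For instance, the 1.1 script could carry explicit markings along these
lines; a sketch using one tablefunc function, with the exact list of
functions being up to the patch:)

    ALTER FUNCTION crosstab(text) PARALLEL UNSAFE;  -- query might do anything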

Andreas




Re: [HACKERS] [GENERAL] PgQ and pg_dump

2016-06-16 Thread Martín Marqués
El 16/06/16 a las 09:48, Michael Paquier escribió:
> On Thu, Jun 16, 2016 at 8:37 PM, Martín Marqués  
> wrote:
> 
>> This problem came up due to a difference between pg_dump on 9.1.12 and
>> 9.1.22 (I believe it was due to a patch on pg_dump that excluded the
>> dependent objects from being dumped), but here I'm using 9.5.3:
> 
> Hm. I don't recall anything in pg_dump lately except ebd092b, but that
> fixed another class of problems.

I believe it was this one:

commit 5108013dbbfedb5e5af6a58cde5f074d895c46bf
Author: Tom Lane 
Date:   Wed Jan 13 18:55:27 2016 -0500

Handle extension members when first setting object dump flags in
pg_dump.

Regards,

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] forcing a rebuild of the visibility map

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 2:33 AM, Masahiko Sawada  wrote:
> Option name DISABLE_PAGE_SKIPPING is good to me.
> I'm still working on this, but here are some review comments.
>
> ---
> +CREATE FUNCTION pg_truncate_visibility_map(regclass)
> +RETURNS void
> +AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
> +LANGUAGE C STRICT
> +PARALLEL UNSAFE;  -- let's not make this any more dangerous
> +
>
> "REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM
> PUBLIC;" is missing in pg_visibility/pg_visibility--1.0--1.1.sql.

OK, thanks.  I'll fix that.

> I think that VACUUM with VERBOSE option can show the information for
> how many frozen pages we skipped like autovacuum does. Thought?
> Patch attached.

I'm not sure.  The messages VACUUM emits are already quite long and
hard to read, and adding more lines to them might make that problem
worse.  On the other hand, having more details can be helpful, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Failed assertion in postgres_fdw/deparse.c:1116

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 7:05 AM, Etsuro Fujita
 wrote:
> ISTM that a more robust solution to this is to push down the ft1-ft2-ft3 join
> with the PHV by extending deparseExplicitTargetList() and/or something else
> so that we can send the remote query like:
>
> select ft1.a, (ft3.a IS NOT NULL) from (ft1 inner join ft2 on ft1.a = ft2.a)
> left join ft3 on ft1.a = ft3.a

I completely agree we should have that, but not for 9.6.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] PgQ and pg_dump

2016-06-16 Thread Michael Paquier
On Thu, Jun 16, 2016 at 8:37 PM, Martín Marqués  wrote:
> El 16/06/16 a las 00:08, Michael Paquier escribió:
>> On Wed, Jun 15, 2016 at 7:19 PM, Martín Marqués  
>> wrote:
>>>
>>> How would the recovery process work? We expect the schema to be there
>>> when restoring the tables?
>>
>> pg_dump creates the schema first via the CREATE EXTENSION command,
>> then tables dependent on this schema that are not created by the
>> extension are dumped individually.
>
> That's not the behavior I'm seeing here:
> [long test]

Yes, that's why I completely agree that this is a bug :)
I am seeing the same behavior as you do.

> This problem came up due to a difference between pg_dump on 9.1.12 and
> 9.1.22 (I believe it was due to a patch on pg_dump that excluded the
> dependent objects from being dumped), but here I'm using 9.5.3:

Hm. I don't recall anything in pg_dump lately except ebd092b, but that
fixed another class of problems.
-- 
Michael




[HACKERS] pg_upgrade vs vacuum_cost_delay

2016-06-16 Thread Magnus Hagander
Shouldn't pg_upgrade turn off vacuum cost delay when it vacuums the new
cluster? Not talking about the post-analyze script, but when it runs
vacuumdb to analyze and freeze before loading the new schema, in
prepare_new_cluster()? Those run during downtime, so it seems like you'd
want those to run as fast as possible.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-16 Thread Amit Kapila
On Thu, Jun 16, 2016 at 9:26 AM, Robert Haas  wrote:
>
> On Tue, Jun 14, 2016 at 12:18 PM, Tom Lane  wrote:
> > Amit Kapila  writes:
> >> On Mon, Jun 13, 2016 at 9:37 PM, Tom Lane  wrote:
> >>> ...  I got a core dump in the window.sql test:
> >>> which I think may be another manifestation of the failure-to-apply-proper-
> >>> pathtarget issue we're looking at in this thread.  Or maybe it's just
> >>> an unjustified assumption in make_partialgroup_input_target that the
> >>> input path must always have some sortgrouprefs assigned.
> >
> >> I don't see this problem after your recent commit
> >> - 89d53515e53ea080029894939118365b647489b3.  Are you still getting it?
> >
> > No.  I am suspicious that there's still a bug there, ie we're looking at
> > the wrong pathtarget; but the regression test doesn't actually crash.
> > That might only be because we don't choose the parallelized path.
>
> OK, so there are quite a number of separate things here:
>
> 1. The case originally reported by Thomas Munro still fails.  To fix
> that, we probably need to apply scanjoin_target to each partial path.
> But we can only do that if it's parallel-safe.  It seems like what we
> want is something like this: (1) During scan/join planning, somehow
> skip calling generate_gather_paths for the topmost scan/join rel as we
> do to all the others.  (2) If scanjoin_target is not parallel-safe,
> build a path for the scan/join phase that applies a Gather node to the
> cheapest path and does projection at the Gather node.  Then forget all
> the partial paths so we can't do any bogus upper planning.  (3) If
> scanjoin_target is parallel-safe, replace the list of partial paths
> for the topmost scan/join rel with a new list where scanjoin_target
> has been applied to each one.  I haven't tested this so I might be
> totally off-base about what's actually required here...
>

I think we can achieve it just by doing something like what you have
mentioned in (2) and (3).  I am not sure if there is a need to skip
generation of gather paths for top scan/join node.  Please see the patch
attached.  I have just done some minimal testing to ensure that problem
reported by Thomas Munro in this thread is fixed and verified that the fix
is sane for problems [1][2] reported by sqlsmith. If you think this is on
right lines, I can try to do more verification and try to add tests.

> 2. In https://www.postgresql.org/message-id/15695.1465827...@sss.pgh.pa.us
> Tom raised the issue that we need some test cases covering this area.
>
> 3. In https://www.postgresql.org/message-id/25521.1465837...@sss.pgh.pa.us
> Tom proposed adding a GUC to set the minimum relation size required
> for consideration of parallelism.
>

I have posted a patch for this upthread [3].  The main thing to verify is
the default value of guc.


[1] -
https://www.postgresql.org/message-id/CAA4eK1Ky2=HsTsT4hmfL=eal5rv0_t59tvwzvk9hqkvn6do...@mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=z...@mail.gmail.com
[3] -
https://www.postgresql.org/message-id/CA%2BkptmAU4xkzBpd8tie3X6o9_tE2oKm-0kDn8%2BZOF%3D2_qOMZNA%40mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_scanjoin_pathtarget_v1.patch
Description: Binary data



Re: [HACKERS] [GENERAL] PgQ and pg_dump

2016-06-16 Thread Martín Marqués
El 16/06/16 a las 00:08, Michael Paquier escribió:
> On Wed, Jun 15, 2016 at 7:19 PM, Martín Marqués  
> wrote:
>>
>> How would the recovery process work? We expect the schema to be there
>> when restoring the tables?
> 
> pg_dump creates the schema first via the CREATE EXTENSION command,
> then tables dependent on this schema that are not created by the
> extension are dumped individually.

That's not the behavior I'm seeing here:

pruebas=# create extension pgq;
CREATE EXTENSION

pruebas=# select pgq.create_queue('personas');
 create_queue
--
1
(1 fila)

pruebas=# select pgq.create_queue('usuarios');
 create_queue
--
1
(1 fila)

pruebas=# select pgq.create_queue('usuarios_activos');
 create_queue
--
1
(1 fila)

pruebas=# select pgq.create_queue('usuarios_inactivos');
 create_queue
--
1
(1 fila)

pruebas=# select count(*) from pgq.tick;
 count
---
 4
(1 fila)

pruebas=# \dt pgq.*
Listado de relaciones
 Esquema | Nombre | Tipo  |  Dueño
-++---+--
 pgq | consumer   | tabla | postgres
 pgq | event_1| tabla | postgres
 pgq | event_1_0  | tabla | postgres
 pgq | event_1_1  | tabla | postgres
 pgq | event_1_2  | tabla | postgres
 pgq | event_2| tabla | postgres
 pgq | event_2_0  | tabla | postgres
 pgq | event_2_1  | tabla | postgres
 pgq | event_2_2  | tabla | postgres
 pgq | event_3| tabla | postgres
 pgq | event_3_0  | tabla | postgres
 pgq | event_3_1  | tabla | postgres
 pgq | event_3_2  | tabla | postgres
 pgq | event_4| tabla | postgres
 pgq | event_4_0  | tabla | postgres
 pgq | event_4_1  | tabla | postgres
 pgq | event_4_2  | tabla | postgres
 pgq | event_template | tabla | postgres
 pgq | queue  | tabla | postgres
 pgq | retry_queue| tabla | postgres
 pgq | subscription   | tabla | postgres
 pgq | tick   | tabla | postgres
(22 filas)

And just to add something else into the whole annoyance, I'll add a user
table:

pruebas=# create table pgq.test_pgq_dumpable (id int primary key);
CREATE TABLE
pruebas=# \dt pgq.test_pgq_dumpable
 Listado de relaciones
 Esquema |  Nombre   | Tipo  |  Dueño
-+---+---+--
 pgq | test_pgq_dumpable | tabla | postgres
(1 fila)


To check that all objects are dumped, I just pipe the pg_dump to psql on
a new DB:

-bash-4.3$ pg_dump  pruebas | psql -d pruebas_pgq

Now, let's check what we have on this new DB:

pruebas_pgq=# \dt pgq.test_pgq_dumpable
No se encontraron relaciones coincidentes.
pruebas_pgq=# \dt pgq.*
Listado de relaciones
 Esquema | Nombre | Tipo  |  Dueño
-++---+--
 pgq | consumer   | tabla | postgres
 pgq | event_template | tabla | postgres
 pgq | queue  | tabla | postgres
 pgq | retry_queue| tabla | postgres
 pgq | subscription   | tabla | postgres
 pgq | tick   | tabla | postgres
(6 filas)


This problem came up due to a difference between pg_dump on 9.1.12 and
9.1.22 (I believe it was due to a patch on pg_dump that excluded the
dependent objects from being dumped), but here I'm using 9.5.3:

pruebas_pgq=# select version();
 version

-
 PostgreSQL 9.5.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 5.3.1
20160406 (Red Hat 5.3.1-6), 64-bit
(1 fila)


I'll file a bug report in a moment.
-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [sqlsmith] Failed assertion in postgres_fdw/deparse.c:1116

2016-06-16 Thread Etsuro Fujita

On 2016/06/15 9:13, Amit Langote wrote:
> On 2016/06/15 0:50, Robert Haas wrote:
>> On Tue, Jun 14, 2016 at 4:06 AM, Amit Langote wrote:
>>> Attached new version of the patch with following changes:
>>
>> OK, I committed this version with some cosmetic changes.
>
> Thank you all for working on this!

While reviewing the patch, I noticed that the patch is still 
restrictive.  Consider:


postgres=# explain verbose select ft1.a, (ft3.a IS NOT NULL) from (ft1 
inner join ft2 on ft1.a = ft2.a) left join ft3 on ft1.a = ft3.a;

 QUERY PLAN

  Foreign Scan  (cost=100.00..103.10 rows=2 width=5)
Output: ft1.a, (ft3.a IS NOT NULL)
Relations: ((public.ft1) INNER JOIN (public.ft2)) LEFT JOIN 
(public.ft3)
    Remote SQL: SELECT r1.a, r4.a FROM ((public.t1 r1 INNER JOIN
public.t2 r2 ON (((r1.a = r2.a)))) LEFT JOIN public.t3 r4 ON (((r1.a =
r4.a))))

(4 rows)

That's great, but:

postgres=# explain verbose select * from t1 left join (select ft1.a, 
(ft3.a IS NOT NULL) from (ft1 inner join ft2 on ft1.a = ft2.a) left join 
ft3 on ft1.a = ft3.a) ss (a, b) on t1.a = ss.a;

QUERY PLAN

  Hash Right Join  (cost=202.11..204.25 rows=3 width=13)
Output: t1.a, t1.b, ft1.a, ((ft3.a IS NOT NULL))
Hash Cond: (ft1.a = t1.a)
->  Hash Left Join  (cost=201.04..203.15 rows=2 width=5)
  Output: ft1.a, (ft3.a IS NOT NULL)
  Hash Cond: (ft1.a = ft3.a)
  ->  Foreign Scan  (cost=100.00..102.09 rows=2 width=4)
Output: ft1.a
Relations: (public.ft1) INNER JOIN (public.ft2)
Remote SQL: SELECT r4.a FROM (public.t1 r4 INNER JOIN
public.t2 r5 ON (((r4.a = r5.a))))

  ->  Hash  (cost=101.03..101.03 rows=1 width=4)
Output: ft3.a
->  Foreign Scan on public.ft3  (cost=100.00..101.03 
rows=1 width=4)

  Output: ft3.a
  Remote SQL: SELECT a FROM public.t3
->  Hash  (cost=1.03..1.03 rows=3 width=8)
  Output: t1.a, t1.b
  ->  Seq Scan on public.t1  (cost=0.00..1.03 rows=3 width=8)
Output: t1.a, t1.b
(19 rows)

As in the example shown upthread, we could still push down the 
ft1-ft2-ft3 join and then perform the join between the result and t1. 
However, the patch doesn't allow that, because ph_eval_at is (b 4 7) and 
relids for the ft1-ft2-ft3 join is (b 4 5 7), and so the 
bms_nonempty_difference(relids, phinfo->ph_eval_at) test returns true.


ISTM that a more robust solution to this is to push down the ft1-ft2-ft3
join with the PHV by extending deparseExplicitTargetList() and/or 
something else so that we can send the remote query like:


select ft1.a, (ft3.a IS NOT NULL) from (ft1 inner join ft2 on ft1.a = 
ft2.a) left join ft3 on ft1.a = ft3.a


Right?

Best regards,
Etsuro Fujita






Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-16 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> I think that was a good choice in general so that
 Alvaro> possibly-data-eating bugs could be reported, but there's a
 Alvaro> problem in the specific case of tuples carried over by
 Alvaro> pg_upgrade whose Multixact is "further in the future" compared
 Alvaro> to the nextMultiXactId counter.  I think it's pretty clear that
 Alvaro> we should let that error be downgraded to DEBUG too, like the
 Alvaro> other checks.

But that leaves an obvious third issue: it's all very well to ignore the
pre-upgrade (pre-9.3) mxid if it's older than the cutoff or it's in the
future, but what if it's actually inside the currently valid range?
Looking it up as though it were a valid mxid in that case seems to be
completely wrong and could introduce more subtle errors.

(It can, AFAICT, be inside the currently valid range due to wraparound,
i.e. without there being a valid pg_multixact entry for it, because
AFAICT in 9.2, once the mxid is hinted dead it is never again either
looked up or cleared, so it can sit in the tuple xmax forever, even
through multiple wraparounds.)

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] sslmode=require fallback

2016-06-16 Thread Magnus Hagander
On Thu, Jun 16, 2016 at 10:39 AM, Jakob Egger  wrote:

> Hi!
>
> I've looked at the way libpq handles TLS certificates and plaintext
> fallback, and I am somewhat surprised.
>
> The default sslmode is prefer. According to the documentation, this will
> make libpq use an SSL connection if possible, but will use a plain text
> connection as a fallback. The certificate will not be verified.
>
> If, however, there is a root certificate in ~/.postgresql/root.crt, libpq
> will check if the server cert matches this certificate, and refuse any
> certificates that don't match. This means that libpq will fall back to a
> plain text connection!
>
> This is very unexpected behavior! Shouldn't libpq prefer an
> *unauthenticated but encrypted* connection over an *unauthenticated and
> unencrypted* connection?
>

You would think so.

The default mode of "prefer" is ridiculous in a lot of ways. If you are
using SSL in any shape or form you should simply not use "prefer". That's
really the only answer at this point, unfortunately.
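
(Either way, one can confirm from the server side whether a given session
actually ended up encrypted; pg_stat_ssl is available in 9.5 and later:)

    SELECT ssl, version, cipher
    FROM pg_stat_ssl
    WHERE pid = pg_backend_pid();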


> This behavior also causes sslmode=require to behave like sslmode=verify-ca
> when ~/.postgresql/root.crt exists.
>

Correct. That's mainly for really old backwards compatibility. We could
have a "sslmode=verify-none" to reverse that, I guess. I'm not sure if this
scenario is common enough to care about though?


>From my limited understanding, it seems the way to fix this would be in
> fe-secure-openssl.c, to change initialize_SSL() to only read the root
> certificate file when sslmode=verify_*
>
> However, if this is the expected behavior, the documentation at
> https://www.postgresql.org/docs/current/static/libpq-ssl.html should be
> updated to make this more clear. It should be made clear that the existence
> of the file ~/.postgresql/root.crt changes the behavior of sslmode=require
> and sslmode=prefer.
>


Agreed. It's basically backwards compatibility with something that was
badly documented in the first place :) That's not a particularly strong
argument for the way it is. Clarifying the documentation would definitely
be a good improvement.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] sslmode=require fallback

2016-06-16 Thread Jakob Egger
Hi!

I've looked at the way libpq handles TLS certificates and plaintext fallback, 
and I am somewhat surprised.

The default sslmode is prefer. According to the documentation, this will make
libpq use an SSL connection if possible, but will use a plain text connection 
as a fallback. The certificate will not be verified.

If, however, there is a root certificate in ~/.postgresql/root.crt, libpq will 
check if the server cert matches this certificate, and refuse any certificates
that don't match. This means that libpq will fall back to a plain text 
connection!

This is very unexpected behavior! Shouldn't libpq prefer an *unauthenticated 
but encrypted* connection over an *unauthenticated and unencrypted* connection?

This behavior also causes sslmode=require to behave like sslmode=verify-ca when 
~/.postgresql/root.crt exists.

From my limited understanding, it seems the way to fix this would be in 
fe-secure-openssl.c, to change initialize_SSL() to only read the root 
certificate file when sslmode=verify_*

However, if this is the expected behavior, the documentation at
https://www.postgresql.org/docs/current/static/libpq-ssl.html should be
updated to make this more clear. It should be made clear that the existence of
the file ~/.postgresql/root.crt changes the behavior of sslmode=require and 
sslmode=prefer.

Best regards,
Jakob

Re: [HACKERS] Hash Indexes

2016-06-16 Thread Amit Kapila
On Tue, May 10, 2016 at 5:39 PM, Amit Kapila  wrote:
>
>
> Incomplete Splits
> --
> Incomplete splits can be completed either by vacuum or insert as both
> needs exclusive lock on bucket.  If vacuum finds split-in-progress flag on
> a bucket then it will complete the split operation, vacuum won't see this
> flag if actually split is in progress on that bucket as vacuum needs
> cleanup lock and split retains pin till end of operation.  To make it work
> for Insert operation, one simple idea could be that if insert finds
> split-in-progress flag, then it releases the current exclusive lock on
> bucket and tries to acquire a cleanup lock on bucket, if it gets cleanup
> lock, then it can complete the split and then the insertion of tuple, else
> it will have a exclusive lock on bucket and just perform the insertion of
> tuple.  The disadvantage of trying to complete the split in vacuum is that
> split might require new pages and allocating new pages at time of vacuum is
> not advisable.  The disadvantage of doing it at time of Insert is that
> Insert might skip it even if there is some scan on the bucket is going on
> as scan will also retain pin on the bucket, but I think that is not a big
> deal.  The actual completion of split can be done in two ways: (a) scan
> the new bucket and build a hash table with all of the TIDs you find
> there.  When copying tuples from the old bucket, first probe the hash
> table; if you find a match, just skip that tuple (idea suggested by
> Robert Haas offlist) (b) delete all the tuples that are marked as
> moved_by_split in the new bucket and perform the split operation from the
> beginning using old bucket.
>
>
I have completed the patch with respect to incomplete splits and delayed
cleanup of garbage tuples.  For incomplete splits, I have used option (a)
as mentioned above: incomplete splits are completed when an insertion sees
the split-in-progress flag in a bucket.  The second major thing this new
version of the patch achieves is cleanup of garbage tuples, i.e. the
tuples that are left in the old bucket during a split.  Currently (in
HEAD), as part of a split operation, we clean the tuples from the old
bucket after moving them to the new bucket, because we hold heavy-weight
locks on both the old and new buckets for the whole split operation.  In
the new design, we need to take a cleanup lock on the old bucket and an
exclusive lock on the new bucket to perform the split operation, and we
don't retain those locks until the end (we release the lock as we move on
to overflow buckets).  Now, to clean up the tuples we need a cleanup lock
on the bucket, which we might not hold at split-end.  So I chose to
perform the cleanup of garbage tuples during vacuum and when a re-split
of the bucket happens, since during both operations we do hold a cleanup
lock.  We can extend the cleanup of garbage to other operations as well if
required.
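
To make the insert-side handling described above concrete, here is a rough
sketch; this is illustrative only, not code from the attached patch, and the
flag name H_BUCKET_SPLIT_IN_PROGRESS and the helper _hash_finish_split() are
assumed names:

#include "postgres.h"
#include "access/hash.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/* Illustrative sketch only; not taken from the attached patch. */
static void
hash_insert_with_split_check(Relation rel, Buffer bucket_buf, IndexTuple itup)
{
	Page			page = BufferGetPage(bucket_buf);
	HashPageOpaque	opaque = (HashPageOpaque) PageGetSpecialPointer(page);

	if (opaque->hasho_flag & H_BUCKET_SPLIT_IN_PROGRESS)	/* assumed flag */
	{
		/* Give up the exclusive lock but keep the pin ... */
		LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);

		/* ... and try to upgrade to a cleanup lock. */
		if (ConditionalLockBufferForCleanup(bucket_buf))
		{
			/* No other pins: finish the interrupted split first. */
			_hash_finish_split(rel, bucket_buf);	/* assumed helper */
		}
		else
		{
			/* A concurrent scan holds a pin: re-take the lock and insert. */
			LockBuffer(bucket_buf, BUFFER_LOCK_EXCLUSIVE);
		}
	}

	/* ... normal insertion of itup into the bucket follows ... */
}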

I have done some performance tests with this new version of the patch and
the results are along the same lines as in my previous e-mail.  I have done
some functional testing of the patch as well.  I think more detailed
testing is required; however, it is better to do that once the design is
discussed and agreed upon.

I have improved the code comments to make the new design clear, but one can
still have questions about the locking decisions I have taken in the patch.
I think one of the important things to verify in the patch is the locking
strategy used for different operations.  I have changed the heavy-weight
locks to light-weight read and write locks, plus a cleanup lock for the
vacuum and split operations.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


concurrent_hash_index_v2.patch
Description: Binary data


Re: [HACKERS] forcing a rebuild of the visibility map

2016-06-16 Thread Masahiko Sawada
On Thu, Jun 16, 2016 at 12:06 PM, Robert Haas  wrote:
> [ Changing subject line in the hopes of keeping separate issues in
> separate threads. ]
>
> On Mon, Jun 6, 2016 at 11:00 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> I'm intuitively sympathetic to the idea that we should have an option
>>> for this, but I can't figure out in what case we'd actually tell
>>> anyone to use it.  It would be useful for the kinds of bugs listed
>>> above to have VACUUM (rebuild_vm) blow away the VM fork and rebuild
>>> it, but that's different semantics than what we proposed for VACUUM
>>> (even_frozen_pages).  And I'd be sort of inclined to handle that case
>>> by providing some other way to remove VM forks (like a new function in
>>> the pg_visibilitymap contrib module, maybe?) and then just tell people
>>> to run regular VACUUM afterwards, rather than putting the actual VM
>>> fork removal into VACUUM.
>>
>> There's a lot to be said for that approach.  If we do it, I'd be a bit
>> inclined to offer an option to blow away the FSM as well.
>
> After having been scared out of my mind by the discovery of
> longstanding breakage in heap_update[1], it occurred to me that this
> is an excellent example of a case in which the option for which Andres
> was agitating - specifically forcing VACUUM to visit every single page
> in the heap - would be useful.  Even if there's functionality
> available to blow away the visibility map forks, you might not want to
> just go remove them all and re-vacuum everything, because while you
> were rebuilding the VM forks, index-only scans would stop being
> index-only, and that might cause an operational problem.  Being able
> to simply force every page to be visited is better.  Patch for that
> attached.  I decided to call the option disable_page_skipping rather
> than even_frozen_pages because it should also force visiting pages
> that are all-visible but not all-frozen.  (I was sorely tempted to go
> with the competing proposal of calling it VACUUM (THEWHOLEDAMNTHING)
> but I refrained.)
>
> However, I also think that the approach described above - providing a
> way to excise VM forks specifically - is useful.  Patch for that
> attached, too.  It turns out that can't be done without either adding
> a new WAL record type or extending an existing one; I chose to add a
> "flags" argument to XLOG_SMGR_TRUNCATE, specifying the forks to be
> truncated.  Since this will require bumping the XLOG version, if we're
> going to do it, and I think we should, it would be good to try to get
> it done before beta2 rather than closer to release.  However, I don't
> want to commit this without some review, so please review this.
> Actually, please review both patches.[2]  The same core support could
> be used to add an option to truncate the FSM, but I didn't write a
> patch for that because I'm incredibly { busy | lazy }.
>
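
For illustration, here is roughly how such an option could gate the
page-skipping decision in lazy_scan_heap(). This is only a sketch, not the
actual patch; the flag name VACOPT_DISABLE_PAGE_SKIPPING and the local
variable names are assumptions:

	/* Imagined fragment of lazy_scan_heap's main loop (sketch only). */
	bool		skipping_enabled = (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0;
	BlockNumber blkno;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		if (skipping_enabled && blkno < next_unskippable_block)
			continue;			/* all-visible/all-frozen page: skip it */

		/* With the option set, every page is read and processed here. */
	}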

The option name DISABLE_PAGE_SKIPPING looks good to me.
I'm still working on this, but here are some review comments.

---
+CREATE FUNCTION pg_truncate_visibility_map(regclass)
+RETURNS void
+AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
+LANGUAGE C STRICT
+PARALLEL UNSAFE;  -- let's not make this any more dangerous
+

"REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM
PUBLIC;" is missing in pg_visibility/pg_visibility--1.0--1.1.sql.

---
I think that VACUUM with the VERBOSE option could also report how many
frozen pages were skipped, like autovacuum does. Thoughts?
Patch attached.

Regards,

--
Masahiko Sawada
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 7a67fa5..ddbb835 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1329,6 +1329,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	"Skipped %u pages due to buffer pins.\n",
 	vacrelstats->pinskipped_pages),
 	 vacrelstats->pinskipped_pages);
+	appendStringInfo(&buf, _("Skipped %u frozen pages.\n"),
+	 vacrelstats->frozenskipped_pages);
 	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
 	"%u pages are entirely empty.\n",
 	empty_pages),
