Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-24 Thread Jeevan Chalke
On Wed, Sep 23, 2015 at 7:29 PM, Tom Lane  wrote:

> Removing that entirely would be quite incorrect, because then you'd be
> lying to the parent node about what collation your node outputs.
>

Yes. I too thought so and thus wanted to fix that code block by
considering the default collation.


>
> After thinking a bit more about the existing special case for non-foreign
> Vars, I wonder if what we should do is change these code blocks to look
> like
>
> collation = r->resultcollid;
> if (collation == InvalidOid)
> state = FDW_COLLATE_NONE;
> else if (inner_cxt.state == FDW_COLLATE_SAFE &&
>  collation == inner_cxt.collation)
> state = FDW_COLLATE_SAFE;
> +   else if (collation == DEFAULT_COLLATION_OID)
> +   state = FDW_COLLATE_NONE;
> else
> state = FDW_COLLATE_UNSAFE;
>
> That is, only explicit introduction of a non-default collation causes
> a subexpression to get labeled FDW_COLLATE_UNSAFE.  Default collations
> would lose out when getting merged with a nondefault collation from a
> foreign Var, so they should work all right.
>

Agree.
I had suggested similar changes in approach (2)
but you put that check at exact required place.


> regards, tom lane
>



-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-24 Thread Jeevan Chalke
On Wed, Sep 23, 2015 at 10:15 PM, Tom Lane  wrote:

> I wrote:
> > Hm ... actually, we probably need *both* types of changes if that's
> > what we believe the state values mean.
>
>
I too was confused with the state explanations from the code-comments which
we have them now. With your explanation here clears that.
Thanks for explaining those.

After a bit more thinking and experimentation, I propose the attached
> patch.
>

I had a look over the patch and reviewed it. It is in excellent state to
check-in.


> regards, tom lane
>
>

Thanks
-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-24 Thread Tom Lane
Jeevan Chalke  writes:
> On Wed, Sep 23, 2015 at 10:15 PM, Tom Lane  wrote:
>> After a bit more thinking and experimentation, I propose the attached
>> patch.

> I had a look over the patch and reviewed it. It is in excellent state to
> check-in.

After further thought I decided that the base case for
Const/Param/non-foreign-Vars wasn't quite right either.  If we don't like
the collation we should just set the state to UNSAFE not fail immediately,
because it might appear in a context where collation doesn't matter.
An example is "var IS NOT NULL".

So I've committed the attached modification of that patch.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..697de60 100644
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 17,27 
   * We do not consider that it is ever safe to send COLLATE expressions to
   * the remote server: it might not have the same collation names we do.
   * (Later we might consider it safe to send COLLATE "C", but even that would
!  * fail on old remote servers.)  An expression is considered safe to send only
!  * if all collations used in it are traceable to Var(s) of the foreign table.
!  * That implies that if the remote server gets a different answer than we do,
!  * the foreign table's columns are not marked with collations that match the
!  * remote table's columns, which we can consider to be user error.
   *
   * Portions Copyright (c) 2012-2015, PostgreSQL Global Development Group
   *
--- 17,28 
   * We do not consider that it is ever safe to send COLLATE expressions to
   * the remote server: it might not have the same collation names we do.
   * (Later we might consider it safe to send COLLATE "C", but even that would
!  * fail on old remote servers.)  An expression is considered safe to send
!  * only if all operator/function input collations used in it are traceable to
!  * Var(s) of the foreign table.  That implies that if the remote server gets
!  * a different answer than we do, the foreign table's columns are not marked
!  * with collations that match the remote table's columns, which we can
!  * consider to be user error.
   *
   * Portions Copyright (c) 2012-2015, PostgreSQL Global Development Group
   *
*** typedef struct foreign_glob_cxt
*** 69,77 
   */
  typedef enum
  {
! 	FDW_COLLATE_NONE,			/* expression is of a noncollatable type */
  	FDW_COLLATE_SAFE,			/* collation derives from a foreign Var */
! 	FDW_COLLATE_UNSAFE			/* collation derives from something else */
  } FDWCollateState;
  
  typedef struct foreign_loc_cxt
--- 70,81 
   */
  typedef enum
  {
! 	FDW_COLLATE_NONE,			/* expression is of a noncollatable type, or
!  * it has default collation that is not
!  * traceable to a foreign Var */
  	FDW_COLLATE_SAFE,			/* collation derives from a foreign Var */
! 	FDW_COLLATE_UNSAFE			/* collation is non-default and derives from
!  * something other than a foreign Var */
  } FDWCollateState;
  
  typedef struct foreign_loc_cxt
*** foreign_expr_walker(Node *node,
*** 272,284 
  else
  {
  	/* Var belongs to some other table */
! 	if (var->varcollid != InvalidOid &&
! 		var->varcollid != DEFAULT_COLLATION_OID)
! 		return false;
! 
! 	/* We can consider that it doesn't set collation */
! 	collation = InvalidOid;
! 	state = FDW_COLLATE_NONE;
  }
  			}
  			break;
--- 276,299 
  else
  {
  	/* Var belongs to some other table */
! 	collation = var->varcollid;
! 	if (collation == InvalidOid ||
! 		collation == DEFAULT_COLLATION_OID)
! 	{
! 		/*
! 		 * It's noncollatable, or it's safe to combine with a
! 		 * collatable foreign Var, so set state to NONE.
! 		 */
! 		state = FDW_COLLATE_NONE;
! 	}
! 	else
! 	{
! 		/*
! 		 * Do not fail right away, since the Var might appear
! 		 * in a collation-insensitive context.
! 		 */
! 		state = FDW_COLLATE_UNSAFE;
! 	}
  }
  			}
  			break;
*** foreign_expr_walker(Node *node,
*** 288,303 
  
  /*
   * If the constant has nondefault collation, either it's of a
!  * non-builtin type, or it reflects folding of a CollateExpr;
!  * either way, it's unsafe to send to the remote.
   */
! if (c->constcollid != InvalidOid &&
! 	c->constcollid != DEFAULT_COLLATION_OID)
! 	return false;
! 
! /* Otherwise, we can consider that it doesn't set collation */
! collation = InvalidOid;
! state = FDW_COLLATE_NONE;
  			}
  			break;
  		case T_Param:
--- 303,318 
  
  /*
   * If the constant has nondefault collation, either it's of a
!  * non-builtin type, or it reflects folding of a CollateExpr.
!  * It's unsafe to send to the remote unless it's used in a
!  * 

Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-24 Thread Jeevan Chalke
On Thu, Sep 24, 2015 at 10:22 PM, Tom Lane  wrote:

> Jeevan Chalke  writes:
> > On Wed, Sep 23, 2015 at 10:15 PM, Tom Lane  wrote:
> >> After a bit more thinking and experimentation, I propose the attached
> >> patch.
>
> > I had a look over the patch and reviewed it. It is in excellent state to
> > check-in.
>
> After further thought I decided that the base case for
> Const/Param/non-foreign-Vars wasn't quite right either.  If we don't like
> the collation we should just set the state to UNSAFE not fail immediately,
> because it might appear in a context where collation doesn't matter.
> An example is "var IS NOT NULL".
>

Make sense.


>
> So I've committed the attached modification of that patch.
>
> Thanks


> regards, tom lane
>
>


-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-23 Thread Jeevan Chalke
Hi Tom,

On Tue, Sep 22, 2015 at 12:31 AM, Tom Lane  wrote:

>
> I think you're blaming the wrong code; RelabelType is handled basically
> the same as most other cases.
>
> It strikes me that this function is really going about things the wrong
> way.  Rather than trying to determine the output collation per se, what
> we ought to be asking is "does every operator in the proposed expression
> have an input collation that can be traced to some foreign Var further
> down in the expression"?  That is, given the example in hand,
>
> RelabelType(ForeignVar) = RelabelType(LocalVar)
>
> the logic ought to be like "the ForeignVar has collation X, and that
> bubbles up without change through the RelabelType, and then the equals
> operator's inputcollation matches that, so accept it --- regardless of
> where the other operand's collation came from exactly".  The key point
> is that we want to validate operator input collations, not output
> collations, as having something to do with what the remote side would do.
>
> This would represent a fairly significant rewrite of foreign_expr_walker's
> collation logic; although I think the end result would be no more
> complicated, possibly simpler, than it is now.
>
> regards, tom lane
>


IIUC, you are saying that collation check for output collation is not
necessary for all OpExpr/FuncExpr/ArrayRef etc.
Should we remove code blocks like
collation = r->resultcollid;
if (collation == InvalidOid)
state = FDW_COLLATE_NONE;
else if (inner_cxt.state == FDW_COLLATE_SAFE &&
 collation == inner_cxt.collation)
state = FDW_COLLATE_SAFE;
else
state = FDW_COLLATE_UNSAFE;

and just bubble up the collation and state to the next level?

Here I see that, in the result collation validation, we missed the case when
result collation is default collation. For foreign var, we return collation
as is in inner context with the state set to SAFE. But in case of local var,
we are only allowing InvalidOid or Default oid for collation, however while
returning back, we set collation to InvalidOid and state to NONE even for
default collation. I think we are missing something here.

To handle this case, we need to either,
1. allow local var to set inner_cxt collation to what var actually has
(which
will be either Invalid or DEFAULT) and set state to NONE if non collable or
set state to SAFE if default collation. Like:
  In T_Var, local var case
collation = var->varcollid;
state = OidIsValid(collation) ? FDW_COLLATE_SAFE : FDW_COLLATE_NONE;

OR
2. In above code block, which checks result collation, we need to handle
default collation. Like:
else if (collation == DEFAULT_COLLATION_OID)
state = inner_cxt.state;

Let me know if I missed any.

Thanks

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-23 Thread Tom Lane
Jeevan Chalke  writes:
> On Tue, Sep 22, 2015 at 12:31 AM, Tom Lane  wrote:
>> It strikes me that this function is really going about things the wrong
>> way.  Rather than trying to determine the output collation per se, what
>> we ought to be asking is "does every operator in the proposed expression
>> have an input collation that can be traced to some foreign Var further
>> down in the expression"?

> IIUC, you are saying that collation check for output collation is not
> necessary for all OpExpr/FuncExpr/ArrayRef etc.
> Should we remove code blocks like
> collation = r->resultcollid;
> if (collation == InvalidOid)
> state = FDW_COLLATE_NONE;
> else if (inner_cxt.state == FDW_COLLATE_SAFE &&
>  collation == inner_cxt.collation)
> state = FDW_COLLATE_SAFE;
> else
> state = FDW_COLLATE_UNSAFE;

> and just bubble up the collation and state to the next level?

Removing that entirely would be quite incorrect, because then you'd be
lying to the parent node about what collation your node outputs.

After thinking a bit more about the existing special case for non-foreign
Vars, I wonder if what we should do is change these code blocks to look
like

collation = r->resultcollid;
if (collation == InvalidOid)
state = FDW_COLLATE_NONE;
else if (inner_cxt.state == FDW_COLLATE_SAFE &&
 collation == inner_cxt.collation)
state = FDW_COLLATE_SAFE;
+   else if (collation == DEFAULT_COLLATION_OID)
+   state = FDW_COLLATE_NONE;
else
state = FDW_COLLATE_UNSAFE;

That is, only explicit introduction of a non-default collation causes
a subexpression to get labeled FDW_COLLATE_UNSAFE.  Default collations
would lose out when getting merged with a nondefault collation from a
foreign Var, so they should work all right.

The core point here is that we're going to send the expression to the
remote without any COLLATE clauses, so the remote's parser has to
come to the same conclusions we did about which collation to apply.
We assume that default-collation-throughout will work all right.
Nondefault collations will work as long as they originate from foreign
Vars, because then the remote parser should see the equivalent far-end
collations originating from those Vars --- and those collations win when
combined with default collations.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-23 Thread Tom Lane
I wrote:
> After thinking a bit more about the existing special case for non-foreign
> Vars, I wonder if what we should do is change these code blocks to look
> like

> collation = r->resultcollid;
> if (collation == InvalidOid)
> state = FDW_COLLATE_NONE;
> else if (inner_cxt.state == FDW_COLLATE_SAFE &&
>  collation == inner_cxt.collation)
> state = FDW_COLLATE_SAFE;
> + else if (collation == DEFAULT_COLLATION_OID)
> + state = FDW_COLLATE_NONE;
> else
> state = FDW_COLLATE_UNSAFE;

On further thought, maybe it's the special case for non-foreign Vars
that is busted.  That is, non-foreign Vars should do

/* Var belongs to some other table */
if (var->varcollid != InvalidOid &&
var->varcollid != DEFAULT_COLLATION_OID)
return false;

-   /* We can consider that it doesn't set collation */
-   collation = InvalidOid;
-   state = FDW_COLLATE_NONE;
+   collation = var->varcollid;
+   state = OidIsValid(collation) ? FDW_COLLATE_SAFE : 
FDW_COLLATE_NONE;

and likewise for Consts, Params, etc.

This would basically mean clarifying the meaning of the state values as:

FDW_COLLATE_NONE: the expression is of a noncollatable type, period.

FDW_COLLATE_SAFE: the expression has default collation, or a nondefault
collation that is traceable to a foreign Var.

FDW_COLLATE_UNSAFE: the expression has a nondefault collation that is not
traceable to a foreign Var.

Hm ... actually, we probably need *both* types of changes if that's
what we believe the state values mean.

An alternative definition would be that FDW_COLLATE_NONE subsumes the
"collation doesn't trace to a foreign Var, but it's default so we don't
really care" case.  I think the problem we've got is that the
non-foreign-Var code thinks that's what the definition is, but the rest
of the code isn't quite consistent with it.

In any case I think we want to end up with a clearer specification of
what the states mean.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-23 Thread Tom Lane
I wrote:
> Hm ... actually, we probably need *both* types of changes if that's
> what we believe the state values mean.

After a bit more thinking and experimentation, I propose the attached
patch.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..f64482c 100644
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 17,27 
   * We do not consider that it is ever safe to send COLLATE expressions to
   * the remote server: it might not have the same collation names we do.
   * (Later we might consider it safe to send COLLATE "C", but even that would
!  * fail on old remote servers.)  An expression is considered safe to send only
!  * if all collations used in it are traceable to Var(s) of the foreign table.
!  * That implies that if the remote server gets a different answer than we do,
!  * the foreign table's columns are not marked with collations that match the
!  * remote table's columns, which we can consider to be user error.
   *
   * Portions Copyright (c) 2012-2015, PostgreSQL Global Development Group
   *
--- 17,28 
   * We do not consider that it is ever safe to send COLLATE expressions to
   * the remote server: it might not have the same collation names we do.
   * (Later we might consider it safe to send COLLATE "C", but even that would
!  * fail on old remote servers.)  An expression is considered safe to send
!  * only if all operator/function input collations used in it are traceable to
!  * Var(s) of the foreign table.  That implies that if the remote server gets
!  * a different answer than we do, the foreign table's columns are not marked
!  * with collations that match the remote table's columns, which we can
!  * consider to be user error.
   *
   * Portions Copyright (c) 2012-2015, PostgreSQL Global Development Group
   *
*** typedef struct foreign_glob_cxt
*** 69,77 
   */
  typedef enum
  {
! 	FDW_COLLATE_NONE,			/* expression is of a noncollatable type */
  	FDW_COLLATE_SAFE,			/* collation derives from a foreign Var */
! 	FDW_COLLATE_UNSAFE			/* collation derives from something else */
  } FDWCollateState;
  
  typedef struct foreign_loc_cxt
--- 70,81 
   */
  typedef enum
  {
! 	FDW_COLLATE_NONE,			/* expression is of a noncollatable type, or
!  * it has default collation that is not
!  * traceable to a foreign Var */
  	FDW_COLLATE_SAFE,			/* collation derives from a foreign Var */
! 	FDW_COLLATE_UNSAFE			/* collation is non-default and derives from
!  * something other than a foreign Var */
  } FDWCollateState;
  
  typedef struct foreign_loc_cxt
*** foreign_expr_walker(Node *node,
*** 272,283 
  else
  {
  	/* Var belongs to some other table */
! 	if (var->varcollid != InvalidOid &&
! 		var->varcollid != DEFAULT_COLLATION_OID)
  		return false;
! 
! 	/* We can consider that it doesn't set collation */
! 	collation = InvalidOid;
  	state = FDW_COLLATE_NONE;
  }
  			}
--- 276,286 
  else
  {
  	/* Var belongs to some other table */
! 	collation = var->varcollid;
! 	if (collation != InvalidOid &&
! 		collation != DEFAULT_COLLATION_OID)
  		return false;
! 	/* For either allowed collation, the state is NONE */
  	state = FDW_COLLATE_NONE;
  }
  			}
*** foreign_expr_walker(Node *node,
*** 291,302 
   * non-builtin type, or it reflects folding of a CollateExpr;
   * either way, it's unsafe to send to the remote.
   */
! if (c->constcollid != InvalidOid &&
! 	c->constcollid != DEFAULT_COLLATION_OID)
  	return false;
! 
! /* Otherwise, we can consider that it doesn't set collation */
! collation = InvalidOid;
  state = FDW_COLLATE_NONE;
  			}
  			break;
--- 294,304 
   * non-builtin type, or it reflects folding of a CollateExpr;
   * either way, it's unsafe to send to the remote.
   */
! collation = c->constcollid;
! if (collation != InvalidOid &&
! 	collation != DEFAULT_COLLATION_OID)
  	return false;
! /* For either allowed collation, the state is NONE */
  state = FDW_COLLATE_NONE;
  			}
  			break;
*** foreign_expr_walker(Node *node,
*** 307,317 
  /*
   * Collation handling is same as for Consts.
   */
! if (p->paramcollid != InvalidOid &&
! 	p->paramcollid != DEFAULT_COLLATION_OID)
  	return false;
- 
- collation = InvalidOid;
  state = FDW_COLLATE_NONE;
  			}
  			break;
--- 309,318 
  /*
   * Collation handling is same as for Consts.
   */
! collation = p->paramcollid;
! if (collation != InvalidOid &&
! 	collation != DEFAULT_COLLATION_OID)
  	return false;
  state = FDW_COLLATE_NONE;
  			}
  			break;
*** foreign_expr_walker(Node *node,
*** 348,353 
--- 349,356 
  else if (inner_cxt.state == 

Re: [HACKERS] TEXT vs VARCHAR join qual push down diffrence, bug or expected?

2015-09-21 Thread Tom Lane
Jeevan Chalke  writes:
> It is observed that, when we have one remote (huge) table and one local
> (small) table and a join between them, then
>  1. If the column type is text, then we push the join qual to the remote
> server, so that we will have less rows to fetch, and thus execution time
> is very less.
>  2. If the column type is varchar, then we do not push the join qual to the
> remote server, resulting into large number of data fetch and thus
> execution time is very high.

Hmm ...

> Also given that RelabelType are just dummy wrapper for binary compatible
> types, can we simply set collation and state from its inner context instead
> on above check block.

I think you're blaming the wrong code; RelabelType is handled basically
the same as most other cases.

It strikes me that this function is really going about things the wrong
way.  Rather than trying to determine the output collation per se, what
we ought to be asking is "does every operator in the proposed expression
have an input collation that can be traced to some foreign Var further
down in the expression"?  That is, given the example in hand,

RelabelType(ForeignVar) = RelabelType(LocalVar)

the logic ought to be like "the ForeignVar has collation X, and that
bubbles up without change through the RelabelType, and then the equals
operator's inputcollation matches that, so accept it --- regardless of
where the other operand's collation came from exactly".  The key point
is that we want to validate operator input collations, not output
collations, as having something to do with what the remote side would do.

This would represent a fairly significant rewrite of foreign_expr_walker's
collation logic; although I think the end result would be no more
complicated, possibly simpler, than it is now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers