On 2015/10/10 10:17, Robert Haas wrote:
On Thu, Oct 8, 2015 at 11:00 PM, Etsuro Fujita
<fujita.ets...@lab.ntt.co.jp> wrote:
The best plan is presumably something like this as you said before:

LockRows
-> Nested Loop
     -> Seq Scan on verysmall v
     -> Foreign Scan on bigft1 and bigft2
          Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
bigft2.x AND bigft1.q = $1 AND bigft2.r = $2

Consider the EvalPlanQual testing to see if the updated version of a
tuple in v satisfies the query.  If we use the column in the testing, we
would get the wrong results in some cases.

More precisely, we would get the wrong result when the value of v.q or v.r
in the updated version has changed.

Interesting test case.  It's worth considering why this works if you
were to replace the Foreign Scan with an Index Scan; suppose the query
is SELECT * FROM verysmall v LEFT JOIN realbiglocaltable t ON v.x =
t.x FOR UPDATE OF v, so that you get:

LockRows
-> Nested Loop
   -> Seq Scan on verysmall v
   -> Foreign Scan on realbiglocaltable t
     Index Cond: v.x = t.x

In your example, the remote SQL pushes down certain quals to the
remote server, and so if we just return the same tuple they might no
longer be satisfied.  In this example, the index qual is essentially a
filter condition that has been "pushed down" into the index AM.  The
EvalPlanQual machinery prevents this from generating wrong answers by
rechecking the index cond - see IndexRecheck.  Even though it's
normally the AM's job to enforce the index cond, and the executor does
not need to recheck, in the EvalPlanQual case it does need to recheck.

I think the foreign data wrapper case should be handled the same way.
Any condition that we initially pushed down to the foreign server
needs to be locally rechecked if we're inside EPQ.

Agreed.

As KaiGai-san also pointed out before, I think we should address this in each of the following cases:

1) remote qual (scanrelid>0)
2) remote join (scanrelid==0)

As for #1, I noticed that there is a bug in handling the same kind of FDW queries, which will be shown below. As you said, I think this should be addressed by rechecking the remote quals *locally*. (I thought another fix for this kind of bug before, though.) IIUC, I think this should be fixed separately from #2, as this is a bug not only in 9.5, but in back branches. Please find attached a patch.

Create an environment:

mydatabase=# create table t1 (a int primary key, b text);
mydatabase=# insert into t1 select a, 'notsolongtext' from generate_series(1, 1000000) a;

postgres=# create server myserver foreign data wrapper postgres_fdw options (dbname 'mydatabase');
postgres=# create user mapping for current_user server myserver;
postgres=# create foreign table ft1 (a int, b text) server myserver options (table_name 't1');
postgres=# alter foreign table ft1 options (add use_remote_estimate 'true');
postgres=# create table inttab (a int);
postgres=# insert into inttab select a from generate_series(1, 10) a;
postgres=# analyze ft1;
postgres=# analyze inttab;

Run concurrent transactions that produce incorrect result:

[Terminal1]
postgres=# begin;
BEGIN
postgres=# update inttab set a = a + 1 where a = 1;
UPDATE 1

[Terminal2]
postgres=# explain verbose select * from inttab, ft1 where inttab.a = ft1.a limit 1 for update;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Limit  (cost=100.43..198.99 rows=1 width=70)
   Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.*
   ->  LockRows  (cost=100.43..1086.00 rows=10 width=70)
         Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.*
         ->  Nested Loop  (cost=100.43..1085.90 rows=10 width=70)
               Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.*
-> Seq Scan on public.inttab (cost=0.00..1.10 rows=10 width=10)
                     Output: inttab.a, inttab.ctid
-> Foreign Scan on public.ft1 (cost=100.43..108.47 rows=1 width=18)
                     Output: ft1.a, ft1.b, ft1.*
Remote SQL: SELECT a, b FROM public.t1 WHERE (($1::integer = a)) FOR UPDATE
(11 rows)

postgres=# select * from inttab, ft1 where inttab.a = ft1.a limit 1 for update;

[Terminal1]
postgres=# commit;
COMMIT

[Terminal2]
(After the commit in Terminal1, the following result will be shown in Terminal2. Note that the values of inttab.a and ft1.a wouldn't satisfy the remote qual!)
 a | a |       b
---+---+---------------
 2 | 1 | notsolongtext
(1 row)

As for #2, I didn't come up with any solution to locally rechecking pushed-down join conditions against a joined tuple populated from a column that we discussed. Instead, I'd like to revise a local-join-execution-plan-based approach that we discussed before, by addressing your comments such as [1]. Would it be the right way to go?

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/ca+tgmoaazs0dr23r7ptbseqfwotuvcpnbqdhxebo9gi+dmx...@mail.gmail.com
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 563,569 **** fileGetForeignPlan(PlannerInfo *root,
  							scan_relid,
  							NIL,	/* no expressions to evaluate */
  							best_path->fdw_private,
! 							NIL /* no custom tlist */ );
  }
  
  /*
--- 563,570 ----
  							scan_relid,
  							NIL,	/* no expressions to evaluate */
  							best_path->fdw_private,
! 							NIL,	/* no custom tlist */
! 							NIL /* no remote quals */ );
  }
  
  /*
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 748,753 **** postgresGetForeignPlan(PlannerInfo *root,
--- 748,754 ----
  	Index		scan_relid = baserel->relid;
  	List	   *fdw_private;
  	List	   *remote_conds = NIL;
+ 	List	   *remote_exprs = NIL;
  	List	   *local_exprs = NIL;
  	List	   *params_list = NIL;
  	List	   *retrieved_attrs;
***************
*** 769,776 **** postgresGetForeignPlan(PlannerInfo *root,
  	 *
  	 * This code must match "extract_actual_clauses(scan_clauses, false)"
  	 * except for the additional decision about remote versus local execution.
! 	 * Note however that we only strip the RestrictInfo nodes from the
! 	 * local_exprs list, since appendWhereClause expects a list of
  	 * RestrictInfos.
  	 */
  	foreach(lc, scan_clauses)
--- 770,777 ----
  	 *
  	 * This code must match "extract_actual_clauses(scan_clauses, false)"
  	 * except for the additional decision about remote versus local execution.
! 	 * Note however that we don't strip the RestrictInfo nodes from the
! 	 * remote_conds list, since appendWhereClause expects a list of
  	 * RestrictInfos.
  	 */
  	foreach(lc, scan_clauses)
***************
*** 784,794 **** postgresGetForeignPlan(PlannerInfo *root,
--- 785,801 ----
  			continue;
  
  		if (list_member_ptr(fpinfo->remote_conds, rinfo))
+ 		{
  			remote_conds = lappend(remote_conds, rinfo);
+ 			remote_exprs = lappend(remote_exprs, rinfo->clause);
+ 		}
  		else if (list_member_ptr(fpinfo->local_conds, rinfo))
  			local_exprs = lappend(local_exprs, rinfo->clause);
  		else if (is_foreign_expr(root, baserel, rinfo->clause))
+ 		{
  			remote_conds = lappend(remote_conds, rinfo);
+ 			remote_exprs = lappend(remote_exprs, rinfo->clause);
+ 		}
  		else
  			local_exprs = lappend(local_exprs, rinfo->clause);
  	}
***************
*** 874,880 **** postgresGetForeignPlan(PlannerInfo *root,
  							scan_relid,
  							params_list,
  							fdw_private,
! 							NIL /* no custom tlist */ );
  }
  
  /*
--- 881,888 ----
  							scan_relid,
  							params_list,
  							fdw_private,
! 							NIL,	/* no custom tlist */
! 							remote_exprs);
  }
  
  /*
*** a/src/backend/executor/nodeForeignscan.c
--- b/src/backend/executor/nodeForeignscan.c
***************
*** 72,79 **** ForeignNext(ForeignScanState *node)
  static bool
  ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
  {
! 	/* There are no access-method-specific conditions to recheck. */
! 	return true;
  }
  
  /* ----------------------------------------------------------------
--- 72,90 ----
  static bool
  ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
  {
! 	ExprContext *econtext;
! 
! 	/*
! 	 * extract necessary information from foreign scan node
! 	 */
! 	econtext = node->ss.ps.ps_ExprContext;
! 
! 	/* Does the tuple meet the remote qual condition? */
! 	econtext->ecxt_scantuple = slot;
! 
! 	ResetExprContext(econtext);
! 
! 	return ExecQual(node->fdw_scan_quals, econtext, false);
  }
  
  /* ----------------------------------------------------------------
***************
*** 135,140 **** ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
--- 146,154 ----
  	scanstate->ss.ps.qual = (List *)
  		ExecInitExpr((Expr *) node->scan.plan.qual,
  					 (PlanState *) scanstate);
+ 	scanstate->fdw_scan_quals = (List *)
+ 		ExecInitExpr((Expr *) node->fdw_scan_quals,
+ 					 (PlanState *) scanstate);
  
  	/*
  	 * tuple table initialization
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***************
*** 648,653 **** _copyForeignScan(const ForeignScan *from)
--- 648,654 ----
  	COPY_NODE_FIELD(fdw_exprs);
  	COPY_NODE_FIELD(fdw_private);
  	COPY_NODE_FIELD(fdw_scan_tlist);
+ 	COPY_NODE_FIELD(fdw_scan_quals);
  	COPY_BITMAPSET_FIELD(fs_relids);
  	COPY_SCALAR_FIELD(fsSystemCol);
  
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***************
*** 594,599 **** _outForeignScan(StringInfo str, const ForeignScan *node)
--- 594,600 ----
  	WRITE_NODE_FIELD(fdw_exprs);
  	WRITE_NODE_FIELD(fdw_private);
  	WRITE_NODE_FIELD(fdw_scan_tlist);
+ 	WRITE_NODE_FIELD(fdw_scan_quals);
  	WRITE_BITMAPSET_FIELD(fs_relids);
  	WRITE_BOOL_FIELD(fsSystemCol);
  }
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***************
*** 2153,2158 **** create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
--- 2153,2160 ----
  			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
  		scan_plan->fdw_exprs = (List *)
  			replace_nestloop_params(root, (Node *) scan_plan->fdw_exprs);
+ 		scan_plan->fdw_scan_quals = (List *)
+ 			replace_nestloop_params(root, (Node *) scan_plan->fdw_scan_quals);
  	}
  
  	/*
***************
*** 3738,3744 **** make_foreignscan(List *qptlist,
  				 Index scanrelid,
  				 List *fdw_exprs,
  				 List *fdw_private,
! 				 List *fdw_scan_tlist)
  {
  	ForeignScan *node = makeNode(ForeignScan);
  	Plan	   *plan = &node->scan.plan;
--- 3740,3747 ----
  				 Index scanrelid,
  				 List *fdw_exprs,
  				 List *fdw_private,
! 				 List *fdw_scan_tlist,
! 				 List *fdw_scan_quals)
  {
  	ForeignScan *node = makeNode(ForeignScan);
  	Plan	   *plan = &node->scan.plan;
***************
*** 3754,3759 **** make_foreignscan(List *qptlist,
--- 3757,3763 ----
  	node->fdw_exprs = fdw_exprs;
  	node->fdw_private = fdw_private;
  	node->fdw_scan_tlist = fdw_scan_tlist;
+ 	node->fdw_scan_quals = fdw_scan_quals;
  	/* fs_relids will be filled in by create_foreignscan_plan */
  	node->fs_relids = NULL;
  	/* fsSystemCol will be filled in by create_foreignscan_plan */
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
***************
*** 1140,1145 **** set_foreignscan_references(PlannerInfo *root,
--- 1140,1148 ----
  			fix_scan_list(root, fscan->scan.plan.qual, rtoffset);
  		fscan->fdw_exprs =
  			fix_scan_list(root, fscan->fdw_exprs, rtoffset);
+ 		/* fdw_scan_quals too */
+ 		fscan->fdw_scan_quals =
+ 			fix_scan_list(root, fscan->fdw_scan_quals, rtoffset);
  	}
  
  	/* Adjust fs_relids if needed */
*** a/src/backend/optimizer/plan/subselect.c
--- b/src/backend/optimizer/plan/subselect.c
***************
*** 2396,2401 **** finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
--- 2396,2407 ----
  		case T_ForeignScan:
  			finalize_primnode((Node *) ((ForeignScan *) plan)->fdw_exprs,
  							  &context);
+ 
+ 			/*
+ 			 * We need not look at fdw_scan_quals, since it will have the same
+ 			 * param references as fdw_exprs.
+ 			 */
+ 
  			/* We assume fdw_scan_tlist cannot contain Params */
  			context.paramids = bms_add_members(context.paramids, scan_params);
  			break;
*** a/src/include/nodes/execnodes.h
--- b/src/include/nodes/execnodes.h
***************
*** 1579,1584 **** typedef struct WorkTableScanState
--- 1579,1585 ----
  typedef struct ForeignScanState
  {
  	ScanState	ss;				/* its first field is NodeTag */
+ 	List	   *fdw_scan_quals;	/* remote quals if foreign table */
  	/* use struct pointer to avoid including fdwapi.h here */
  	struct FdwRoutine *fdwroutine;
  	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
*** a/src/include/nodes/plannodes.h
--- b/src/include/nodes/plannodes.h
***************
*** 524,529 **** typedef struct ForeignScan
--- 524,530 ----
  	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
  	List	   *fdw_private;	/* private data for FDW */
  	List	   *fdw_scan_tlist; /* optional tlist describing scan tuple */
+ 	List	   *fdw_scan_quals;	/* remote quals if foreign table */
  	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
  	bool		fsSystemCol;	/* true if any "system column" is needed */
  } ForeignScan;
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
***************
*** 45,51 **** extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
  				  Index scanrelid, Plan *subplan);
  extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
  				 Index scanrelid, List *fdw_exprs, List *fdw_private,
! 				 List *fdw_scan_tlist);
  extern Append *make_append(List *appendplans, List *tlist);
  extern RecursiveUnion *make_recursive_union(List *tlist,
  					 Plan *lefttree, Plan *righttree, int wtParam,
--- 45,51 ----
  				  Index scanrelid, Plan *subplan);
  extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
  				 Index scanrelid, List *fdw_exprs, List *fdw_private,
! 				 List *fdw_scan_tlist, List *fdw_scan_quals);
  extern Append *make_append(List *appendplans, List *tlist);
  extern RecursiveUnion *make_recursive_union(List *tlist,
  					 Plan *lefttree, Plan *righttree, int wtParam,
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to