Simon Riggs wrote:
> The notes say "Heikki doesn't think this is a long term solution", but
> in the following discussion it was the *only* way of doing this that
> will work with non-PostgreSQL databases. So it seems like the way we
> would want to go, yes?
How did you come to the conclusion that this is the only way that will
work with non-PostgreSQL databases? I don't see any limitations like
that in any of the proposed approaches.
I guess I should clarify my position on this:
We should start moving towards a full SQL:MED solution that will
ultimately support pushing down joins, aggregates etc. to the remote
database, including transaction control (using 2PC), cost estimation,
and intelligent planning.
This should be done in an extensible way, so that people can write their
own plugins to connect to different RDBMSs, as well as simple data
sources like flat files. The plugin needs to be able to control which
parts of a plan tree can be pushed down to the remote source, estimate
the cost of remote execution, and map remote data types to local ones.
And it then needs to be able to construct and execute the remote parts
of a plan.
We're obviously not going to get all that overnight, but whatever we
implement now should be the first step towards that, rather than
something that we need to deprecate and replace in the future.
Unfortunately I don't see a way to extend the proposed "exposing quals
to functions" patch to do more than just that.
The list of functionality a full-blown plugin will need is quite long. I
don't think there's any hope of supporting all that without reaching
into some PostgreSQL internal data structures, particularly the planner
structures like RelOptInfo, Path and Plan. The plugins will be more
tightly integrated into the system than, say, user-defined data types.
They will need to be written in C, and they will be somewhat version
dependent. Simpler plugins, like one to read CSV files, with no "pushing
down" and no update support, will need less access to internals, and
thus will be less version dependent, so pgfoundry projects like that
will be feasible.
Note that the dependency on internal data structures doesn't go away by
saying that they're passed as text; the text representation of our data
structures is version dependent as well.
So what would the plugin API look like? To hook into the planner, I'm
envisioning the plugin would define these functions:
/*
* Generate a remote plan for executing a whole subquery remotely. For
* example, if the query is an aggregate, we might be able to execute
* the whole aggregate in the remote database. This will be called
* from grouping_planner(), like optimize_minmax_aggregates().
* Returns NULL if remote execution is not possible. (A dummy
* implementation can always return NULL.)
*/
Plan *generate_remote_plan(PlannerInfo *, List *tlist);
/*
* Generate a path for executing one relation in the remote
* database. The relation can be a base (non-join) remote relation,
* or a join involving a remote relation. Can return NULL for join
* relations if the join can't be executed remotely.
*/
Path *generate_remote_path(PlannerInfo *, RelOptInfo *);
/*
* Create a Plan node from a Path. Called from create_plan, when
* the planner chooses to use a remote path. A typical implementation
* would create the SQL string to be executed in the remote database,
* and return a RemotePlan node with that SQL string in it.
*/
Plan *create_remote_plan(PlannerInfo *, RemotePath *);
On the execution side, the plugin needs to be able to execute a
previously generated RemotePlan. There would be a new executor node
type, a RemoteScan, which would be similar to a seq scan or index scan,
but delegates the actual execution to the plugin. The execution part of
the plugin API would reflect the API of executor nodes, something like:
void *scan_open(RemotePlan *);
HeapTuple scan_getnext(void *scanstate);
void scan_close(void *scanstate);
The presumption here is that you would define remote tables with the
appropriate SQL:MED statements beforehand (CREATE FOREIGN TABLE).
However, it is flexible enough that you could implement the "exposing
quals to functions" functionality with this as well:
generate_remote_path() would need to recognize the function scans that
it can handle, and return a RemotePath struct with all the same
information as create_functionscan_path does (the cost estimates could
be adjusted for the pushed down quals at this point as well).
create_remote_plan would return a FunctionScan node, but with the extra
qualifiers passed into the function as arguments. In the case of dblink, it
could just add extra WHERE clauses to the query that's being passed as
argument. I'm not proposing that we do the stuff described in this
paragraph, just using it as an example of the flexibility.
BTW, I think the "exposing quals to functions" functionality could be
implemented as a planner hook as well. The hook would call the standard
planner, and modify the plan tree after that, passing the quals as extra
arguments to functions that can take advantage of them.
A "foreign data wrapper" interface is also defined in the SQL/MED
standard. I've only looked at it briefly, but it seems to provide roughly
the same functionality as the API I defined above. It would be a good
idea to look at that, though I don't think that part of the standard is
very widely adopted.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)