KaiGai
On Tue, Nov 19, 2013 at 9:41 AM, Kohei KaiGai <[email protected]> wrote:
> Thanks for your review.
>
> 2013/11/19 Jim Mlodgenski <[email protected]>:
> > My initial review on this feature:
> > - The patches apply and build, but it produces a warning:
> > ctidscan.c: In function ‘CTidInitCustomScanPlan’:
> > ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
> >
> This variable was only used in Assert() macro, so it causes a warning if you
> don't put --enable-cassert on the configure script.
> Anyway, I adjusted the code to check relid of RelOptInfo directly.
>
The warning is now gone.
> > I'd recommend that you split the part1 patch containing the ctidscan contrib
> > into its own patch. It is more than half of the patch and it certainly
> > stands on its own. IMO, I think ctidscan fits a very specific use case and
> > would be better off being an extension instead of in contrib.
> >
> OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
> ctidscan portion, and the part-3 is remote join on postgres_fdw.
>
Attached is a patch for the documentation. I think the documentation still
needs a little more work, but it is pretty close. I can add some more
detail to it once I finish adapting the hadoop_fdw to use the custom scan
API and have a better understanding of all of the calls.
> Thanks,
> --
> KaiGai Kohei <[email protected]>
>
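While reading the registration section, I found it helpful to model "a set of callbacks with a unique name" as a small lookup table. The sketch below is only that: a standalone toy, not the patch's code. ToyProvider, toy_register_provider() and toy_get_provider() are hypothetical names I made up, and rejecting a duplicate name is my assumption about what "unique" should mean, not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the patch's CustomProvider. The real struct
 * carries the planner and executor callbacks; here one dummy callback
 * slot is enough to show the name-to-callbacks association.
 */
typedef struct ToyProvider
{
    const char *name;                    /* unique provider name */
    int       (*exec_next)(void *state); /* placeholder callback slot */
} ToyProvider;

#define TOY_MAX_PROVIDERS 8

static const ToyProvider *toy_providers[TOY_MAX_PROVIDERS];
static int toy_num_providers = 0;

/* Look up a provider by name; returns NULL if none is registered. */
const ToyProvider *
toy_get_provider(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < toy_num_providers; i++)
    {
        if (strcmp(toy_providers[i]->name, name) == 0)
            return toy_providers[i];
    }
    return NULL;
}

/*
 * Register a provider, as a module would do once at load time.
 * Returns 0 on success, -1 if the name is already taken (assumed
 * behavior) or the table is full.
 */
int
toy_register_provider(const ToyProvider *provider)
{
    assert(provider != NULL && provider->name != NULL);
    if (toy_get_provider(provider->name) != NULL)
        return -1;
    if (toy_num_providers >= TOY_MAX_PROVIDERS)
        return -1;
    toy_providers[toy_num_providers++] = provider;
    return 0;
}
```

The point of the name is that a path or plan node only needs to carry a string (custom_name in the patch) and the callbacks can be found again later from it.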
*** a/doc/src/sgml/custom-scan.sgml 2013-11-18 17:50:02.652039003 -0500
--- b/doc/src/sgml/custom-scan.sgml 2013-11-22 09:09:13.624254649 -0500
***************
*** 8,47 ****
<secondary>handler for</secondary>
</indexterm>
<para>
! Custom-scan API enables extension to provide alternative ways to scan or
! join relations, being fully integrated with cost based optimizer,
! in addition to the built-in implementation.
! It consists of a set of callbacks, with a unique name, to be invoked during
! query planning and execution. Custom-scan provider should implement these
! callback functions according to the expectation of API.
</para>
<para>
! Overall, here is four major jobs that custom-scan provider should implement.
! The first one is registration of custom-scan provider itself. Usually, it
! shall be done once at <literal>_PG_init()</literal> entrypoint on module
! loading.
! The other three jobs shall be done for each query planning and execution.
! The second one is submission of candidate paths to scan or join relations,
! with an adequate cost, for the core planner.
! Then, planner shall chooses a cheapest path from all the candidates.
! If custom path survived, the planner kicks the third job; construction of
! <literal>CustomScan</literal> plan node, being located within query plan
! tree instead of the built-in plan node.
! The last one is execution of its implementation in answer to invocations
! by the core executor.
</para>
<para>
! Some of contrib module utilize the custom-scan API. It may be able to
! provide a good example for new development.
<variablelist>
<varlistentry>
<term><xref linkend="ctidscan"></term>
<listitem>
<para>
! Its logic enables to skip earlier pages or terminate scan prior to
! end of the relation, if inequality operator on <literal>ctid</literal>
! system column can narrow down the scope to be scanned, instead of
! the sequential scan that reads a relation from the head to the end.
</para>
</listitem>
</varlistentry>
--- 8,46 ----
<secondary>handler for</secondary>
</indexterm>
<para>
! The custom-scan API enables an extension to provide alternative ways to scan
! or join relations leveraging the cost based optimizer. The API consists of a
! set of callbacks, with a unique name, to be invoked during query planning
! and execution. A custom-scan provider should implement these callback
! functions according to the expectation of the API.
</para>
<para>
! Overall, there are four major tasks that a custom-scan provider should
! implement. The first task is the registration of custom-scan provider itself.
! Usually, this needs to be done once at the <literal>_PG_init()</literal>
! entrypoint when the module is loading. The remaining three tasks are all done
! when a query is planned and executed. The second task is the submission of
! candidate paths to either scan or join relations with an adequate cost for
! the core planner. Then, the planner will choose the cheapest path from all of
! the candidates. If the custom path survives, the planner starts the third
! task: construction of a <literal>CustomScan</literal> plan node, located
! within the query plan tree instead of the built-in plan node. The last task
! is the execution of its implementation in answer to invocations by the core
! executor.
</para>
<para>
! Some contrib modules utilize the custom-scan API. They may serve as good
! examples for new development.
<variablelist>
<varlistentry>
<term><xref linkend="ctidscan"></term>
<listitem>
<para>
! The custom scan in this module enables a scan to skip earlier pages or
! terminate prior to the end of the relation, if an inequality operator on the
! <literal>ctid</literal> system column can narrow down the scope to be
! scanned, instead of a sequential scan which reads a relation from the
! head to the end.
</para>
</listitem>
</varlistentry>
***************
*** 49,70 ****
<term><xref linkend="postgres-fdw"></term>
<listitem>
<para>
! Its logic replaces a local join of foreign tables managed by
! <literal>postgres_fdw</literal> with a custom scan that fetches
! remotely joined relations.
! It shows the way to implement a custom scan node that performs
! instead join nodes.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
! Right now, only scan and join are supported to have fully integrated cost
! based query optimization performing on custom scan API.
! You might be able to implement other stuff, like sort or aggregation, with
! manipulation of the planned tree, however, extension has to be responsible
! to handle this replacement correctly. Here is no support by the core.
</para>
<sect1 id="custom-scan-spec">
--- 48,68 ----
<term><xref linkend="postgres-fdw"></term>
<listitem>
<para>
! The custom scan in this module replaces a local join of foreign tables
! managed by <literal>postgres_fdw</literal> with a scan that fetches
! remotely joined relations. It demonstrates the way to implement a custom
! scan node that takes the place of join nodes.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
! Currently, only scan and join are fully supported with integrated cost
! based query optimization using the custom scan API. You might be able to
! implement other operations, like sort or aggregation, by manipulating the
! planned tree; however, the extension is then responsible for handling this
! replacement correctly. There is no support for this in the core.
</para>
<sect1 id="custom-scan-spec">
***************
*** 72,80 ****
<sect2 id="custom-scan-register">
<title>Registration of custom scan provider</title>
<para>
! The first job for custom scan provider is registration of a set of
! callbacks with a unique name. Usually, it shall be done once on
! <literal>_PG_init()</literal> entrypoint of module loading.
<programlisting>
void
register_custom_provider(const CustomProvider *provider);
--- 70,78 ----
<sect2 id="custom-scan-register">
<title>Registration of custom scan provider</title>
<para>
! The first task for a custom scan provider is the registration of a set of
! callbacks with a unique name. Usually, this is done once upon module
! loading in the <literal>_PG_init()</literal> entrypoint.
<programlisting>
void
register_custom_provider(const CustomProvider *provider);
***************
*** 90,105 ****
<sect2 id="custom-scan-path">
<title>Submission of custom paths</title>
<para>
! The query planner finds out the best way to scan or join relations from
! the various potential paths; combination of a scan algorithm and target
! relations.
! Prior to this selection, we list up all the potential paths towards
! a target relation (if base relation) or a pair of relations (if join).
! The <literal>add_scan_path_hook</> and <literal>add_join_path_hook</>
! allows extensions to add alternative scan paths in addition to built-in
! ones.
If custom-scan provider can submit a potential scan path towards the
! supplied relation, it shall construct <literal>CustomPath</> object
with appropriate parameters.
<programlisting>
typedef struct CustomPath
--- 88,102 ----
<sect2 id="custom-scan-path">
<title>Submission of custom paths</title>
<para>
! The query planner finds the best way to scan or join relations from various
! potential paths using a combination of scan algorithms and target
! relations. Prior to this selection, we list all of the potential paths
! towards a target relation (if it is a base relation) or a pair of relations
! (if it is a join). The <literal>add_scan_path_hook</> and
! <literal>add_join_path_hook</> allow extensions to add alternative scan
! paths in addition to built-in paths.
If custom-scan provider can submit a potential scan path towards the
! supplied relation, it shall construct a <literal>CustomPath</> object
with appropriate parameters.
<programlisting>
typedef struct CustomPath
***************
*** 110,118 ****
List *custom_private; /* can be used for private data */
} CustomPath;
</programlisting>
! Its <literal>path</> is common field for all the path nodes to store
! cost estimation. In addition, <literal>custom_name</> is the name of
! registered custom scan provider, <literal>custom_flags</> is a set of
flags below, and <literal>custom_private</> can be used to store private
data of the custom scan provider.
</para>
--- 107,115 ----
List *custom_private; /* can be used for private data */
} CustomPath;
</programlisting>
! Its <literal>path</> is a common field for all the path nodes to store
! a cost estimation. In addition, <literal>custom_name</> is the name of
! the registered custom scan provider, <literal>custom_flags</> is a set of
flags below, and <literal>custom_private</> can be used to store private
data of the custom scan provider.
</para>
***************
*** 125,132 ****
It informs the query planner this custom scan node supports
<literal>ExecMarkPosCustomScan</> and
<literal>ExecRestorePosCustomScan</> methods.
! Also, custom scan provider has to be responsible to mark and restore
! a particular position.
</para>
</listitem>
</varlistentry>
--- 122,129 ----
It informs the query planner this custom scan node supports
<literal>ExecMarkPosCustomScan</> and
<literal>ExecRestorePosCustomScan</> methods.
! Also, the custom scan provider is responsible for marking and
! restoring a particular position.
</para>
</listitem>
</varlistentry>
***************
*** 135,141 ****
<listitem>
<para>
It informs the query planner this custom scan node supports
! backward scan.
Also, custom scan provider has to be responsible to scan with
backward direction.
</para>
--- 132,138 ----
<listitem>
<para>
It informs the query planner this custom scan node supports
! backward scans.
Also, custom scan provider has to be responsible to scan with
backward direction.
</para>
***************
*** 148,157 ****
<sect2 id="custom-scan-plan">
<title>Construction of custom plan node</title>
<para>
! Once <literal>CustomPath</literal> got choosen by query planner,
! it calls back its associated custom scan provider to complete setting
! up <literal>CustomScan</literal> plan node according to the path
! information.
<programlisting>
void
InitCustomScanPlan(PlannerInfo *root,
--- 145,154 ----
<sect2 id="custom-scan-plan">
<title>Construction of custom plan node</title>
<para>
! Once a <literal>CustomPath</literal> has been chosen by the query planner,
! the planner calls back to the associated custom scan provider to complete
! setting up the <literal>CustomScan</literal> plan node according to the
! path information.
<programlisting>
void
InitCustomScanPlan(PlannerInfo *root,
***************
*** 160,180 ****
List *tlist,
List *scan_clauses);
</programlisting>
! Query planner does basic initialization on the <literal>cscan_plan</>
! being allocated, then custom scan provider can apply final initialization.
! <literal>cscan_path</> is the path node that was constructed on the
! previous stage then got choosen.
<literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
on the <literal>Plan</> portion in the <literal>cscan_plan</>.
Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
! be checked during relation scan. Its expression portion shall be also
assigned on the <literal>Plan</> portion, but can be eliminated from
this list if custom scan provider can handle these checks by itself.
</para>
<para>
It often needs to adjust <literal>varno</> of <literal>Var</> node that
! references a particular scan node, after conscruction of plan node.
! For example, Var node in the target list of join node originally
references a particular relation underlying a join, however, it has to
be adjusted to either inner or outer reference.
<programlisting>
--- 157,177 ----
List *tlist,
List *scan_clauses);
</programlisting>
! The query planner does basic initialization on the <literal>cscan_plan</>
! being allocated, then the custom scan provider can apply final
! initialization. <literal>cscan_path</> is the path node that was
! constructed in the previous stage and then chosen.
<literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
on the <literal>Plan</> portion in the <literal>cscan_plan</>.
Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
! be checked during a relation scan. Its expression portion will also be
assigned on the <literal>Plan</> portion, but can be eliminated from
this list if custom scan provider can handle these checks by itself.
</para>
<para>
It often needs to adjust <literal>varno</> of <literal>Var</> node that
! references a particular scan node, after construction of the plan node.
! For example, Var node in the target list of the join node originally
references a particular relation underlying a join, however, it has to
be adjusted to either inner or outer reference.
<programlisting>
***************
*** 183,191 ****
CustomScan *cscan_plan,
int rtoffset);
</programlisting>
! This callback is optional if custom scan node is a vanilla relation
! scan because here is nothing special to do. Elsewhere, it needs to
! be handled by custom scan provider in case when a custom scan replaced
a join with two or more relations for example.
</para>
</sect2>
--- 180,188 ----
CustomScan *cscan_plan,
int rtoffset);
</programlisting>
! This callback is optional if the custom scan node is a vanilla relation
! scan because there is nothing special to do. Elsewhere, it needs to
! be handled by the custom scan provider in case when a custom scan replaced
a join with two or more relations for example.
</para>
</sect2>
***************
*** 193,200 ****
<sect2 id="custom-scan-exec">
<title>Execution of custom scan node</title>
<para>
! Query execuror also launches associated callbacks to begin, execute and
! end custom scan according to the executor's manner.
</para>
<para>
<programlisting>
--- 190,197 ----
<sect2 id="custom-scan-exec">
<title>Execution of custom scan node</title>
<para>
! The query executor also launches the associated callbacks to begin, execute
! and end the custom scan according to the executor's manner.
</para>
<para>
<programlisting>
***************
*** 202,217 ****
BeginCustomScan(CustomScanState *csstate, int eflags);
</programlisting>
It begins execution of the custom scan on starting up executor.
! It allows custom scan provider to do any initialization job around this
! plan, however, it is not a good idea to launch actual scanning jobs.
(It shall be done on the first invocation of <literal>ExecCustomScan</>
instead.)
The <literal>custom_state</> field of <literal>CustomScanState</> is
! intended to save the private state being managed by custom scan provider.
! Also, <literal>eflags</> has flag bits of the executor's operating mode
! for this plan node.
! Note that custom scan provider should not perform anything visible
! externally if <literal>EXEC_FLAG_EXPLAIN_ONLY</> would be given,
</para>
<para>
--- 199,214 ----
BeginCustomScan(CustomScanState *csstate, int eflags);
</programlisting>
It begins execution of the custom scan on starting up executor.
! It allows the custom scan provider to do any initialization job around this
! plan, however, it is not a good idea to launch the actual scanning jobs.
(It shall be done on the first invocation of <literal>ExecCustomScan</>
instead.)
The <literal>custom_state</> field of <literal>CustomScanState</> is
! intended to save the private state being managed by the custom scan
! provider. Also, <literal>eflags</> has flag bits of the executor's
! operating mode for this plan node. Note that the custom scan provider
! should not perform anything visible externally if
! <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
</para>
<para>
***************
*** 219,229 ****
TupleTableSlot *
ExecCustomScan(CustomScanState *csstate);
</programlisting>
! It fetches one tuple from the underlying relation or relations if join
according to the custom logic. Unlike <literal>IterateForeignScan</>
! method in foreign table, it is also responsible to check whether next
tuple matches the qualifier of this scan, or not.
! A usual way to implement this method is the callback performs just an
entrypoint of <literal>ExecQual</> with its own access method.
</para>
--- 216,226 ----
TupleTableSlot *
ExecCustomScan(CustomScanState *csstate);
</programlisting>
! It fetches one tuple from the underlying relation or relations, if joining,
according to the custom logic. Unlike <literal>IterateForeignScan</>
! method in foreign table, it is also responsible to check whether the next
tuple matches the qualifier of this scan, or not.
! The usual way to implement this method is the callback performs just an
entrypoint of <literal>ExecQual</> with its own access method.
</para>
***************
*** 232,240 ****
Node *
MultiExecCustomScan(CustomScanState *csstate);
</programlisting>
! It fetches multiple tuples from the underlying relation or relations if
! join according to the custom logic. Pay attention the data format (and
! the way to return also) depends on the type of upper node.
</para>
<para>
--- 229,237 ----
Node *
MultiExecCustomScan(CustomScanState *csstate);
</programlisting>
! It fetches multiple tuples from the underlying relation or relations, if
! joining, according to the custom logic. Pay attention to the data format (and
! the way to return it) since it depends on the type of the upper node.
</para>
<para>
***************
*** 242,248 ****
void
EndCustomScan(CustomScanState *csstate);
</programlisting>
! It ends the scan and release resources privately allocated.
It is usually not important to release memory in per-execution memory
context. So, all this callback should be responsible is its own
resources regardless from the framework.
--- 239,245 ----
void
EndCustomScan(CustomScanState *csstate);
</programlisting>
! It ends the scan and releases resources privately allocated.
It is usually not important to release memory in per-execution memory
context. So, all this callback should be responsible is its own
resources regardless from the framework.
***************
*** 257,263 ****
ReScanCustomScan(CustomScanState *csstate);
</programlisting>
It restarts the current scan from the beginning.
! Note that parameters of the scan depends on might change values,
so rewinded scan does not need to return exactly identical tuples.
</para>
<para>
--- 254,260 ----
ReScanCustomScan(CustomScanState *csstate);
</programlisting>
It restarts the current scan from the beginning.
! Note that the parameters the scan depends on may change values,
so rewinded scan does not need to return exactly identical tuples.
</para>
<para>
***************
*** 276,282 ****
RestorePosCustom(CustomScanState *csstate);
</programlisting>
It rewinds the current position of the custom scan to the position
! where <literal>MarkPosCustomScan</> saved before.
Note that it is optional to implement, only when
<literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
</para>
--- 273,279 ----
RestorePosCustom(CustomScanState *csstate);
</programlisting>
It rewinds the current position of the custom scan to the position
! that <literal>MarkPosCustomScan</> saved before.
Note that it is optional to implement, only when
<literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
</para>
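
To check my own reading of the executor callbacks documented above, I wrote a toy model of the begin / exec / mark / restore / rescan sequence. It is a standalone sketch under my own assumptions, not code from the patch: ToyScanState and the toy_* functions are hypothetical, an int array stands in for the relation, and "value >= min_value" stands in for the scan qualifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private scan state (what custom_state might point to). */
typedef struct ToyScanState
{
    const int *rows;      /* stand-in for the underlying relation */
    size_t     nrows;
    size_t     pos;       /* index of the next row to examine */
    size_t     mark;      /* position saved by toy_mark_pos() */
    int        min_value; /* stand-in qualifier: row >= min_value */
} ToyScanState;

/* Begin: set up state only; no rows are fetched here. */
void
toy_begin(ToyScanState *s, const int *rows, size_t nrows, int min_value)
{
    assert(s != NULL && rows != NULL);
    s->rows = rows;
    s->nrows = nrows;
    s->pos = 0;
    s->mark = 0;
    s->min_value = min_value;
}

/*
 * Exec: return the next row that passes the qualifier, or NULL at the
 * end of the scan. The callback itself checks the qualifier, which is
 * the difference from IterateForeignScan noted in the documentation.
 */
const int *
toy_exec(ToyScanState *s)
{
    while (s->pos < s->nrows)
    {
        const int *row = &s->rows[s->pos++];

        if (*row >= s->min_value)
            return row;
    }
    return NULL;
}

/* Mark: remember the current position. */
void
toy_mark_pos(ToyScanState *s)
{
    s->mark = s->pos;
}

/* Restore: rewind to the position saved by toy_mark_pos(). */
void
toy_restore_pos(ToyScanState *s)
{
    s->pos = s->mark;
}

/* ReScan: restart the scan from the beginning. */
void
toy_rescan(ToyScanState *s)
{
    s->pos = 0;
}
```

With rows {5, 1, 7, 3, 9} and min_value 4, marking after the first fetch and restoring at the end makes the scan return 7 again, while a rescan returns 5 again.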