mbutrovich commented on code in PR #922:
URL: https://github.com/apache/datafusion-comet/pull/922#discussion_r1763994682
##########
docs/source/contributor-guide/plugin_overview.md:
##########
@@ -17,30 +17,41 @@ specific language governing permissions and limitations under the License.
 -->
 
-# Comet Plugin Overview
+# Comet Plugin Architecture
 
-The entry point to Comet is the `org.apache.spark.CometPlugin` class, which can be registered with Spark by adding the following setting to the Spark configuration when launching `spark-shell` or `spark-submit`:
+## Comet SQL Plugin
+
+The entry point to Comet is the `org.apache.spark.CometPlugin` class, which can be registered with Spark by adding the
+following setting to the Spark configuration when launching `spark-shell` or `spark-submit`:
 
 ```
 --conf spark.plugins=org.apache.spark.CometPlugin
 ```
 
-On initialization, this class registers two physical plan optimization rules with Spark: `CometScanRule` and `CometExecRule`. These rules run whenever a query stage is being planned.
+On initialization, this class registers two physical plan optimization rules with Spark: `CometScanRule`
+and `CometExecRule`. These rules run whenever a query stage is being planned during Adaptive Query Execution, and
+run once for the entire plan when Adaptive Query Execution is disabled.
 
 ## CometScanRule
 
-`CometScanRule` replaces any Parquet scans with Comet Parquet scan classes.

Review Comment:
   Maybe "Spark v1 and v2 data sources" to disambiguate that it's not a Parquet or Comet concept.

##########
docs/source/contributor-guide/plugin_overview.md:
##########
@@ -51,9 +62,27 @@ Comet does not support partially replacing subsets of the plan within a query stage
 transitions to convert between row-based and columnar data between Spark operators and Comet operators and the
 overhead of this could outweigh the benefits of running parts of the query stage natively in Comet.
 
-Once the plan has been transformed, it is serialized into Comet protocol buffer format by the `QueryPlanSerde` class
-and this serialized plan is passed into the native code by `CometExecIterator`.
+## Query Execution
+
+Once the plan has been transformed, any consecutive Comet operators are combined into a `CometNativeExec` which contains
+a serialized version of the plan (the serialization code can be found in `QueryPlanSerde`). When this operator is
+executed, the serialized plan is passed to the native code when calling `Native.createPlan`.
 
 In the native code there is a `PhysicalPlanner` struct (in `planner.rs`) which converts the serialized plan into an
-Apache DataFusion physical plan. In some cases, Comet provides specialized physical operators and expressions to
+Apache DataFusion `ExecutionPlan`. In some cases, Comet provides specialized physical operators and expressions to
 override the DataFusion versions to ensure compatibility with Apache Spark.
+
+`CometExecIterator` will invoke `Native.executePlan` to fetch the next batch from the native plan. This is repeated

Review Comment:
   fetch -> pull
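As a rough illustration of the pull-based execution loop described in the second hunk, here is a minimal Scala sketch of an iterator crossing the JNI boundary. Only the names `Native.createPlan`, `Native.executePlan`, and `CometExecIterator` come from the doc text above; the facade object, the method signatures, the `Option`-based exhaustion convention, and the `releasePlan` cleanup hook are hypothetical stand-ins, not Comet's actual API.

```scala
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical JNI facade. createPlan and executePlan are mentioned in the
// doc text above, but these signatures are assumptions made for this sketch.
object NativeSketch {
  def createPlan(serializedPlan: Array[Byte]): Long = ???        // opaque handle to the native plan
  def executePlan(planHandle: Long): Option[ColumnarBatch] = ??? // None once the plan is exhausted
  def releasePlan(planHandle: Long): Unit = ???                  // free native resources
}

// Sketch of the loop the doc attributes to CometExecIterator: build the
// native plan once, then repeatedly pull the next batch until exhaustion.
class CometExecIteratorSketch(serializedPlan: Array[Byte]) extends Iterator[ColumnarBatch] {
  private val planHandle: Long = NativeSketch.createPlan(serializedPlan)
  private var nextBatch: Option[ColumnarBatch] = None
  private var finished = false

  override def hasNext: Boolean = {
    if (nextBatch.isEmpty && !finished) {
      nextBatch = NativeSketch.executePlan(planHandle)
      if (nextBatch.isEmpty) {
        finished = true
        NativeSketch.releasePlan(planHandle)
      }
    }
    nextBatch.isDefined
  }

  override def next(): ColumnarBatch = {
    if (!hasNext) throw new NoSuchElementException("native plan exhausted")
    val batch = nextBatch.get
    nextBatch = None
    batch
  }
}
```

Modeling exhaustion as `None` keeps the sketch self-contained; the real iterator also has to manage native memory and Arrow batch transfer, which is out of scope here.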

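Relatedly, the `spark.plugins` registration shown in the first hunk can also be applied programmatically when embedding Spark rather than launching through `spark-shell` or `spark-submit`. A minimal sketch using only standard Spark APIs, assuming the Comet jar is already on the driver and executor classpath:

```scala
import org.apache.spark.sql.SparkSession

// Registers the Comet plugin via configuration instead of --conf on the
// command line. spark.plugins is a standard Spark 3.x setting; the Comet
// jar itself must already be on the classpath for the class to load.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("comet-plugin-demo")
  .config("spark.plugins", "org.apache.spark.CometPlugin")
  .getOrCreate()
```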