mbutrovich commented on code in PR #922:
URL: https://github.com/apache/datafusion-comet/pull/922#discussion_r1763994682
##########
docs/source/contributor-guide/plugin_overview.md:
##########
@@ -17,30 +17,41 @@ specific language governing permissions and limitations under the License.
 -->
 
-# Comet Plugin Overview
+# Comet Plugin Architecture
 
-The entry point to Comet is the `org.apache.spark.CometPlugin` class, which can be registered with Spark by adding the following setting to the Spark configuration when launching `spark-shell` or `spark-submit`:
+## Comet SQL Plugin
+
+The entry point to Comet is the `org.apache.spark.CometPlugin` class, which can be registered with Spark by adding the
+following setting to the Spark configuration when launching `spark-shell` or `spark-submit`:
 
 ```
 --conf spark.plugins=org.apache.spark.CometPlugin
 ```
 
-On initialization, this class registers two physical plan optimization rules with Spark: `CometScanRule` and `CometExecRule`. These rules run whenever a query stage is being planned.
+On initialization, this class registers two physical plan optimization rules with Spark: `CometScanRule`
+and `CometExecRule`. These rules run whenever a query stage is being planned during Adaptive Query Execution, and
+run once for the entire plan when Adaptive Query Execution is disabled.
 
 ## CometScanRule
 
-`CometScanRule` replaces any Parquet scans with Comet Parquet scan classes.

Review Comment:
   Maybe "Spark v1 and v2 data sources" to disambiguate that it's not a Parquet or Comet concept.

##########
docs/source/contributor-guide/plugin_overview.md:
##########
@@ -51,9 +62,27 @@ Comet does not support partially replacing subsets of the plan within a query stage
 transitions to convert between row-based and columnar data between Spark operators and Comet operators and the
 overhead of this could outweigh the benefits of running parts of the query stage natively in Comet.
 
-Once the plan has been transformed, it is serialized into Comet protocol buffer format by the `QueryPlanSerde` class
-and this serialized plan is passed into the native code by `CometExecIterator`.
+## Query Execution
+
+Once the plan has been transformed, any consecutive Comet operators are combined into a `CometNativeExec` which contains
+a serialized version of the plan (the serialization code can be found in `QueryPlanSerde`). When this operator is
+executed, the serialized plan is passed to the native code when calling `Native.createPlan`.
 
 In the native code there is a `PhysicalPlanner` struct (in `planner.rs`) which converts the serialized plan into an
-Apache DataFusion physical plan. In some cases, Comet provides specialized physical operators and expressions to
+Apache DataFusion `ExecutionPlan`. In some cases, Comet provides specialized physical operators and expressions to
 override the DataFusion versions to ensure compatibility with Apache Spark.
+
+`CometExecIterator` will invoke `Native.executePlan` to fetch the next batch from the native plan. This is repeated

Review Comment:
   fetch -> pull
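As a rough illustration of the pull-based execution loop described in the second hunk, here is a minimal Scala sketch of an iterator crossing the JNI boundary. Only the names `Native.createPlan`, `Native.executePlan`, and `CometExecIterator` come from the doc text above; the facade object, the method signatures, the `Option`-based exhaustion convention, and the `releasePlan` cleanup hook are hypothetical stand-ins, not Comet's actual API.

```scala
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical JNI facade. createPlan and executePlan are mentioned in the
// doc text above, but these signatures are assumptions made for this sketch.
object NativeSketch {
  def createPlan(serializedPlan: Array[Byte]): Long = ???        // opaque handle to the native plan
  def executePlan(planHandle: Long): Option[ColumnarBatch] = ??? // None once the plan is exhausted
  def releasePlan(planHandle: Long): Unit = ???                  // free native resources
}

// Sketch of the loop the doc attributes to CometExecIterator: build the
// native plan once, then repeatedly pull the next batch until exhaustion.
class CometExecIteratorSketch(serializedPlan: Array[Byte]) extends Iterator[ColumnarBatch] {
  private val planHandle: Long = NativeSketch.createPlan(serializedPlan)
  private var nextBatch: Option[ColumnarBatch] = None
  private var finished = false

  override def hasNext: Boolean = {
    if (nextBatch.isEmpty && !finished) {
      nextBatch = NativeSketch.executePlan(planHandle)
      if (nextBatch.isEmpty) {
        finished = true
        NativeSketch.releasePlan(planHandle)
      }
    }
    nextBatch.isDefined
  }

  override def next(): ColumnarBatch = {
    if (!hasNext) throw new NoSuchElementException("native plan exhausted")
    val batch = nextBatch.get
    nextBatch = None
    batch
  }
}
```

Modeling exhaustion as `None` keeps the sketch self-contained; the real iterator also has to manage native memory and Arrow batch transfer, which is out of scope here.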

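Relatedly, the `spark.plugins` registration shown in the first hunk can also be applied programmatically when embedding Spark rather than launching through `spark-shell` or `spark-submit`. A minimal sketch using only standard Spark APIs, assuming the Comet jar is already on the driver and executor classpath:

```scala
import org.apache.spark.sql.SparkSession

// Registers the Comet plugin via configuration instead of --conf on the
// command line. spark.plugins is a standard Spark 3.x setting; the Comet
// jar itself must already be on the classpath for the class to load.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("comet-plugin-demo")
  .config("spark.plugins", "org.apache.spark.CometPlugin")
  .getOrCreate()
```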