tshauck commented on PR #422:
URL: https://github.com/apache/datafusion-comet/pull/422#issuecomment-2112964714

   Notes from comet community meeting:
   
   * Possible to get link for presentation slides and video?
     * Andy to publish video
     * Liang Chi has slides here: 
https://docs.google.com/presentation/d/1H0fF2MOkkBK8fPBlnqK6LejUeLcVD917JhVWfp3mb8A/edit#slide=id.p
   * `CometSparkSessionExtension` injects rules into spark session extension
     * E.g. replace spark scan with comet scan
     * Replace spark execution with comet execution class
   * `QueryPlanSerde` takes a spark expression and serializes it via proto
   * Cast/type support is incomplete (e.g. `Add(left, right, _)`
checks if `supportedDataTypes(left.dataType)`).
   * operator.proto defines `Operator`
   * `message Expr`... need to update `expr.proto` (See `Add` for example)
     * left, right, return type
   * `PhysicalPlanner` is the physical planner struct
     * `create_expr` and `create_plan` are two most important parts
     * gets children from protobuf message and calls `create_expr` or 
`create_plan` to generate the physical plan
     * `spark_expr.expr_struct` is matched on, then passed to the appropriate part
     * Can sometimes use built-in support for expressions (e.g. is not null
expression); this _only_ works if the implementation is compatible w/ spark (e.g.
cast)
   * Adding a Cast
     * datafusion's `PhysicalExpr` is implemented for `Cast`.
   * Spark version differences (e.g. `failOnError` is a common change from 3.3 
to 3.4+)
     * use shim layer to handle API differences
   * `CometExpressionSuite` is for testing
     * use `checkSparkAnswerAndOperator`
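
To make the `create_expr` flow above concrete, here is a minimal sketch in plain Rust. It is not Comet's code: `DataType`, `Expr`, `ExprStruct`, `PhysicalExpr`, `supported_data_type`, `lit`, `add` and `is_not_null` are all hypothetical stand-ins written for this note, and no protobuf or DataFusion crates are used. The type check from the `Add(left, right, _)` bullet appears to live on the Spark side; it is folded into the planner here only to keep the sketch in one language.

```rust
// Hypothetical stand-ins only; none of these are Comet's generated
// protobuf types or DataFusion's physical expression types.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Binary, // treated as unsupported in this sketch
}

/// Stand-in for the `oneof expr_struct` inside `message Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ExprStruct {
    Literal(i64),
    // left, right, return type (mirrors the `Add` bullet above)
    Add { left: Box<Expr>, right: Box<Expr>, return_type: DataType },
    IsNotNull(Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
struct Expr {
    // Modeled as optional: a decoded message may have no variant set.
    expr_struct: Option<ExprStruct>,
}

/// Stand-in for a physical expression tree.
#[derive(Debug, PartialEq)]
enum PhysicalExpr {
    Literal(i64),
    Add(Box<PhysicalExpr>, Box<PhysicalExpr>),
    IsNotNull(Box<PhysicalExpr>),
}

/// Assumed rule for this sketch: only integer types are supported.
fn supported_data_type(dt: DataType) -> bool {
    matches!(dt, DataType::Int32 | DataType::Int64)
}

struct PhysicalPlanner;

impl PhysicalPlanner {
    /// Match on `expr_struct`, recurse into children, build the physical expr.
    fn create_expr(&self, spark_expr: &Expr) -> Result<PhysicalExpr, String> {
        match spark_expr.expr_struct.as_ref() {
            Some(ExprStruct::Literal(v)) => Ok(PhysicalExpr::Literal(*v)),
            Some(ExprStruct::Add { left, right, return_type }) => {
                if !supported_data_type(*return_type) {
                    return Err(format!("unsupported type for Add: {:?}", return_type));
                }
                let l = self.create_expr(left)?;
                let r = self.create_expr(right)?;
                Ok(PhysicalExpr::Add(Box::new(l), Box::new(r)))
            }
            Some(ExprStruct::IsNotNull(child)) => {
                let c = self.create_expr(child)?;
                Ok(PhysicalExpr::IsNotNull(Box::new(c)))
            }
            None => Err("expr_struct is not set".to_string()),
        }
    }
}

// Small constructors so the examples stay readable.
fn lit(v: i64) -> Expr {
    Expr { expr_struct: Some(ExprStruct::Literal(v)) }
}

fn add(left: Expr, right: Expr, return_type: DataType) -> Expr {
    Expr {
        expr_struct: Some(ExprStruct::Add {
            left: Box::new(left),
            right: Box::new(right),
            return_type,
        }),
    }
}

fn is_not_null(child: Expr) -> Expr {
    Expr { expr_struct: Some(ExprStruct::IsNotNull(Box::new(child))) }
}

fn main() {
    let planner = PhysicalPlanner;
    // Supported: children are planned first, then wrapped.
    let nested = is_not_null(add(lit(1), lit(2), DataType::Int32));
    println!("{:?}", planner.create_expr(&nested));
    // Unsupported return type: the sketch reports an error instead of planning.
    println!("{:?}", planner.create_expr(&add(lit(1), lit(2), DataType::Binary)));
}
```

The point of the sketch is the shape: one `match` arm per expression kind, with children planned recursively before the parent is built. Adding a new expression in this toy model means adding a variant to `ExprStruct` and a matching arm in `create_expr`.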
   
   
   Questions:
   
   * Which expressions to support? Try to get support for the full expression.
     * 3.4 main difference is failOnError
     * Expect 4.0 to be reasonably different
   * How to handle new expressions? Add new expression into datafusion if 
_very_ common expression with standard behavior
   * Which other areas are important? Aiming for complete coverage... `CAST` is 
recently a hotbed of activity and high priority... tests are important
   * Unrelated to expressions: first release? Andy has started copying over the 
basic datafusion scripts to get out a source release, but ultimate goal is jar 
file with native execution.
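
Since `failOnError` and `CAST` both come up above, here is a rough sketch of why a cast has to match Spark's behavior rather than reuse whatever the engine does by default. It assumes `failOnError` selects between raising an error and producing a null for invalid input. `cast_str_to_i32` is a hypothetical helper written for this note, and its parsing (trim, then Rust's `parse::<i32>`) is a simplification, not Spark's actual string-to-int rules.

```rust
/// Hypothetical helper: cast a string to a 32-bit integer.
/// `Ok(None)` stands in for a SQL NULL result.
fn cast_str_to_i32(input: &str, fail_on_error: bool) -> Result<Option<i32>, String> {
    // Simplified parsing: trim whitespace, then use Rust's integer parser.
    match input.trim().parse::<i32>() {
        Ok(v) => Ok(Some(v)),
        // Assumed error mode: invalid input is reported to the caller.
        Err(_) if fail_on_error => {
            Err(format!("invalid input for cast to int: '{}'", input))
        }
        // Assumed lenient mode: invalid input becomes NULL.
        Err(_) => Ok(None),
    }
}

fn main() {
    // Same input, two modes, two different observable results.
    for fail_on_error in [false, true] {
        for input in [" 42 ", "abc", "2147483648"] {
            println!(
                "input={:?} fail_on_error={} -> {:?}",
                input,
                fail_on_error,
                cast_str_to_i32(input, fail_on_error)
            );
        }
    }
}
```

Each (input, mode) pair is effectively a separate test case, which is part of why cast coverage needs so many tests.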


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

