minni31 opened a new pull request, #12080: URL: https://github.com/apache/gluten/pull/12080
## CONTEXT

`LocalTableScanExec` is a Spark physical operator that materializes in-memory data (e.g., from `Dataset.toDF()`, `spark.range()`, or constant relations optimized by the Catalyst planner). Currently, Gluten does not offload this operator, so the output rows stay in Spark's internal row format and require a separate row-to-columnar conversion step before downstream Velox operators can consume them. This is the companion PR to #12077 (RDDScanExec support); both follow the same design pattern.

## WHAT

Adds a `VeloxLocalTableScanTransformer` that intercepts `LocalTableScanExec` in the offload rules and performs row-to-columnar conversion using Velox's native `RowToColumnarConverter`. The implementation:

- Introduces a `LocalTableScanTransformer` base trait in `gluten-substrait` with the backend-agnostic contract (output attributes, row data, schema validation).
- Adds the Velox-specific `VeloxLocalTableScanTransformer` that delegates schema validation to `VeloxValidatorApi.validateSchema` (the same canonical validator used by all other Velox operators), ensuring recursive complex-type validation, TimestampNTZ handling, and variant struct detection are handled consistently.
- Wires up the offload rule in `OffloadSingleNodeRules` and the backend factory method in `VeloxSparkPlanExecApi`.
- Skips offloading for streaming sources (`plan.getStream.isEmpty` guard) since those follow a different execution path.
- Propagates `SQLMetrics` (numInputRows, numOutputBatches, convertTime) so conversion costs are visible in the Spark UI.
- Uses the 7-parameter `toColumnarBatchIterator` overload to pass plan-level metrics to the native converter.

### Tests

| Suite | Tests | Status |
|-------|-------|--------|
| `VeloxLocalTableScanSuite` | 7 tests covering primitive types, nullable columns, empty datasets, multi-partition, string columns, filter pushdown, multiple column types | Local pass |
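
As a rough illustration of the offload decision described above (offload only when the scan is not streaming and the schema validates, otherwise keep the original operator), here is a minimal, self-contained Scala sketch. It does not use Spark or Gluten classes: `LocalScan`, `OffloadedLocalScan`, `Field`, the toy `DataType` hierarchy, and `OffloadSketch` are all hypothetical stand-ins, and the real rule in `OffloadSingleNodeRules` may be structured differently.

```scala
// Simplified stand-ins for illustration only; none of these are Spark or Gluten types.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
case object UnsupportedType extends DataType

final case class Field(name: String, dataType: DataType, nullable: Boolean)

sealed trait Plan
// Models a local table scan: in-memory rows plus an optional stream handle
// (assumption: a non-empty `stream` marks a streaming source).
final case class LocalScan(
    schema: Seq[Field],
    rows: Seq[Seq[Any]],
    stream: Option[String]) extends Plan
// Models the offloaded operator that would hand its rows to a native converter.
final case class OffloadedLocalScan(schema: Seq[Field], rows: Seq[Seq[Any]]) extends Plan

object OffloadSketch {
  // Stand-in for backend schema validation: returns a failure reason, or None if valid.
  def validateSchema(schema: Seq[Field]): Option[String] =
    schema.collectFirst {
      case Field(name, UnsupportedType, _) => s"unsupported type in column '$name'"
    }

  // Offload only non-streaming scans whose schema validates; otherwise fall back
  // to the original plan node unchanged.
  def offload(plan: Plan): Plan = plan match {
    case LocalScan(schema, rows, None) if validateSchema(schema).isEmpty =>
      OffloadedLocalScan(schema, rows)
    case other => other
  }
}
```

Returning the original node on any failed check keeps the fallback path trivial: a scan that is streaming or has an unsupported column simply stays on the existing row-based path.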
