minni31 opened a new pull request, #12080:
URL: https://github.com/apache/gluten/pull/12080

   ## CONTEXT
   
   `LocalTableScanExec` is a Spark physical operator that scans rows already 
held in driver memory (e.g., a local collection turned into a DataFrame with 
`Seq(...).toDF()`, or a relation that the Catalyst optimizer has reduced to a 
constant `LocalRelation`). Currently, Gluten does not offload this operator, so 
its output rows stay in Spark's internal row format and require a separate 
row-to-columnar conversion step before downstream Velox operators can consume 
them.
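   
   To make the row-to-columnar step concrete, here is a minimal sketch in plain 
Scala. It is not the Spark or Velox implementation: `Row`, `ColumnarBatch`, and 
`toColumnarBatches` are hypothetical names used only for illustration, and real 
converters work on typed native vectors rather than `Vector[Any]`.
   
   ```scala
   // Hypothetical, simplified model: these are NOT Spark's InternalRow or Velox vectors.
   object RowToColumnarSketch {
     type Row = Vector[Any]

     /** One batch: each inner Vector holds all values of a single column. */
     final case class ColumnarBatch(columns: Vector[Vector[Any]], numRows: Int)

     /** Groups rows into batches of at most `batchSize` rows and transposes each group. */
     def toColumnarBatches(
         rows: Iterator[Row],
         numColumns: Int,
         batchSize: Int): Iterator[ColumnarBatch] = {
       require(batchSize > 0, "batchSize must be positive")
       rows.grouped(batchSize).map { group =>
         // Column c of the batch is the c-th value of every row in the group.
         val columns = Vector.tabulate(numColumns)(c => group.map(_(c)).toVector)
         ColumnarBatch(columns, group.size)
       }
     }
   }
   ```
   
   The sketch is lazy: batches are produced only as the downstream consumer 
pulls them, and an empty input yields no batches at all.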
   
   This is the companion PR to #12077 (RDDScanExec support) — both follow the 
same design pattern.
   
   ## WHAT
   
   Adds a `VeloxLocalTableScanTransformer` that intercepts `LocalTableScanExec` 
in the offload rules and performs row-to-columnar conversion using Velox's 
native `RowToColumnarConverter`. The implementation:
   
   - Introduces a `LocalTableScanTransformer` base trait in `gluten-substrait` 
with the backend-agnostic contract (output attributes, row data, schema 
validation).
   - Adds the Velox-specific `VeloxLocalTableScanTransformer`, which delegates 
schema validation to `VeloxValidatorApi.validateSchema`, the same canonical 
validator used by all other Velox operators, so that recursive complex-type 
validation, TimestampNTZ handling, and variant struct detection stay 
consistent.
   - Wires up the offload rule in `OffloadSingleNodeRules` and the backend 
factory method in `VeloxSparkPlanExecApi`.
   - Skips offloading for streaming sources (`plan.getStream.isEmpty` guard) 
since those follow a different execution path.
   - Propagates `SQLMetrics` (`numInputRows`, `numOutputBatches`, `convertTime`) 
so conversion costs are visible in the Spark UI.
   - Uses the 7-parameter `toColumnarBatchIterator` overload to pass plan-level 
metrics to the native converter.
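   
   The offload decision described in the list above (streaming guard plus 
recursive schema validation, with fallback otherwise) can be sketched roughly as 
follows. This is an illustration under stated assumptions, not the code in this 
PR: every type and function here (`DataType`, `Plan`, `LocalTableScan`, 
`NativeLocalTableScan`, `isSupported`, `offload`) is a hypothetical stand-in for 
the corresponding Spark and Gluten classes.
   
   ```scala
   object OffloadSketch {
     // Simplified stand-ins for Spark data types (hypothetical, not Spark's classes).
     sealed trait DataType
     case object IntType extends DataType
     case object StringType extends DataType
     case object UnsupportedType extends DataType
     final case class ArrayType(element: DataType) extends DataType
     final case class StructType(fields: Seq[(String, DataType)]) extends DataType

     // Simplified stand-ins for physical plan nodes.
     sealed trait Plan
     final case class LocalTableScan(
         schema: StructType,
         rows: Seq[Seq[Any]],
         stream: Option[String]) extends Plan
     final case class NativeLocalTableScan(schema: StructType, rows: Seq[Seq[Any]]) extends Plan

     /** Recursive check: a nested type is supported only if all of its children are. */
     def isSupported(dt: DataType): Boolean = dt match {
       case IntType | StringType => true
       case ArrayType(element)   => isSupported(element)
       case StructType(fields)   => fields.forall { case (_, t) => isSupported(t) }
       case UnsupportedType      => false
     }

     /** Offloads non-streaming scans with a supported schema; anything else is returned unchanged. */
     def offload(plan: Plan): Plan = plan match {
       case LocalTableScan(schema, rows, stream) if stream.isEmpty && isSupported(schema) =>
         NativeLocalTableScan(schema, rows)
       case other => other
     }
   }
   ```
   
   Returning the original node when either check fails mirrors the usual 
fallback behavior: the scan simply keeps running on the vanilla Spark path.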
   
   ### Tests
   
   | Suite | Tests | Status |
   |-------|-------|--------|
   | `VeloxLocalTableScanSuite` | 7 tests covering primitive types, nullable columns, empty datasets, multi-partition, string columns, filter pushdown, multiple column types | Local pass |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

