[I] ColumnarPartialProject may add additional C2R/R2C and cause OOM and poor performance [incubator-gluten]

via GitHub Mon, 17 Mar 2025 01:03:35 -0700


weixiuli opened a new issue, #9025:
URL: https://github.com/apache/incubator-gluten/issues/9025


   ### Description
   
   When the operator that follows a UDF project does not support columnar
input, ColumnarPartialProject may introduce an additional C2R
(columnar-to-row) conversion, which can cause OOM and poor performance. For
example, set `spark.gluten.sql.native.writer.enabled=true` (used here only to
test the case where the follower of the UDF project does not support columnar)
and run this SQL:
   
   ```
   insert overwrite t1
   select (plus_one(l_extendedprice) * l_discount
    + hash(l_orderkey) + hash(l_comment)) as revenue
    from   lineitem
   ```
   ```
     +- Execute InsertIntoHadoopFsRelationCommand file:/root//git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1, false, Parquet, [path=file:/root//git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1], Overwrite, `spark_catalog`.`default`.`t1`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/root//git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1), [revenue]
            +- WriteFiles
               +- VeloxColumnarToRow
                  +- ^(2) ProjectExecTransformer [cast((((_SparkPartialProject0#64 * l_discount#6) + cast(hash(l_orderkey#0L, 42) as decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS revenue#53]
                     +- ^(2) InputIteratorTransformer[l_orderkey#0L, l_extendedprice#5, l_discount#6, l_comment#15, _SparkPartialProject0#64]
                        +- ColumnarPartialProject Project [cast((((if (isnull(cast(l_extendedprice#5 as bigint))) cast(null as decimal(20,0)) else cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as decimal(20,0)) * l_discount#6) + cast(hash(l_orderkey#0L, 42) as decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS revenue#53] PartialProject List(if (isnull(cast(l_extendedprice#5 as bigint))) cast(null as decimal(20,0)) else cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as decimal(20,0)) AS _SparkPartialProject0#64)
                           +- ^(1) BatchScanTransformer parquet file:/root//git-gluten/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L, l_extendedprice#5, l_discount#6, l_comment#15] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/root//git-gluten/backends-velox/target/scala-2.12/t..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<l_orderkey:bigint,l_extendedprice:decimal(12,2),l_discount:decimal(12,2),l_comment:string> RuntimeFilters: [] NativeFilters: []
   ```
   
   This plan shows that ColumnarPartialProject has to add a C2R and an R2C to
evaluate the UDF expression that is not supported natively, but the output of
ColumnarPartialProject does not need to be columnar here: the row-based
WriteFiles above it forces another VeloxColumnarToRow anyway.
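
To make the overhead concrete, here is a toy Python model of the data path. It is not Gluten or Spark code; `count_conversions` and the `"col"`/`"row"` labels are made up for illustration, and the "row fallback" pipeline is only a hypothetical alternative, not a proposed fix.

```python
# Toy model only: counts row/columnar format switches along a pipeline.
# It does not call Gluten or Spark; all names here are hypothetical.

def count_conversions(formats):
    """Count format switches in a pipeline given as a list of 'col'/'row' steps."""
    return sum(1 for a, b in zip(formats, formats[1:]) if a != b)

# Plan from this issue: columnar scan -> UDF evaluated on rows (C2R) ->
# result turned back into columnar (R2C) -> native project (columnar) ->
# row-based writer (VeloxColumnarToRow).
current_plan = ["col", "row", "col", "col", "row"]

# Hypothetical alternative: convert once after the scan and stay row-based
# through the project and the writer.
row_fallback = ["col", "row", "row", "row"]

print(count_conversions(current_plan))  # 3
print(count_conversions(row_fallback))  # 1
```

In this simplified view the current plan switches format three times where a single C2R would do. The model ignores the cost of running the rest of the projection on rows instead of natively.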
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

