weixiuli opened a new issue, #9025:
URL: https://github.com/apache/incubator-gluten/issues/9025
### Description
When the operator that follows a UDF project doesn't support columnar input,
an additional C2R may be added, which can cause OOM and poor performance. For
example, set `spark.gluten.sql.native.writer.enabled=true` (used here only to
test the case where the follower of the UDF project doesn't support columnar)
and run the SQL:
```
insert overwrite t1
select (plus_one(l_extendedprice) * l_discount
+ hash(l_orderkey) + hash(l_comment)) as revenue
from lineitem
```
```
+- Execute InsertIntoHadoopFsRelationCommand file:/root//git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1, false, Parquet, [path=file:/root//git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1], Overwrite, `spark_catalog`.`default`.`t1`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/root//git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1), [revenue]
   +- WriteFiles
      +- VeloxColumnarToRow
         +- ^(2) ProjectExecTransformer [cast((((_SparkPartialProject0#64 * l_discount#6) + cast(hash(l_orderkey#0L, 42) as decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS revenue#53]
            +- ^(2) InputIteratorTransformer[l_orderkey#0L, l_extendedprice#5, l_discount#6, l_comment#15, _SparkPartialProject0#64]
               +- ColumnarPartialProject Project [cast((((if (isnull(cast(l_extendedprice#5 as bigint))) cast(null as decimal(20,0)) else cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as decimal(20,0)) * l_discount#6) + cast(hash(l_orderkey#0L, 42) as decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS revenue#53] PartialProject List(if (isnull(cast(l_extendedprice#5 as bigint))) cast(null as decimal(20,0)) else cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as decimal(20,0)) AS _SparkPartialProject0#64)
                  +- ^(1) BatchScanTransformer parquet file:/root//git-gluten/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L, l_extendedprice#5, l_discount#6, l_comment#15] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/root//git-gluten/backends-velox/target/scala-2.12/t..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<l_orderkey:bigint,l_extendedprice:decimal(12,2),l_discount:decimal(12,2),l_comment:string> RuntimeFilters: [] NativeFilters: []
```
From this plan we can see that ColumnarPartialProject has to add a C2R and an
R2C around the unsupported UDF expression, but its output doesn't need to be
columnar here: the result is converted back to rows by VeloxColumnarToRow
before it reaches WriteFiles.
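
To make the cost concrete, here is a minimal toy model in Python that counts the row/columnar conversions along an operator chain. It is not Gluten code: the `Op` class, the `transitions` helper, and the "alternative" chain (evaluating the whole projection in a row-based project when the consumer is row-based) are assumptions made only for illustration, and the issue itself does not prescribe this fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a physical operator (not a Gluten class)."""
    name: str
    consumes: str  # "row" or "columnar"
    produces: str  # "row" or "columnar"


def transitions(chain):
    """Return the conversions needed between neighbours, source to sink."""
    needed = []
    for up, down in zip(chain, chain[1:]):
        if up.produces != down.consumes:
            kind = "C2R" if up.produces == "columnar" else "R2C"
            needed.append((up.name, down.name, kind))
    return needed


# Chain modelled on the plan above: the UDF is evaluated on rows inside
# ColumnarPartialProject, the rest of the projection runs columnar, and
# the writer consumes rows.
current = [
    Op("BatchScanTransformer", "columnar", "columnar"),
    Op("UDF evaluation (partial project)", "row", "row"),
    Op("ProjectExecTransformer", "columnar", "columnar"),
    Op("WriteFiles", "row", "row"),
]

# Assumed alternative: the whole projection is evaluated on rows because
# the consumer is row-based anyway.
alternative = [
    Op("BatchScanTransformer", "columnar", "columnar"),
    Op("Row-based project", "row", "row"),
    Op("WriteFiles", "row", "row"),
]

current_kinds = [kind for _, _, kind in transitions(current)]
alternative_kinds = [kind for _, _, kind in transitions(alternative)]
print(current_kinds)
print(alternative_kinds)
```

In this model the current chain needs three conversions (C2R, R2C, C2R) while the assumed alternative needs only one C2R, which is the extra work the issue points at.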