rluvaton commented on PR #1793:
URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916913507

   I still think there is a bug here:
   
   For this test (when running on main):
   ```scala
   test("debug datafusion native filter") {
     val schema = StructType(
       Seq(
         StructField("row_idx", IntegerType, nullable = false),
         StructField("int", IntegerType, nullable = false)))
   
     val data = DataGenerator.DEFAULT.generateRows(1000, schema)
   
     withSQLConf(
       CometConf.COMET_EXPLAIN_VERBOSE_ENABLED.key -> "true",
       CometConf.COMET_EXPLAIN_NATIVE_ENABLED.key -> "true",
       CometConf.COMET_SPARK_TO_ARROW_SUPPORTED_OPERATOR_LIST.key -> "RDDScan") {
       val df = spark
         .createDataFrame(spark.sparkContext.parallelize(data, 1), schema)
         .where(col("row_idx") < 10000 || col("row_idx") > 10010)
   
       df.explain(true)
     df.show()
     }
   }
   ```
   
   The Spark plan is:
   ```
   == Parsed Logical Plan ==
   'Filter (('row_idx < 10000) OR ('row_idx > 10010))
   +- LogicalRDD [row_idx#2, int#3], false
   
   == Analyzed Logical Plan ==
   row_idx: int, int: int
   Filter ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
   +- LogicalRDD [row_idx#2, int#3], false
   
   == Optimized Logical Plan ==
   Filter ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
   +- LogicalRDD [row_idx#2, int#3], false
   
   == Physical Plan ==
   *(2) CometColumnarToRow
   +- CometFilter [row_idx#2, int#3], ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
      +- CometSparkRowToColumnar
         +- *(1) Scan ExistingRDD[row_idx#2,int#3]
   
   ```
   
   And the DataFusion plan is:
   ```
   25/05/28 19:17:14 INFO core/src/execution/jni_api.rs: Comet native query plan:
   FilterExec: col_0@0 < 10000 OR col_0@0 > 10010
     ScanExec: source=[CometSparkRowToColumnar (unknown)], schema=[col_0: Int32, col_1: Int32]
   ```
   The native plan uses DataFusion's `FilterExec` rather than the Comet filter. Shouldn't it use the Comet filter here, since there is reuse?
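
   To make this easier to check in a test, here is a minimal sketch that lists the operator names found in a native plan dump like the one logged above. `NativePlanOps` and `operators` are hypothetical names, not part of Comet, and the sketch assumes the plan text has already been captured as a string with one `<Name>Exec: ...` operator per line.

   ```scala
   // Hypothetical helper, not a Comet API: extracts operator names from a
   // native plan dump, assuming one "<Name>Exec: ..." operator per line.
   object NativePlanOps {
     def operators(plan: String): Seq[String] =
       plan.linesIterator
         .map(_.trim)
         .filter(_.contains(":"))
         .map(_.takeWhile(_ != ':')) // text before the first colon
         .filter(_.endsWith("Exec")) // drops log prefixes and other lines
         .toSeq

     def main(args: Array[String]): Unit = {
       // Plan text copied from the log output above.
       val plan =
         """FilterExec: col_0@0 < 10000 OR col_0@0 > 10010
           |  ScanExec: source=[CometSparkRowToColumnar (unknown)], schema=[col_0: Int32, col_1: Int32]""".stripMargin
       println(operators(plan).mkString(", "))
     }
   }
   ```

   A test could then assert on whether `FilterExec` appears in the returned names, instead of reading the log by eye.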


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

