rluvaton commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916913507
I still think there is a bug here.

For this test (when running on main):

```scala
test("debug datafusion native filter") {
  val schema = StructType(
    Seq(
      StructField("row_idx", IntegerType, nullable = false),
      StructField("int", IntegerType, nullable = false)))
  val data = DataGenerator.DEFAULT.generateRows(1000, schema)
  withSQLConf(
    CometConf.COMET_EXPLAIN_VERBOSE_ENABLED.key -> "true",
    CometConf.COMET_EXPLAIN_NATIVE_ENABLED.key -> "true",
    CometConf.COMET_SPARK_TO_ARROW_SUPPORTED_OPERATOR_LIST.key -> "RDDScan") {
    val df = spark
      .createDataFrame(spark.sparkContext.parallelize(data, 1), schema)
      .where(col("row_idx") < 10000 || col("row_idx") > 10010)
    df.explain(true)
    df.show()
  }
}
```

The Spark plan is:

```
== Parsed Logical Plan ==
'Filter (('row_idx < 10000) OR ('row_idx > 10010))
+- LogicalRDD [row_idx#2, int#3], false

== Analyzed Logical Plan ==
row_idx: int, int: int
Filter ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
+- LogicalRDD [row_idx#2, int#3], false

== Optimized Logical Plan ==
Filter ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
+- LogicalRDD [row_idx#2, int#3], false

== Physical Plan ==
*(2) CometColumnarToRow
+- CometFilter [row_idx#2, int#3], ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
   +- CometSparkRowToColumnar
      +- *(1) Scan ExistingRDD[row_idx#2,int#3]
```

and the DataFusion plan is:

```
25/05/28 19:17:14 INFO core/src/execution/jni_api.rs: Comet native query plan:
FilterExec: col_0@0 < 10000 OR col_0@0 > 10010
  ScanExec: source=[CometSparkRowToColumnar (unknown)], schema=[col_0: Int32, col_1: Int32]
```

The native plan is using the DataFusion filter rather than the Comet filter. Shouldn't it use the Comet filter here, since there is reuse?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org