Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21623#discussion_r198551889 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -660,6 +661,56 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex assert(df.where("col > 0").count() === 2) } } + + test("filter pushdown - StringStartsWith") { + withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df => --- End diff -- I think that all of these tests go through the `keep` method instead of the `canDrop` and `inverseCanDrop`. I think those methods need to be tested. You can do that by constructing a Parquet file with row groups that have predictable statistics, but that would be difficult. An easier way to do this is to define the predicate class elsewhere and create a unit test for it that passes in different statistics values.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org