Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198551889
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -660,6 +661,56 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           assert(df.where("col > 0").count() === 2)
         }
       }
    +
    +  test("filter pushdown - StringStartsWith") {
    +    withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df =>
    --- End diff --
    
    I think all of these tests go through the `keep` method rather than `canDrop` and `inverseCanDrop`, so those methods also need tests. You could cover them by constructing a Parquet file with row groups that have predictable statistics, but that would be difficult. An easier way is to define the predicate class elsewhere and write a unit test for it that passes in different statistics values.
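    
    To illustrate the suggestion, here is a hedged, standalone sketch of what such a unit test could exercise. The class and method names below (`StartsWithPredicate`, `canDrop`, `inverseCanDrop`) are hypothetical simplifications modeled on parquet-mr's `UserDefinedPredicate`: they take plain min/max strings instead of Parquet `Statistics` objects, so the drop logic can be tested directly with hand-picked statistics values, with no Parquet file involved.
    
    ```scala
    // Hypothetical, simplified model of StringStartsWith row-group pruning.
    // A row group may be dropped when its [min, max] string statistics show
    // that no value in the group can start with the given prefix.
    case class StartsWithPredicate(prefix: String) {
      private val len = prefix.length
    
      // Drop the group if the prefix falls lexicographically outside the
      // range spanned by the min/max statistics, truncated to prefix length.
      def canDrop(min: String, max: String): Boolean = {
        val truncatedMin = min.take(len)
        val truncatedMax = max.take(len)
        prefix > truncatedMax || prefix < truncatedMin
      }
    
      // For NOT(startsWith): drop the group only when *every* value must
      // start with the prefix, i.e. both min and max start with it.
      def inverseCanDrop(min: String, max: String): Boolean =
        min.startsWith(prefix) && max.startsWith(prefix)
    }
    
    object StartsWithPredicateSpec extends App {
      val p = StartsWithPredicate("2str")
      // Range ["1str1", "4str4"] may contain matches: must keep the group.
      assert(!p.canDrop("1str1", "4str4"))
      // Range ["aaa", "abc"] cannot contain anything starting with "2str".
      assert(p.canDrop("aaa", "abc"))
      // Inverse: drop only when all values start with the prefix.
      assert(p.inverseCanDrop("2stra", "2strz"))
      assert(!p.inverseCanDrop("1str1", "2strz"))
    }
    ```
    
    This is the kind of test the comment describes: each assertion feeds a different pair of statistics values straight into the predicate, covering `canDrop` and `inverseCanDrop` without building row groups.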


---
