GitHub user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198241792
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -270,6 +277,29 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
           case sources.Not(pred) =>
             createFilter(schema, pred).map(FilterApi.not)
     
    +      case sources.StringStartsWith(name, prefix) if pushDownStartWith && canMakeFilterOn(name) =>
    +        Option(prefix).map { v =>
    +          FilterApi.userDefined(binaryColumn(name),
    +            new UserDefinedPredicate[Binary] with Serializable {
    +              private val strToBinary = Binary.fromReusedByteArray(v.getBytes)
    +              private val size = strToBinary.length
    +
    +              override def canDrop(statistics: Statistics[Binary]): Boolean = {
    +                val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
    +                val max = statistics.getMax
    +                val min = statistics.getMin
    +                comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
    +                  comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
    +              }
    +
    +              override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
    --- End diff --
    
    No. 
    
    Let me illustrate this with an example. Assume min="BBB" and max="DDD".
    canDrop() means that if your prefix sorts before "BBB" (like "A"), we can
    stop, as there is no reason to search within this range. The same is true
    for prefixes after "DDD" (like "E").
    
    Now suppose the operator is negated. What can you say when your prefix is
    "C" and the range is "BBB" to "DDD"? Can you drop it? No. And if the prefix
    is "A" or "E"? Still no. You see, you would have to check the range itself.
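
The example above can be sketched in a few lines of Scala. This is only an illustration, not the code in the PR: it assumes plain ASCII strings in place of Parquet's Binary (so that String.compareTo agrees with the unsigned lexicographical comparator), and the object StartsWithPruning and its helper head are hypothetical names introduced here.

```scala
object StartsWithPruning {
  // Hypothetical helper: keep at most the first n characters of a statistic,
  // mirroring slice(0, math.min(size, length)) in the diff.
  private def head(s: String, n: Int): String =
    s.substring(0, math.min(n, s.length))

  // Same shape as canDrop in the diff: the range [min, max] can be skipped
  // when the truncated max sorts before the prefix or the truncated min
  // sorts after it. Assumes ASCII input so compareTo matches byte order.
  def canDrop(min: String, max: String, prefix: String): Boolean =
    head(max, prefix.length).compareTo(prefix) < 0 ||
      head(min, prefix.length).compareTo(prefix) > 0

  // As in the diff: for the negated predicate we never drop.
  def inverseCanDrop(min: String, max: String, prefix: String): Boolean = false

  def main(args: Array[String]): Unit = {
    println(canDrop("BBB", "DDD", "A")) // true: "A" sorts before the whole range
    println(canDrop("BBB", "DDD", "E")) // true: "E" sorts after the whole range
    println(canDrop("BBB", "DDD", "C")) // false: values starting with "C" may exist
    println(inverseCanDrop("BBB", "DDD", "C")) // false
  }
}
```

For the negated case the sketch simply returns false for every prefix, which is the conservative choice taken in the diff.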


---
