[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

mallman Thu, 24 May 2018 00:10:02 -0700

Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21320#discussion_r190484386
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
 ---
    @@ -99,27 +100,28 @@ trait ConstraintHelper {
       }
     
       /**
    -   * Infer the Attribute-specific IsNotNull constraints from the null 
intolerant child expressions
    -   * of constraints.
    +   * Infer the Attribute and ExtractValue-specific IsNotNull constraints 
from the null intolerant
    +   * child expressions of constraints.
        */
       private def inferIsNotNullConstraints(constraint: Expression): 
Seq[Expression] =
         constraint match {
           // When the root is IsNotNull, we can push IsNotNull through the 
child null intolerant
           // expressions
    -      case IsNotNull(expr) => 
scanNullIntolerantAttribute(expr).map(IsNotNull(_))
    +      case IsNotNull(expr) => 
scanNullIntolerantField(expr).map(IsNotNull(_))
           // Constraints always return true for all the inputs. That means, 
null will never be returned.
           // Thus, we can infer `IsNotNull(constraint)`, and also push 
IsNotNull through the child
           // null intolerant expressions.
    -      case _ => scanNullIntolerantAttribute(constraint).map(IsNotNull(_))
    +      case _ => scanNullIntolerantField(constraint).map(IsNotNull(_))
         }
     
       /**
    -   * Recursively explores the expressions which are null intolerant and 
returns all attributes
    -   * in these expressions.
    +   * Recursively explores the expressions which are null intolerant and 
returns all attributes and
    +   * complex type extractors in these expressions.
        */
    -  private def scanNullIntolerantAttribute(expr: Expression): 
Seq[Attribute] = expr match {
    +  private def scanNullIntolerantField(expr: Expression): Seq[Expression] = 
expr match {
    +    case ev: ExtractValue => Seq(ev)
    --- End diff --
    
    I agree adding more direct and independent test coverage for this change is 
a good idea. However, omitting this change will weaken the capabilities of this 
PR. It would also imply the removal of the failing test case in 
`ParquetSchemaPruningSuite`, which would imply two follow on PRs. The first 
would be to add this specific change plus the right test coverage. The next 
would be to restore the test case removed from 'ParquetSchemaPruningSuite'.
    
    Let me suggest an alternative. As this change is a valuable enhancement for 
this PR, let me try adding an appropriate test case in 
`InferFiltersFromConstraintsSuite` as part of this PR. That will eliminate the 
requirement for two more follow-on PRs.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

Reply via email to