Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/8922#issuecomment-146215917
  
    This implementation only considers the use case of evaluating a single 
attribute with a UDF and comparing the result with a literal value. We restrict 
it this way because the current implementation of `selectFilters` in 
`DataSourceStrategy` selects only predicates involving an attribute and a literal 
value (e.g., `col = 1`, `col2 > 2`) as candidates for pushdown. Besides, the 
form `udf(column) = 'ABCDE...'` is the most common one in our SQL queries that 
use UDFs in filter conditions.
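    To make the restriction concrete, here is a rough, hypothetical sketch (not Spark's actual `selectFilters` code; the `Expr` hierarchy and `isPushable` are made up for illustration) of the predicate shapes under discussion:

```scala
// Hypothetical miniature expression tree, for illustration only.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Any) extends Expr
case class Udf(fn: Any => Any, child: Expr) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

object PushdownSketch {
  // A predicate is a pushdown candidate when it compares a bare attribute
  // with a literal (the existing selectFilters rule), or -- the extension in
  // this PR -- a single attribute wrapped in a UDF with a literal.
  def isPushable(pred: Expr): Boolean = pred match {
    case EqualTo(Attr(_), Lit(_))         => true
    case GreaterThan(Attr(_), Lit(_))     => true
    case EqualTo(Udf(_, Attr(_)), Lit(_)) => true  // e.g. udf(column) = 'ABCDE...'
    case _                                => false
  }
}
```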
    
    Your proposal looks good and is very general. However, I am a little worried 
about the performance regression caused by creating a row for each input value 
and evaluating the predicate on that row.
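    The concern can be sketched as follows (a hypothetical illustration, not code from this PR or from Spark; `Row` and `filterValues` are invented names):

```scala
// One-field row wrapper; in the general approach, every raw column value
// gets wrapped so the full predicate expression can be evaluated against it.
case class Row(values: Any*)

object RowEvalSketch {
  // pred is a compiled predicate over a single-column row.
  def filterValues(values: Seq[Any], pred: Row => Boolean): Seq[Any] =
    values.filter { v =>
      val row = Row(v) // one short-lived Row allocated per input value
      pred(row)
    }
}
```

The per-value `Row` allocation in the hot filtering loop is the source of the suspected regression.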
    
    This patch helps us reduce the memory footprint required for loading a lot of 
data from Parquet files. For performance, the improvement is not significant, 
but it is at least comparable to the case of not pushing down.
    
    I agree that this patch introduces additional complexity to the API. If you 
still think it is not worth it, I will close this PR.
    
    Thanks for the review and suggestions.
    


