Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22313#discussion_r214775155

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ---
@@ -71,12 +71,24 @@ private[orc] object OrcFilters {
     for {
       // Combines all convertible filters using `And` to produce a single conjunction
-      conjunction <- convertibleFilters.reduceOption(org.apache.spark.sql.sources.And)
+      conjunction <- buildTree(convertibleFilters)
```

--- End diff --

For the first question, I don't think Parquet has the same issue, because Parquet uses `canMakeFilterOn`, while ORC tries to build the full result (with a fresh builder) to check whether each filter is convertible.

For the second question:

1. In ORC, the first half (the `flatMap`) is already done when computing `convertibleFilters`, and it could be changed to use `filters.filter`:

```scala
val convertibleFilters = for {
  filter <- filters
  _ <- buildSearchArgument(dataTypeMap, filter, SearchArgumentFactory.newBuilder())
} yield filter
```

2. The second half, `reduceOption(FilterApi.and)`, was the original ORC code, and it generated a skewed tree with exponential time complexity. We need to use `buildTree` instead.
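To illustrate the skewed-vs-balanced point, here is a minimal self-contained sketch (not Spark's actual `buildTree` implementation; the `Filter`, `Pred`, and `And` types below are simplified stand-ins for `org.apache.spark.sql.sources` classes). `reduceOption(And)` folds left, producing a chain of depth O(n), whereas splitting the sequence in half recursively yields a tree of depth O(log n):

```scala
// Simplified filter AST for illustration only.
sealed trait Filter
final case class Pred(name: String) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

object TreeDemo {
  // Build a balanced binary tree of And nodes over the given filters.
  def buildTree(filters: Seq[Filter]): Option[Filter] = filters match {
    case Seq()  => None
    case Seq(f) => Some(f)
    case _ =>
      val (l, r) = filters.splitAt(filters.length / 2)
      for { lt <- buildTree(l); rt <- buildTree(r) } yield And(lt, rt)
  }

  // Depth of the tree, counting leaves as depth 1.
  def depth(f: Filter): Int = f match {
    case Pred(_)   => 1
    case And(l, r) => 1 + math.max(depth(l), depth(r))
  }

  def main(args: Array[String]): Unit = {
    val preds = (1 to 8).map(i => Pred(s"p$i"))
    // Left fold: And(And(...And(p1, p2), p3)..., p8) -- depth 8.
    val skewed = preds.reduceOption(And(_, _)).get
    // Balanced recursion -- depth 4.
    val balanced = buildTree(preds).get
    println(s"skewed=${depth(skewed)} balanced=${depth(balanced)}")
  }
}
```

With 8 predicates the skewed chain has depth 8 while the balanced tree has depth 4; the gap grows linearly vs. logarithmically as more filters are pushed down, which matters when the consumer's cost is sensitive to tree shape.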