GitHub user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19651#discussion_r154408356
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ---
    @@ -0,0 +1,210 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.orc
    +
    +import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument, SearchArgumentFactory}
    +import org.apache.orc.storage.ql.io.sarg.SearchArgument.Builder
    +import org.apache.orc.storage.serde2.io.HiveDecimalWritable
    +
    +import org.apache.spark.sql.sources.Filter
    +import org.apache.spark.sql.types._
    +
    +/**
    + * Helper object for building ORC `SearchArgument`s, which are used for ORC predicate push-down.
    + *
    + * Due to limitation of ORC `SearchArgument` builder, we had to end up with a pretty weird double-
    + * checking pattern when converting `And`/`Or`/`Not` filters.
    + *
    + * An ORC `SearchArgument` must be built in one pass using a single builder.  For example, you can't
    + * build `a = 1` and `b = 2` first, and then combine them into `a = 1 AND b = 2`.  This is quite
    + * different from the cases in Spark SQL or Parquet, where complex filters can be easily built using
    + * existing simpler ones.
    + *
    + * The annoying part is that, `SearchArgument` builder methods like `startAnd()`, `startOr()`, and
    + * `startNot()` mutate internal state of the builder instance.  This forces us to translate all
    + * convertible filters with a single builder instance. However, before actually converting a filter,
    + * we've no idea whether it can be recognized by ORC or not. Thus, when an inconvertible filter is
    + * found, we may already end up with a builder whose internal state is inconsistent.
    + *
    + * For example, to convert an `And` filter with builder `b`, we call `b.startAnd()` first, and then
    + * try to convert its children.  Say we convert `left` child successfully, but find that `right`
    + * child is inconvertible.  Alas, `b.startAnd()` call can't be rolled back, and `b` is inconsistent
    + * now.
    + *
    + * The workaround employed here is that, for `And`/`Or`/`Not`, we first try to convert their
    + * children with brand new builders, and only do the actual conversion with the right builder
    + * instance when the children are proven to be convertible.
    + *
    + * P.S.: Hive seems to use `SearchArgument` together with `ExprNodeGenericFuncDesc` only.  Usage of
    + * builder methods mentioned above can only be found in test code, where all tested filters are
    + * known to be convertible.
    + */
    +private[orc] object OrcFilters {
    --- End diff --
    
    Yes. It's logically the same as the old version. Only the API usage is updated here.
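
For anyone skimming the thread, the double-checking workaround described in the doc comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `ToyBuilder`, `DoubleCheck`, `Unsupported`, and `equalsLeaf` are hypothetical stand-ins, and the small `Filter` hierarchy is a simplified local copy rather than `org.apache.spark.sql.sources.Filter`. The toy builder only records calls, which is enough to show that a failed conversion leaves the real builder untouched.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified filter hierarchy (not the Spark classes).
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class Unsupported(description: String) extends Filter // cannot be pushed down
final case class And(left: Filter, right: Filter) extends Filter
final case class Or(left: Filter, right: Filter) extends Filter
final case class Not(child: Filter) extends Filter

// Hypothetical builder: every call mutates internal state and cannot be undone,
// mimicking the limitation described for the ORC `SearchArgument` builder.
final class ToyBuilder {
  private val recorded = ArrayBuffer.empty[String]
  def startAnd(): ToyBuilder = { recorded += "startAnd"; this }
  def startOr(): ToyBuilder = { recorded += "startOr"; this }
  def startNot(): ToyBuilder = { recorded += "startNot"; this }
  def equalsLeaf(attribute: String, value: Any): ToyBuilder = {
    recorded += s"$attribute = $value"; this
  }
  def end(): ToyBuilder = { recorded += "end"; this }
  def tokens: List[String] = recorded.toList
}

object DoubleCheck {
  // Returns Some(builder) only when the whole filter is convertible.
  def convert(filter: Filter, builder: ToyBuilder): Option[ToyBuilder] = filter match {
    case And(left, right) =>
      for {
        // First pass: try both children on throwaway builders.
        _ <- convert(left, new ToyBuilder)
        _ <- convert(right, new ToyBuilder)
        // Second pass: both children are known to convert, so it is now
        // safe to mutate the real builder.
        lhs <- convert(left, builder.startAnd())
        rhs <- convert(right, lhs)
      } yield rhs.end()

    case Or(left, right) =>
      for {
        _ <- convert(left, new ToyBuilder)
        _ <- convert(right, new ToyBuilder)
        lhs <- convert(left, builder.startOr())
        rhs <- convert(right, lhs)
      } yield rhs.end()

    case Not(child) =>
      for {
        _ <- convert(child, new ToyBuilder)
        negated <- convert(child, builder.startNot())
      } yield negated.end()

    case EqualTo(attribute, value) => Some(builder.equalsLeaf(attribute, value))

    case Unsupported(_) => None
  }
}
```

With this sketch, converting `And(EqualTo("a", 1), Unsupported("x"))` returns `None` and the builder passed in has recorded nothing, because `startAnd()` is never reached on it. The cost is that every convertible subtree is traversed twice.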

