Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6194#discussion_r30466733
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -0,0 +1,144 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive.orc
    +
    +import org.apache.hadoop.hive.common.`type`.{HiveChar, HiveDecimal, HiveVarchar}
    +import org.apache.hadoop.hive.ql.io.sarg.SearchArgument
    +import org.apache.hadoop.hive.ql.io.sarg.SearchArgument.Builder
    +import org.apache.hadoop.hive.serde2.io.DateWritable
    +
    +import org.apache.spark.Logging
    +import org.apache.spark.sql.sources._
    +
    +/**
    + * This could be optimized by pushing down partial filters, but we are conservative
    + * here: if any filter fails to be converted, the builder tree may be corrupted and
    + * can no longer be used.
    + */
    +private[orc] object OrcFilters extends Logging {
    +  def createFilter(expr: Array[Filter]): Option[SearchArgument] = {
    +    expr.reduceOption(And).flatMap { conjunction =>
    +      val builder = SearchArgument.FACTORY.newBuilder()
    +      buildSearchArgument(conjunction, builder).map(_.build())
    +    }
    +  }
    +
    +  private def buildSearchArgument(expression: Filter, builder: Builder): Option[Builder] = {
    +    def newBuilder = SearchArgument.FACTORY.newBuilder()
    +
    +    def isSearchableLiteral(value: Any) = value match {
    +      // These are types recognized by the `SearchArgumentImpl.BuilderImpl.boxLiteral()` method.
    +      case _: String | _: Long | _: Double | _: DateWritable | _: HiveDecimal | _: HiveChar |
    +           _: HiveVarchar | _: Byte | _: Short | _: Integer | _: Float => true
    +      case _ => false
    +    }
    +
    +    // lian: I probably missed something here, and ended up with a pretty weird
    +    // double-checking pattern when converting `And`/`Or`/`Not` filters.
    +    //
    +    // The annoying part is that `SearchArgument` builder methods like `startAnd()`,
    +    // `startOr()`, and `startNot()` mutate the internal state of the builder instance.
    +    // This forces us to translate all convertible filters with a single builder
    +    // instance. However, before actually converting a filter, we have no idea whether
    +    // it can be recognized by ORC or not. Thus, when an inconvertible filter is found,
    +    // we may already have a builder whose internal state is inconsistent.
    +    //
    +    // For example, to convert an `And` filter with builder `b`, we call `b.startAnd()`
    +    // first, and then try to convert its children. Say we convert the `left` child
    +    // successfully, but find that the `right` child is inconvertible. Alas, the
    +    // `b.startAnd()` call can't be rolled back, and `b` is now inconsistent.
    +    //
    +    // The workaround employed here is that, for `And`/`Or`/`Not`, we first try to
    +    // convert their children with brand new builders, and only do the actual
    +    // conversion with the right builder instance once the children are proven
    +    // convertible.
    +    //
    +    // P.S.: Hive seems to use `SearchArgument` together with `ExprNodeGenericFuncDesc` only.
    --- End diff ---
    
    Got it, thanks for the explanation.
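    
    Just to make sure I read it right, here is a minimal sketch of the double-checking
    pattern as I understand it (the diff is truncated above, so this is an illustration
    rather than the PR's exact code; it assumes the `newBuilder` and `buildSearchArgument`
    definitions shown in the diff):
    
        // Inside `buildSearchArgument`, for the `And` case:
        case And(left, right) =>
          // Trial conversions with throwaway builders: if either child turns out
          // to be inconvertible, the real `builder` is never mutated.
          val tryLeft = buildSearchArgument(left, newBuilder)
          val tryRight = buildSearchArgument(right, newBuilder)
    
          for {
            _ <- tryLeft
            _ <- tryRight
            // Both children are convertible, so mutating `builder` is now safe.
            lhs <- buildSearchArgument(left, builder.startAnd())
            rhs <- buildSearchArgument(right, lhs)
          } yield rhs.end()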
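    
    And for context, a hypothetical call site (the filter values here are made up for
    illustration; `GreaterThan` and `IsNotNull` come from `org.apache.spark.sql.sources`):
    
        // `createFilter` AND-s the filters together and yields `None` when the
        // array is empty or any filter cannot be converted.
        val filters: Array[Filter] = Array(GreaterThan("age", 25), IsNotNull("name"))
        val sarg: Option[SearchArgument] = OrcFilters.createFilter(filters)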

