[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...

hvanhovell Wed, 06 Jul 2016 21:17:49 -0700

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14083#discussion_r69849376
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -165,111 +169,99 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
       def resolveQuoted(
           name: String,
           resolver: Resolver): Option[NamedExpression] = {
    -    resolve(UnresolvedAttribute.parseAttributeName(name), output, resolver)
    +    outputAttributeResolver.resolve(UnresolvedAttribute.parseAttributeName(name), resolver)
       }
     
       /**
    -   * Resolve the given `name` string against the given attribute, returning either 0 or 1 match.
    -   *
    -   * This assumes `name` has multiple parts, where the 1st part is a qualifier
    -   * (i.e. table name, alias, or subquery alias).
    -   * See the comment above `candidates` variable in resolve() for semantics the returned data.
    +   * Refreshes (or invalidates) any metadata/data cached in the plan recursively.
        */
    -  private def resolveAsTableColumn(
    -      nameParts: Seq[String],
    -      resolver: Resolver,
    -      attribute: Attribute): Option[(Attribute, List[String])] = {
    -    assert(nameParts.length > 1)
    -    if (attribute.qualifier.exists(resolver(_, nameParts.head))) {
    -      // At least one qualifier matches. See if remaining parts match.
    -      val remainingParts = nameParts.tail
    -      resolveAsColumn(remainingParts, resolver, attribute)
    -    } else {
    -      None
    -    }
    +  def refresh(): Unit = children.foreach(_.refresh())
    +}
    +
    +/**
    + * Helper class for (LogicalPlan) attribute resolution. This class indexes attributes by their
    + * case-in-sensitive name, and checks potential candidates using the given Resolver. Both qualified
    + * and direct resolution are supported.
    + */
    +private[catalyst] class AttributeResolver(attributes: Seq[Attribute]) extends Logging {
    +  private def unique[T](m: Map[T, Seq[Attribute]]): Map[T, Seq[Attribute]] = {
    +    m.mapValues(_.distinct).map(identity)
       }
     
    -  /**
    -   * Resolve the given `name` string against the given attribute, returning either 0 or 1 match.
    -   *
    -   * Different from resolveAsTableColumn, this assumes `name` does NOT start with a qualifier.
    -   * See the comment above `candidates` variable in resolve() for semantics the returned data.
    -   */
    -  private def resolveAsColumn(
    -      nameParts: Seq[String],
    -      resolver: Resolver,
    -      attribute: Attribute): Option[(Attribute, List[String])] = {
    -    if (!attribute.isGenerated && resolver(attribute.name, nameParts.head)) {
    -      Option((attribute.withName(nameParts.head), nameParts.tail.toList))
    -    } else {
    -      None
    +  /** Map to use for direct case insensitive attribute lookups. */
    +  private val direct: Map[String, Seq[Attribute]] = {
    --- End diff --
    
    You have a point: it is the secondary code path, so it is less likely to be used.
    I'll take a look at it on my next pass.
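
For readers following the diff, here is a minimal sketch of the indexing idea described in the `AttributeResolver` doc comment: group attributes once by lower-cased name, then let the `Resolver` confirm each candidate in the matching bucket. It is an illustration under stated assumptions, not the code in this pull request. `Attr`, `AttrIndex`, and `lookup` are hypothetical stand-ins, and only the `(String, String) => Boolean` shape of `Resolver` is taken from Catalyst.

```scala
// Sketch only: Attr, AttrIndex and lookup are hypothetical, not Spark's classes.
object AttributeIndexSketch {
  import java.util.Locale

  // Simplified stand-in for a Catalyst Attribute: a name plus an optional qualifier.
  final case class Attr(name: String, qualifier: Option[String] = None)

  // Same shape as Catalyst's Resolver alias: decides whether two names match.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  final class AttrIndex(attributes: Seq[Attr]) {
    // Built once: lower-cased name -> distinct attributes sharing that name.
    private val direct: Map[String, Seq[Attr]] =
      attributes
        .groupBy(_.name.toLowerCase(Locale.ROOT))
        .map { case (key, attrs) => key -> attrs.distinct }

    // Fetch the bucket for the name, then let the resolver make the final call,
    // so a case-sensitive resolver can still reject same-bucket candidates.
    def lookup(name: String, resolver: Resolver): Seq[Attr] =
      direct
        .getOrElse(name.toLowerCase(Locale.ROOT), Nil)
        .filter(attr => resolver(attr.name, name))
  }
}
```

Each lookup inspects only the attributes whose lower-cased name matches, instead of scanning the whole output list as the removed `resolveAsColumn` path did. The `.map(identity)` after `mapValues` in the diff's `unique` helper appears to be the usual Scala 2 idiom for forcing the lazy view that `mapValues` returns; the sketch sidesteps that by using `map` directly.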


