Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16954#discussion_r105766906
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -365,17 +368,73 @@ object TypeCoercion {
       }
     
       /**
    -   * Convert the value and in list expressions to the common operator type
    -   * by looking at all the argument types and finding the closest one that
    -   * all the arguments can be cast to. When no common operator type is found
    -   * the original expression will be returned and an Analysis Exception will
    -   * be raised at type checking phase.
    +   * Handles type coercion for both IN expression with subquery and IN
    +   * expressions without subquery.
    +   * 1. In the first case, find the common type by comparing the left hand side (LHS)
    +   *    expression types against corresponding right hand side (RHS) expression derived
    +   *    from the subquery expression's plan output. Inject appropriate casts in the
    +   *    LHS and RHS side of IN expression.
    +   *
    +   * 2. In the second case, convert the value and in list expressions to the
    +   *    common operator type by looking at all the argument types and finding
    +   *    the closest one that all the arguments can be cast to. When no common
    +   *    operator type is found the original expression will be returned and an
    +   *    Analysis Exception will be raised at the type checking phase.
        */
       object InConversion extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
           // Skip nodes who's children have not been resolved yet.
           case e if !e.childrenResolved => e
     
    +      // Handle type casting required between value expression and subquery output
    +      // in IN subquery.
    +      case i @ In(a, Seq(ListQuery(sub, children, exprId))) if !i.resolved =>
    +        // LHS is the value expression of IN subquery.
    +        val lhs = a match {
    +          // Multi columns in IN clause is represented as a CreateNamedStruct.
    +          // flatten the named struct to get the list of expressions.
    +          case cns: CreateNamedStruct => cns.valExprs
    +          case expr => Seq(expr)
    +        }
    +
    +        // RHS is the subquery output.
    +        val rhs = sub.output
    +        require(lhs.length == rhs.length)
    +
    +        val commonTypes = lhs.zip(rhs).flatMap { case (l, r) =>
    +          findCommonTypeForBinaryComparison(l.dataType, r.dataType) match {
    +            case d @ Some(_) => d
    +            case _ => findTightestCommonType(l.dataType, r.dataType)
    +          }
    +        }
    +
    +        // The number of columns/expressions must match between LHS and RHS of an
    +        // IN subquery expression.
    +        if (commonTypes.length == lhs.length) {
    +          val castedRhs = rhs.zip(commonTypes).map {
    +            case (e, dt) if e.dataType != dt => Alias(Cast(e, dt), e.name)()
    +            case (e, _) => e
    +          }
    +          val castedLhs = lhs.zip(commonTypes).map {
    +            case (e, dt) if e.dataType != dt => Cast(e, dt)
    +            case (e, _) => e
    +          }
    +
    +          // Before constructing the In expression, wrap the multi values in LHS
    +          // in a CreatedNamedStruct.
    +          val newLhs = a match {
    --- End diff --
    
    My bad, I never compile these snippets. You have a point there. We could just use CreateStruct (since we really don't care about the name). So that would look something like this (again not compiled):
    ```scala
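    // One column: use the casted value directly. Several columns: wrap them
    // in a struct via CreateStruct (the field names do not matter here).
    val newLhs = castedLhs match {
      case Seq(single) => single
      case _ => CreateStruct(castedLhs)
    }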
    ```
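    To see the flatten, find-common-type, cast, and rewrap steps from the diff end to end, here is a minimal, self-contained Scala sketch. It deliberately does not use Spark: `DType`, `Expr`, `Typed`, `Col`, `CastTo`, `Struct`, `commonType`, and `coerceIn` are hypothetical stand-ins, and the numeric widening order (Int < Long < Double, strings only match strings) is an assumption that only loosely mirrors `findCommonTypeForBinaryComparison` / `findTightestCommonType`. Step 4 reuses the single-value-or-struct match from the snippet above.

```scala
// Toy model of the IN-subquery coercion steps from the diff.
// All types and names here are hypothetical stand-ins, not Spark catalyst classes.
object InCoercionSketch {
  sealed trait DType
  case object IntT extends DType
  case object LongT extends DType
  case object DoubleT extends DType
  case object StringT extends DType

  sealed trait Expr
  sealed trait Typed extends Expr { def dataType: DType }
  final case class Col(name: String, dataType: DType) extends Typed
  final case class CastTo(child: Typed, dataType: DType) extends Typed
  // Stands in for the struct that holds a multi-column LHS.
  final case class Struct(children: Seq[Typed]) extends Expr

  // Assumed widening order for numeric types; strings only match strings.
  private val numericRank: Map[DType, Int] = Map(IntT -> 0, LongT -> 1, DoubleT -> 2)

  def commonType(a: DType, b: DType): Option[DType] =
    if (a == b) Some(a)
    else (numericRank.get(a), numericRank.get(b)) match {
      case (Some(ra), Some(rb)) => Some(if (ra >= rb) a else b)
      case _                    => None
    }

  private def castIfNeeded(e: Typed, dt: DType): Typed =
    if (e.dataType == dt) e else CastTo(e, dt)

  /** Returns the coerced (LHS, RHS output), or None if some column pair has no common type. */
  def coerceIn(lhs: Expr, rhs: Seq[Typed]): Option[(Expr, Seq[Typed])] = {
    // 1. Flatten a multi-column LHS into its value expressions.
    val lhsCols: Seq[Typed] = lhs match {
      case Struct(cs) => cs
      case t: Typed   => Seq(t)
    }
    require(lhsCols.length == rhs.length, "LHS and RHS column counts differ")

    // 2. Find a common type per (LHS, RHS) column pair; pairs with none are dropped.
    val common = lhsCols.zip(rhs).flatMap { case (l, r) => commonType(l.dataType, r.dataType) }
    if (common.length != lhsCols.length) None // no coercion applied
    else {
      // 3. Cast each side only where its type differs from the common type.
      val castedLhs = lhsCols.zip(common).map { case (e, dt) => castIfNeeded(e, dt) }
      val castedRhs = rhs.zip(common).map { case (e, dt) => castIfNeeded(e, dt) }
      // 4. Rewrap: a single value stays bare, several values become a struct.
      val newLhs: Expr = castedLhs match {
        case Seq(single) => single
        case _           => Struct(castedLhs)
      }
      Some((newLhs, castedRhs))
    }
  }
}
```

    For example, an LHS struct of (Int, String) against a subquery output of (Long, String) casts only the first LHS column to Long and leaves the RHS untouched. Unlike the diff, the sketch does not wrap casted RHS columns in an `Alias`, so it makes no attempt to preserve the subquery's output names.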


