Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/4918#issuecomment-77565205
  
    Hi @marmbrus , I studied how we handle ORDER BY and had a more complete fix.
    For simple example "SELECT a, b FROM t ORDER BY b", it will be parsed into
    
        Sort(
          Seq(SortOrder(Unresolved("b"))),
          Projection(Seq(Unresolved("a"), Unresolved("b")), Relation)
        )
    and there are 2 parts need to be resolved: "a" and "b" in `Projection`, "a" 
in `SortOrder`.
    Remember, we resolve those unresolved stuff in `ResolveReferences` rule by 
code
    
        case u @ UnresolvedAttribute(name) =>
          // Leave unchanged if resolution fails.  Hopefully will be resolved 
next round.
          val result = q.resolveChildren(name, resolver).getOrElse(u)
          logDebug(s"Resolving $u to $result")
          result
    So first we can resolve `Projection` successfully, and when we resolve 
`Sort`, its input is `Projection`'s output. And the `ORDER BY` referenced 
attribute "a" exists in the projection list, so we can also resolve `Sort` 
successfully.
    But for `SELECT a FROM t ORDER BY b`, the `ORDER BY` referenced attribute 
"a" doesn't exist in the projection list, and what we do is expanding the 
projection list by `ResolveSortReferences` rule. I think this rule's name is 
confusing, it doesn't resolve `Sort`(or `SortOrder`s), but only expanding 
projection list or grouping expressions, we still resolve `Sort`(actually 
`SortOrder`s in `Sort`) in `ResolveReferences` rule(we can resolve it 
successfully after expanding projection list).
    This works well until we have nested fields. For example "SELECT a.b FROM t 
ORDER BY b.a", what we should do is adding "b" to projection list, but actually 
we will get error when try to resolve `Sort` in `ResolveReferences` rule. "a.b" 
will have alias name "b", and we will mistakenly think we have `Attribute` "b" 
and try to resolve "b.a" on top of it.
    One possible fix is changing alias name of `GetField` chain like 
@chenghao-intel did in https://github.com/apache/spark/pull/4892, but it need 
more work to handle `GetItem` stuff. I think a better idea is override 
`resolveChildren` method in `Sort` and filter out the `GetField` stuff from 
`Projection`'s output. I have done it in 
https://github.com/apache/spark/pull/4904.
    So this PR is not complete, it do fixes some issues but not all of them, 
please review https://github.com/apache/spark/pull/4904 and discuss with me if 
you have any problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to