Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4918#issuecomment-77565205 Hi @marmbrus , I studied how we handle ORDER BY and had a more complete fix. For simple example "SELECT a, b FROM t ORDER BY b", it will be parsed into Sort( Seq(SortOrder(Unresolved("b"))), Projection(Seq(Unresolved("a"), Unresolved("b")), Relation) ) and there are 2 parts need to be resolved: "a" and "b" in `Projection`, "a" in `SortOrder`. Remember, we resolve those unresolved stuff in `ResolveReferences` rule by code case u @ UnresolvedAttribute(name) => // Leave unchanged if resolution fails. Hopefully will be resolved next round. val result = q.resolveChildren(name, resolver).getOrElse(u) logDebug(s"Resolving $u to $result") result So first we can resolve `Projection` successfully, and when we resolve `Sort`, its input is `Projection`'s output. And the `ORDER BY` referenced attribute "a" exists in the projection list, so we can also resolve `Sort` successfully. But for `SELECT a FROM t ORDER BY b`, the `ORDER BY` referenced attribute "a" doesn't exist in the projection list, and what we do is expanding the projection list by `ResolveSortReferences` rule. I think this rule's name is confusing, it doesn't resolve `Sort`(or `SortOrder`s), but only expanding projection list or grouping expressions, we still resolve `Sort`(actually `SortOrder`s in `Sort`) in `ResolveReferences` rule(we can resolve it successfully after expanding projection list). This works well until we have nested fields. For example "SELECT a.b FROM t ORDER BY b.a", what we should do is adding "b" to projection list, but actually we will get error when try to resolve `Sort` in `ResolveReferences` rule. "a.b" will have alias name "b", and we will mistakenly think we have `Attribute` "b" and try to resolve "b.a" on top of it. One possible fix is changing alias name of `GetField` chain like @chenghao-intel did in https://github.com/apache/spark/pull/4892, but it need more work to handle `GetItem` stuff. I think a better idea is override `resolveChildren` method in `Sort` and filter out the `GetField` stuff from `Projection`'s output. I have done it in https://github.com/apache/spark/pull/4904. So this PR is not complete, it do fixes some issues but not all of them, please review https://github.com/apache/spark/pull/4904 and discuss with me if you have any problems.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org