Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9548#issuecomment-155226523

@marmbrus Thank you for your suggestions! That is similar to my initial idea. I tried it last night. Unfortunately, I hit a problem when adding such a field to the `Column` API.

In the current design, the class `Column` corresponds to the class `Expression`, which includes both `AttributeReference` and the other expression types. For a `Column` backed by an `AttributeReference`, it makes sense to have such a DataFrame identifier. However, when a `Column` is generated from a binary expression type (e.g., `gt`), it could have more than one DataFrame identifier. Does that sound good to you?

Implementing the idea turns out to be more difficult. For example, consider the following binary operator:

```scala
def === (other: Any): Column = {
  val right = lit(other).expr
  EqualTo(expr, right)
}
```

`EqualTo` is an `Expression`. `expr` and `right` are not `Column`s. Thus, when accessing the `Column` generated from `===`, we cannot know the DataFrame sources of `expr` and `right` unless we change `AttributeReference`. That is why I think this could mean a major code change to `DataFrame` and `Column`.

Thank you for any further suggestions.
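To make the concern concrete, here is a minimal sketch using a toy model rather than Spark's real classes. Every name in it (`Expr`, `Attr`, `Col`, `sources`, and the `source` field) is hypothetical and only stands in for `Expression`, `AttributeReference`, `Column`, and the proposed DataFrame identifier. It assumes the identifier is stored on the attribute itself, which is the kind of change to `AttributeReference` mentioned above.

```scala
// Toy model only: these are NOT Spark's classes. All names are hypothetical.
object ColumnSourceSketch {
  sealed trait Expr
  // A leaf attribute; `source` stands in for a hypothetical DataFrame identifier.
  final case class Attr(name: String, source: Option[String]) extends Expr
  final case class Lit(value: Any) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr

  // A column is just a wrapper around an expression tree.
  final case class Col(expr: Expr) {
    // The operator only sees the wrapped expressions, so any identifier
    // stored on Col itself (and not on Attr) would be lost here.
    def ===(other: Col): Col = Col(EqualTo(expr, other.expr))
  }

  // Collect every source identifier reachable from an expression tree.
  def sources(e: Expr): Set[String] = e match {
    case Attr(_, Some(s)) => Set(s)
    case Attr(_, None)    => Set.empty
    case Lit(_)           => Set.empty
    case EqualTo(l, r)    => sources(l) ++ sources(r)
  }

  def main(args: Array[String]): Unit = {
    val a = Col(Attr("id", Some("df1")))
    val b = Col(Attr("id", Some("df2")))
    // The combined column refers to two different sources at once.
    println(sources((a === b).expr))
  }
}
```

In this sketch a single identifier field on `Col` could not describe `a === b`, because the result refers to both `df1` and `df2`. The information only survives when it is attached to the leaf attributes and recovered by walking the tree.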