Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9548#issuecomment-155226523
  
    @marmbrus Thank you for your suggestions! 
    
    That is close to my initial idea, and I tried it last night. Unfortunately, I 
hit a problem when adding such a field to the `Column` API. In the current design, 
the class `Column` corresponds to the class `Expression`, which includes both 
`AttributeReference` and the other expression types. For a `Column` that refers to 
a single attribute, it makes sense to have such a DataFrame identifier. However, 
when a `Column` is generated from a binary expression (e.g., `gt`), it could have 
more than one DataFrame identifier. Does that make sense to you? 
    
    Implementing the idea turns out to be more difficult. For example, consider 
the following binary operator:
    
    ```scala
      def === (other: Any): Column = {
        val right = lit(other).expr
        EqualTo(expr, right)
      }
    ```
    
    `EqualTo` is an `Expression`. `expr` and `right` are also `Expression`s, not 
`Column`s. Thus, when accessing the `Column` generated from `===`, we cannot 
know the DataFrame sources of `expr` and `right` unless we change 
`AttributeReference`.  
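    
    To make the problem concrete, here is a minimal, self-contained sketch. It 
uses toy classes that only mimic the names in Spark; they are not the real 
Catalyst classes. The `dataFrameId` field and the `Lineage.sources` helper are 
hypothetical and exist only for illustration. The sketch assumes the identifier 
is stored on the attribute itself, so the sources of a composite `Column` can 
be recovered by walking the expression tree:
    
    ```scala
    // Toy model only: these are NOT Spark's classes, just the same names.
    sealed trait Expression { def children: Seq[Expression] }

    // Hypothetical change: the attribute remembers which DataFrame it came from.
    case class AttributeReference(name: String, dataFrameId: Option[Long] = None)
        extends Expression {
      val children: Seq[Expression] = Nil
    }

    case class Literal(value: Any) extends Expression {
      val children: Seq[Expression] = Nil
    }

    case class EqualTo(left: Expression, right: Expression) extends Expression {
      val children: Seq[Expression] = Seq(left, right)
    }

    // A Column only wraps an Expression, so a single identifier field on
    // Column cannot describe a comparison that mixes two DataFrames.
    case class Column(expr: Expression) {
      def ===(other: Column): Column = Column(EqualTo(expr, other.expr))
    }

    object Lineage {
      // Collect every DataFrame identifier reachable from an expression.
      def sources(e: Expression): Set[Long] = e match {
        case AttributeReference(_, id) => id.toSet
        case other => other.children.flatMap(sources).toSet
      }
    }
    ```
    
    With this toy model, comparing an attribute from one DataFrame with an 
attribute from another gives a `Column` whose expression has two sources, 
which is exactly the case a single field on `Column` cannot represent.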
    
    That is why I think this could mean a major code change to `DataFrame` 
and `Column`. Thank you for any further suggestions. 

