Daniel Shields created SPARK-24949:
--------------------------------------

             Summary: pyspark.sql.Column breaks the iterable contract
                 Key: SPARK-24949
                 URL: https://issues.apache.org/jira/browse/SPARK-24949
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.1
            Reporter: Daniel Shields


pyspark.sql.Column implements __iter__ just to raise a TypeError:
{code:java}
def __iter__(self):
    raise TypeError("Column is not iterable")
{code}
This makes Column look iterable even though it isn't:
{code:java}
isinstance(mycolumn, collections.Iterable)  # Evaluates to True
{code}
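The mismatch can be reproduced without Spark. The following is a minimal sketch, assuming a modern Python 3 where the ABC lives at collections.abc.Iterable; FakeColumn and Plain are hypothetical stand-ins, not the real pyspark.sql.Column. It shows that the isinstance check only looks for an __iter__ attribute on the class and never calls it:

```python
from collections.abc import Iterable


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __iter__(self):
        raise TypeError("Column is not iterable")


class Plain:
    """A class that defines no __iter__ at all."""


col = FakeColumn()

# The ABC check passes because the class defines __iter__.
looks_iterable = isinstance(col, Iterable)
# A class without __iter__ is correctly reported as non-iterable.
plain_looks_iterable = isinstance(Plain(), Iterable)

# Actually asking for an iterator still fails.
try:
    iter(col)
    really_iterable = True
except TypeError:
    really_iterable = False

print(looks_iterable, plain_looks_iterable, really_iterable)
```

Under these assumptions the stand-in passes the isinstance check while iter() on it raises, which is the inconsistency described above.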
This method should be removed from Column completely so that it behaves like 
every other non-iterable class.

As further motivation for fixing this, consider the example below, which 
currently has to list Column explicitly:
{code:java}
import collections

import pyspark.sql


def listlike(value):
    # Column unfortunately implements __iter__ just to raise a TypeError.
    # This breaks the iterable contract and should be fixed in Spark proper.
    return (isinstance(value, collections.Iterable)
            and not isinstance(value, (str, bytes, pyspark.sql.Column)))
{code}
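Until this is changed in Spark, one possible workaround is to probe with iter() instead of relying on the isinstance check. This is only a sketch and not part of the original report: it assumes that calling iter() on the value is cheap and free of side effects, and FakeColumn is again a hypothetical stand-in for pyspark.sql.Column.

```python
def listlike(value):
    """Return True if value can be iterated and is not a str or bytes."""
    if isinstance(value, (str, bytes)):
        return False
    try:
        iter(value)  # raises TypeError for objects that refuse iteration
    except TypeError:
        return False
    return True


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __iter__(self):
        raise TypeError("Column is not iterable")


print(listlike([1, 2]), listlike("abc"), listlike(FakeColumn()), listlike(5))
```

This version does not need to name Column, but it depends on the object raising TypeError when iteration is attempted, so it hides the problem rather than fixing it.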