Oli Hall created SPARK-23299:
--------------------------------

             Summary: __repr__ broken for Rows instantiated with *args
                 Key: SPARK-23299
                 URL: https://issues.apache.org/jira/browse/SPARK-23299
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.2.0, 1.5.0
         Environment: Tested on OS X with Spark 1.5.0 as well as pip-installed 
`pyspark` 2.2.0. The code in question appears unchanged, and still faulty, on the 
master branch of the GitHub repository.
            Reporter: Oli Hall


PySpark Rows instantiated without column names throw an exception when 
`__repr__` is called. The smallest reproducible example I've found is this:
{code:java}
> from pyspark.sql.types import Row
> Row(123)
<stack-trace snipped for brevity>
<v-env location>/lib/python2.7/site-packages/pyspark/sql/types.pyc in 
__repr__(self)
-> 1524             return "<Row(%s)>" % ", ".join(self)

TypeError: sequence item 0: expected string, int found{code}
This appears to be due to the implementation of `__repr__`, which works 
correctly for Rows created with column names, but for those without assumes 
all values are strings ([link 
here|https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1584]).
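
The underlying failure can be reproduced outside Spark entirely: `str.join` raises a `TypeError` for any non-string element, which is exactly what the bare `", ".join(self)` hits. A minimal illustration:
{code:java}
# str.join requires every element to be a string; a bare int triggers the
# same TypeError that Row.__repr__ raises.
try:
    ", ".join([123, "abc"])
except TypeError as exc:
    print("join failed:", exc)

# Coercing each value with str first makes the join succeed:
print(", ".join(map(str, [123, "abc"])))  # -> 123, abc
{code}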

This should be an easy fix: if the values are mapped to `str` first, all should 
be well (the last line is the only modification):
{code:java}
def __repr__(self):
    """Printable representation of Row used in Python REPL."""
    if hasattr(self, "__fields__"):
        return "Row(%s)" % ", ".join("%s=%r" % (k, v)
                                     for k, v in zip(self.__fields__, tuple(self)))
    else:
        return "<Row(%s)>" % ", ".join(map(str, self))
{code}
This will yield the following:
{code:java}
> from pyspark.sql.types import Row
> Row('aaa', 123)
<Row(aaa, 123)>
{code}
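
As a sanity check without a Spark installation, the patched branch can be exercised on a minimal stand-in class (`FakeRow` below is a hypothetical illustration of the same repr logic, not the real `pyspark.sql.types.Row`):
{code:java}
class FakeRow(tuple):
    """Minimal stand-in mimicking Row's repr logic (illustrative only)."""

    def __new__(cls, *args, **kwargs):
        if kwargs:
            # Named columns: remember the field names alongside the values.
            row = tuple.__new__(cls, kwargs.values())
            row.__fields__ = list(kwargs)
            return row
        return tuple.__new__(cls, args)

    def __repr__(self):
        if hasattr(self, "__fields__"):
            return "Row(%s)" % ", ".join("%s=%r" % (k, v)
                                         for k, v in zip(self.__fields__,
                                                         tuple(self)))
        # Patched branch: coerce every value to str before joining.
        return "<Row(%s)>" % ", ".join(map(str, self))


print(repr(FakeRow('aaa', 123)))     # -> <Row(aaa, 123)>
print(repr(FakeRow(name='x', n=1)))  # -> Row(name='x', n=1)
{code}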
  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
