[spark] branch master updated: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows

holden Mon, 06 May 2019 10:03:06 -0700

This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new eec1a3c  [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows
eec1a3c is described below

commit eec1a3c2862955bf2620d4cc5116fbd86e29952e
Author: Tibor Csögör <t...@tiborius.net>
AuthorDate: Mon May 6 10:00:49 2019 -0700

    [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows
    
    This PR is meant to replace #20503, which lay dormant for a while. The
    solution in the original PR is still valid, so this is just that patch
    rebased onto the current master.
    
    Original summary follows.
    
    ## What changes were proposed in this pull request?
    
    Fix `__repr__` behaviour for Rows.
    
    Row's `__repr__` assumes the data are strings when column names are missing.
    Examples:
    
    ```
    >>> from pyspark.sql.types import Row
    >>> Row("Alice", "11")
    <Row(Alice, 11)>
    
    >>> Row(name="Alice", age=11)
    Row(age=11, name='Alice')
    
    >>> Row("Alice", 11)
    <snip stack trace>
    TypeError: sequence item 1: expected string, int found
    ```
    
    This is because, for a Row created without column names, `__repr__`
    passes the values straight to `str.join`, which only accepts strings.
    
    ## How was this patch tested?
    
    Manually tested; a unit test was also added to
    `python/pyspark/sql/tests/test_types.py`.
    
    Closes #24448 from tbcs/SPARK-23299.
    
    Lead-authored-by: Tibor Csögör <t...@tiborius.net>
    Co-authored-by: Shashwat Anand <m...@shashwat.me>
    Signed-off-by: Holden Karau <hol...@pigscanfly.ca>
---
 python/pyspark/sql/tests/test_types.py | 12 ++++++++++++
 python/pyspark/sql/types.py            | 15 +++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)
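
The commit message traces the `TypeError` to the unnamed-Row branch of `__repr__`, which handed the row's values directly to `str.join`. The sketch below reproduces the old and new formatting on a plain tuple so that it runs without PySpark. The helper names `old_repr` and `new_repr` are hypothetical; the two format expressions are copied from the `types.py` hunk of this diff.

```python
# Values of an unnamed Row such as Row("Alice", 11); a plain tuple stands in for the Row here.
values = ("Alice", 11)


def old_repr(fields):
    # Before the fix: str.join() accepts only strings, so the int 11 raises TypeError.
    return "<Row(%s)>" % ", ".join(fields)


def new_repr(fields):
    # After the fix: each value is rendered with %r (its repr) before joining.
    return "<Row(%s)>" % ", ".join("%r" % field for field in fields)


try:
    old_repr(values)
except TypeError as exc:
    print("old:", exc)  # Python 3: old: sequence item 1: expected str instance, int found

print("new:", new_repr(values))  # new: <Row('Alice', 11)>
```

Under Python 3, `%r` leaves printable non-ASCII characters unescaped, which is why the new unit test expects `<Row('数', '量')>` there but `u'\u6570'`-style escapes under Python 2.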

diff --git a/python/pyspark/sql/tests/test_types.py b/python/pyspark/sql/tests/test_types.py
index 3afb88c..bb96828 100644
--- a/python/pyspark/sql/tests/test_types.py
+++ b/python/pyspark/sql/tests/test_types.py
@@ -1,3 +1,4 @@
+# -*- encoding: utf-8 -*-
 #
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
@@ -739,6 +740,17 @@ class DataTypeTests(unittest.TestCase):
         tst = TimestampType()
         self.assertEqual(tst.toInternal(datetime.datetime.max) % 1000000, 999999)
 
+    # regression test for SPARK-23299
+    def test_row_without_column_name(self):
+        row = Row("Alice", 11)
+        self.assertEqual(repr(row), "<Row('Alice', 11)>")
+
+        # test __repr__ with unicode values
+        if sys.version_info.major >= 3:
+            self.assertEqual(repr(Row("数", "量")), "<Row('数', '量')>")
+        else:
+            self.assertEqual(repr(Row(u"数", u"量")), r"<Row(u'\u6570', u'\u91cf')>")
+
     def test_empty_row(self):
         row = Row()
         self.assertEqual(len(row), 0)
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 72c437a..f9b12f1 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1435,13 +1435,24 @@ class Row(tuple):
 
     >>> Person = Row("name", "age")
     >>> Person
-    <Row(name, age)>
+    <Row('name', 'age')>
     >>> 'name' in Person
     True
     >>> 'wrong_key' in Person
     False
     >>> Person("Alice", 11)
     Row(name='Alice', age=11)
+
+    This form can also be used to create rows as tuple values, i.e. with unnamed
+    fields. Beware that such Row objects have different equality semantics:
+
+    >>> row1 = Row("Alice", 11)
+    >>> row2 = Row(name="Alice", age=11)
+    >>> row1 == row2
+    False
+    >>> row3 = Row(a="Alice", b=11)
+    >>> row1 == row3
+    True
     """
 
     def __new__(self, *args, **kwargs):
@@ -1549,7 +1560,7 @@ class Row(tuple):
             return "Row(%s)" % ", ".join("%s=%r" % (k, v)
                                          for k, v in zip(self.__fields__, tuple(self)))
         else:
-            return "<Row(%s)>" % ", ".join(self)
+            return "<Row(%s)>" % ", ".join("%r" % field for field in self)
 
 
 class DateConverter(object):
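
The docstring added to `Row` warns that unnamed Rows have different equality semantics. Because `Row` subclasses `tuple` and, as far as this diff shows, defines no `__eq__` of its own, `==` falls back to tuple comparison, which looks only at the values in order and ignores field names. The following is a minimal sketch under that assumption: `MiniRow` is a hypothetical stand-in, not PySpark's class, and it assumes the keyword form sorts fields by name, as the `Row(age=11, name='Alice')` output in the commit message suggests.

```python
class MiniRow(tuple):
    """Hypothetical, simplified stand-in for pyspark.sql.Row, for illustration only."""

    def __new__(cls, *args, **kwargs):
        if args and kwargs:
            raise ValueError("use either positional values or keyword fields, not both")
        if kwargs:
            # Keyword form (assumed): fields sorted by name, values stored in that order.
            names = sorted(kwargs)
            row = tuple.__new__(cls, [kwargs[name] for name in names])
            row.__fields__ = names
            return row
        # Positional form: an unnamed row is just a tuple of values.
        return tuple.__new__(cls, args)


row1 = MiniRow("Alice", 11)
row2 = MiniRow(name="Alice", age=11)  # stored as (11, 'Alice') after sorting
row3 = MiniRow(a="Alice", b=11)       # stored as ('Alice', 11)

# No __eq__ override, so == is plain tuple comparison and field names are ignored.
print(row1 == row2)  # False
print(row1 == row3)  # True
```

This mirrors the `row1`/`row2`/`row3` doctest in the new docstring: the comparison outcome depends only on the stored value order, never on whether or how the fields are named.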

