This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 9eb45ec  [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show
9eb45ec is described below

commit 9eb45ecb4f39f372e20529da468f304c4ec7c175
Author: Gera Shegalov <g...@apache.org>
AuthorDate: Mon May 17 16:22:46 2021 +0900

    [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show
    
    ### What changes were proposed in this pull request?
    Provide a clearer error message tied to the user's Python code when incorrect parameters are passed to `DataFrame.show`, rather than a message about a missing JVM method that the user is not calling directly.
    
    ```
    py4j.Py4JException: Method showString([class java.lang.Boolean, class java.lang.Integer, class java.lang.Boolean]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    ```
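    
    With this change, the same misuse is rejected on the Python side before reaching the JVM, with a message naming the offending parameter. A hypothetical sketch of the calls involved (the DataFrame `df` is illustrative; the messages mirror the ones added by this patch):
    
    ```
    df = spark.createDataFrame([('foo',)])
    
    df.show(5)                  # OK: n is an int
    df.show(5, truncate=1)      # OK: truncate may also be an int
    df.show(True)               # TypeError: Parameter 'n' (number of rows) must be an int
    df.show(vertical='foo')     # TypeError: Parameter 'vertical' must be a bool
    df.show(truncate='foo')     # TypeError: Parameter 'truncate=foo' should be either bool or int.
    ```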
    
    ### Why are the changes needed?
    For faster debugging through an actionable error message.
    
    ### Does this PR introduce _any_ user-facing change?
    No change for the correct parameters but different error messages for the parameters triggering an exception.
    
    ### How was this patch tested?
    - unit test
    - manually in PySpark REPL (a sketch of such a check follows below)
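    
    A hypothetical REPL session for the manual check (assumes the `spark` session created by the `pyspark` shell; tracebacks abbreviated):
    
    ```
    >>> df = spark.createDataFrame([('foo',)])
    >>> df.show(True)
    Traceback (most recent call last):
      ...
    TypeError: Parameter 'n' (number of rows) must be an int
    >>> df.show(vertical='foo')
    Traceback (most recent call last):
      ...
    TypeError: Parameter 'vertical' must be a bool
    ```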
    
    Closes #32555 from gerashegalov/df_show_validation.
    
    Authored-by: Gera Shegalov <g...@apache.org>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/dataframe.py            | 16 ++++++++++++++--
 python/pyspark/sql/tests/test_dataframe.py | 18 ++++++++++++++++++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 8fe263e..22cc7a4 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -448,7 +448,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         ----------
         n : int, optional
             Number of rows to show.
-        truncate : bool, optional
+        truncate : bool or int, optional
             If set to ``True``, truncate strings longer than 20 chars by default.
             If set to a number greater than one, truncates long strings to length ``truncate``
             and align cells right.
@@ -482,10 +482,22 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
          age  | 5
          name | Bob
         """
+
+        if not isinstance(n, int) or isinstance(n, bool):
+            raise TypeError("Parameter 'n' (number of rows) must be an int")
+
+        if not isinstance(vertical, bool):
+            raise TypeError("Parameter 'vertical' must be a bool")
+
         if isinstance(truncate, bool) and truncate:
             print(self._jdf.showString(n, 20, vertical))
         else:
-            print(self._jdf.showString(n, int(truncate), vertical))
+            try:
+                int_truncate = int(truncate)
+            except ValueError:
+                raise TypeError(f"Parameter 'truncate={truncate}' should be either bool or int.")
+
+            print(self._jdf.showString(n, int_truncate, vertical))
 
     def __repr__(self):
         if not self._support_repr_html and self.sql_ctx._conf.isReplEagerEvalEnabled():
diff --git a/python/pyspark/sql/tests/test_dataframe.py b/python/pyspark/sql/tests/test_dataframe.py
index 3e961cb..74895c0 100644
--- a/python/pyspark/sql/tests/test_dataframe.py
+++ b/python/pyspark/sql/tests/test_dataframe.py
@@ -837,6 +837,24 @@ class DataFrameTests(ReusedSQLTestCase):
         finally:
             shutil.rmtree(tpath)
 
+    def test_df_show(self):
+        # SPARK-35408: ensure better diagnostics if incorrect parameters are passed
+        # to DataFrame.show
+
+        df = self.spark.createDataFrame([('foo',)])
+        df.show(5)
+        df.show(5, True)
+        df.show(5, 1, True)
+        df.show(n=5, truncate='1', vertical=False)
+        df.show(n=5, truncate=1.5, vertical=False)
+
+        with self.assertRaisesRegex(TypeError, "Parameter 'n'"):
+            df.show(True)
+        with self.assertRaisesRegex(TypeError, "Parameter 'vertical'"):
+            df.show(vertical='foo')
+        with self.assertRaisesRegex(TypeError, "Parameter 'truncate=foo'"):
+            df.show(truncate='foo')
+
 
 class QueryExecutionListenerTests(unittest.TestCase, SQLTestUtils):
     # These tests are separate because it uses 'spark.sql.queryExecutionListeners' which is
