Repository: spark
Updated Branches:
  refs/heads/master dbd492b7e -> 11a849b3a


[SPARK-22370][SQL][PYSPARK][FOLLOW-UP] Fix a test failure when xmlrunner is installed.

## What changes were proposed in this pull request?

This is a follow-up PR of #19587.

If `xmlrunner` is installed, `VectorizedUDFTests.test_vectorized_udf_check_config` fails with the following error, because `self`, an instance of a `unittest.TestCase` subclass, is referenced in the UDF `check_records_per_batch` and can no longer be pickled.

```
PicklingError: Cannot pickle files that are not opened for reading: w
```
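The failure mode can be illustrated without Spark: serializing the UDF requires pickling its closure over `self`, and the test case driven by `xmlrunner` transitively holds a report stream opened for writing, which is not picklable. A minimal stdlib-only sketch (the `FakeTestCase` class and its `stream` attribute are hypothetical stand-ins; stdlib `pickle` raises `TypeError` here, whereas cloudpickle raises the `PicklingError` shown above):

```python
import os
import pickle
import tempfile

class FakeTestCase:
    """Hypothetical stand-in for a TestCase that, like one run under
    xmlrunner, holds a reference to a report file opened for writing."""
    def __init__(self, stream):
        self.stream = stream

path = os.path.join(tempfile.mkdtemp(), "report.xml")
case = FakeTestCase(open(path, "w"))
try:
    pickle.dumps(case)  # must serialize the open write-mode file too
    raised = False
except TypeError:  # stdlib pickle; cloudpickle raises PicklingError instead
    raised = True
case.stream.close()
assert raised  # objects holding write-mode files are not picklable
```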

This changes the UDF so that it does not refer to `self`.
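The shape of the fix can be sketched without Spark or pandas (the batching helper and names below are illustrative, not PySpark API): instead of asserting inside the UDF, which would pull `self` into the pickled closure, the UDF returns the observed batch size per record and the driver asserts after collecting.

```python
def check_records_per_batch(batch):
    # Mirrors `pd.Series(x.size).repeat(x.size)` in the patch: emit the
    # batch size once per input record so it can be checked driver-side.
    return [len(batch)] * len(batch)

def split_into_batches(records, max_records_per_batch):
    # Illustrative stand-in for spark.sql.execution.arrow.maxRecordsPerBatch.
    for i in range(0, len(records), max_records_per_batch):
        yield records[i:i + max_records_per_batch]

result = [r for batch in split_into_batches(list(range(10)), 3)
          for r in check_records_per_batch(batch)]
for r in result:
    assert r <= 3  # driver-side check replaces the in-UDF assertTrue
```

The UDF now closes over nothing but its argument, so pickling it no longer touches the test case.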

## How was this patch tested?

Tested locally.

Author: Takuya UESHIN <ues...@databricks.com>

Closes #20115 from ueshin/issues/SPARK-22370_fup1.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11a849b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11a849b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11a849b3

Branch: refs/heads/master
Commit: 11a849b3a7b3d03c48d3e17c8a721acedfd89285
Parents: dbd492b
Author: Takuya UESHIN <ues...@databricks.com>
Authored: Fri Dec 29 23:04:28 2017 +0900
Committer: hyukjinkwon <gurwls...@gmail.com>
Committed: Fri Dec 29 23:04:28 2017 +0900

----------------------------------------------------------------------
 python/pyspark/sql/tests.py | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/11a849b3/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 3ef1522..1c34c89 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -3825,6 +3825,7 @@ class VectorizedUDFTests(ReusedSQLTestCase):
 
     def test_vectorized_udf_check_config(self):
         from pyspark.sql.functions import pandas_udf, col
+        import pandas as pd
         orig_value = self.spark.conf.get("spark.sql.execution.arrow.maxRecordsPerBatch", None)
         self.spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 3)
         try:
@@ -3832,11 +3833,11 @@ class VectorizedUDFTests(ReusedSQLTestCase):
 
             @pandas_udf(returnType=LongType())
             def check_records_per_batch(x):
-                self.assertTrue(x.size <= 3)
-                return x
+                return pd.Series(x.size).repeat(x.size)
 
-            result = df.select(check_records_per_batch(col("id")))
-            self.assertEqual(df.collect(), result.collect())
+            result = df.select(check_records_per_batch(col("id"))).collect()
+            for (r,) in result:
+                self.assertTrue(r <= 3)
         finally:
             if orig_value is None:
                 self.spark.conf.unset("spark.sql.execution.arrow.maxRecordsPerBatch")

