This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
     new bf3cdea  [SPARK-24740][PYTHON][ML][BACKPORT-2.3] Make PySpark's tests compatible with NumPy 1.14+
bf3cdea is described below

commit bf3cdeae3f27effb50f874cfe05f14192be47783
Author: hyukjinkwon <gurwls...@apache.org>
AuthorDate: Sat Jan 19 13:09:44 2019 +0900

    [SPARK-24740][PYTHON][ML][BACKPORT-2.3] Make PySpark's tests compatible with NumPy 1.14+
    
    ## What changes were proposed in this pull request?
    This PR backports SPARK-24740 to branch-2.3.
    It makes PySpark's tests compatible with NumPy 1.14+, which introduced rather radical changes to its string representation.
    
    For example, the tests below fail:
    
    ```
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 895, in __main__.DenseMatrix.__str__
    Failed example:
        print(dm)
    Expected:
        DenseMatrix([[ 0.,  2.],
                     [ 1.,  3.]])
    Got:
        DenseMatrix([[0., 2.],
                     [1., 3.]])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 899, in __main__.DenseMatrix.__str__
    Failed example:
        print(dm)
    Expected:
        DenseMatrix([[ 0.,  1.],
                     [ 2.,  3.]])
    Got:
        DenseMatrix([[0., 1.],
                     [2., 3.]])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 939, in __main__.DenseMatrix.toArray
    Failed example:
        m.toArray()
    Expected:
        array([[ 0.,  2.],
               [ 1.,  3.]])
    Got:
        array([[0., 2.],
               [1., 3.]])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 324, in __main__.DenseVector.dot
    Failed example:
        dense.dot(np.reshape([1., 2., 3., 4.], (2, 2), order='F'))
    Expected:
        array([  5.,  11.])
    Got:
        array([ 5., 11.])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 567, in __main__.SparseVector.dot
    Failed example:
        a.dot(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]))
    Expected:
        array([ 22.,  22.])
    Got:
        array([22., 22.])
    ```
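
    The whitespace differences above come from NumPy 1.14 dropping the extra padding space it used to print before each float element. A minimal sketch of the change and of the `legacy` workaround this patch applies (assuming NumPy 1.14+ is installed):

```python
import numpy as np

a = np.array([5., 11.])

# NumPy 1.14+ default: no extra sign-padding space per element.
print(repr(a))  # array([ 5., 11.]) on 1.14+

# Opt back into the pre-1.14 style that the existing doctests expect.
np.set_printoptions(legacy='1.13')
print(repr(a))  # array([  5.,  11.])
```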
    
    See the [release note](https://docs.scipy.org/doc/numpy-1.14.0/release.html#compatibility-notes).
    
    ## How was this patch tested?
    
    Manually tested:
    
    ```
    $ ./run-tests --python-executables=python3.6,python2.7 --modules=pyspark-ml,pyspark-mllib
    Running PySpark tests. Output is in /.../spark/python/unit-tests.log
    Will test against the following Python executables: ['python3.6', 'python2.7']
    Will test the following Python modules: ['pyspark-ml', 'pyspark-mllib']
    Starting test(python2.7): pyspark.mllib.tests
    Starting test(python2.7): pyspark.ml.classification
    Starting test(python3.6): pyspark.mllib.tests
    Starting test(python2.7): pyspark.ml.clustering
    Finished test(python2.7): pyspark.ml.clustering (54s)
    Starting test(python2.7): pyspark.ml.evaluation
    Finished test(python2.7): pyspark.ml.classification (74s)
    Starting test(python2.7): pyspark.ml.feature
    Finished test(python2.7): pyspark.ml.evaluation (27s)
    Starting test(python2.7): pyspark.ml.fpm
    Finished test(python2.7): pyspark.ml.fpm (0s)
    Starting test(python2.7): pyspark.ml.image
    Finished test(python2.7): pyspark.ml.image (17s)
    Starting test(python2.7): pyspark.ml.linalg.__init__
    Finished test(python2.7): pyspark.ml.linalg.__init__ (1s)
    Starting test(python2.7): pyspark.ml.recommendation
    Finished test(python2.7): pyspark.ml.feature (76s)
    Starting test(python2.7): pyspark.ml.regression
    Finished test(python2.7): pyspark.ml.recommendation (69s)
    Starting test(python2.7): pyspark.ml.stat
    Finished test(python2.7): pyspark.ml.regression (45s)
    Starting test(python2.7): pyspark.ml.tests
    Finished test(python2.7): pyspark.ml.stat (28s)
    Starting test(python2.7): pyspark.ml.tuning
    Finished test(python2.7): pyspark.ml.tuning (20s)
    Starting test(python2.7): pyspark.mllib.classification
    Finished test(python2.7): pyspark.mllib.classification (31s)
    Starting test(python2.7): pyspark.mllib.clustering
    Finished test(python2.7): pyspark.mllib.tests (260s)
    Starting test(python2.7): pyspark.mllib.evaluation
    Finished test(python3.6): pyspark.mllib.tests (266s)
    Starting test(python2.7): pyspark.mllib.feature
    Finished test(python2.7): pyspark.mllib.evaluation (21s)
    Starting test(python2.7): pyspark.mllib.fpm
    Finished test(python2.7): pyspark.mllib.feature (38s)
    Starting test(python2.7): pyspark.mllib.linalg.__init__
    Finished test(python2.7): pyspark.mllib.linalg.__init__ (1s)
    Starting test(python2.7): pyspark.mllib.linalg.distributed
    Finished test(python2.7): pyspark.mllib.fpm (34s)
    Starting test(python2.7): pyspark.mllib.random
    Finished test(python2.7): pyspark.mllib.clustering (64s)
    Starting test(python2.7): pyspark.mllib.recommendation
    Finished test(python2.7): pyspark.mllib.random (15s)
    Starting test(python2.7): pyspark.mllib.regression
    Finished test(python2.7): pyspark.mllib.linalg.distributed (47s)
    Starting test(python2.7): pyspark.mllib.stat.KernelDensity
    Finished test(python2.7): pyspark.mllib.stat.KernelDensity (0s)
    Starting test(python2.7): pyspark.mllib.stat._statistics
    Finished test(python2.7): pyspark.mllib.recommendation (40s)
    Starting test(python2.7): pyspark.mllib.tree
    Finished test(python2.7): pyspark.mllib.regression (38s)
    Starting test(python2.7): pyspark.mllib.util
    Finished test(python2.7): pyspark.mllib.stat._statistics (19s)
    Starting test(python3.6): pyspark.ml.classification
    Finished test(python2.7): pyspark.mllib.tree (26s)
    Starting test(python3.6): pyspark.ml.clustering
    Finished test(python2.7): pyspark.mllib.util (27s)
    Starting test(python3.6): pyspark.ml.evaluation
    Finished test(python3.6): pyspark.ml.evaluation (30s)
    Starting test(python3.6): pyspark.ml.feature
    Finished test(python2.7): pyspark.ml.tests (234s)
    Starting test(python3.6): pyspark.ml.fpm
    Finished test(python3.6): pyspark.ml.fpm (1s)
    Starting test(python3.6): pyspark.ml.image
    Finished test(python3.6): pyspark.ml.clustering (55s)
    Starting test(python3.6): pyspark.ml.linalg.__init__
    Finished test(python3.6): pyspark.ml.linalg.__init__ (0s)
    Starting test(python3.6): pyspark.ml.recommendation
    Finished test(python3.6): pyspark.ml.classification (71s)
    Starting test(python3.6): pyspark.ml.regression
    Finished test(python3.6): pyspark.ml.image (18s)
    Starting test(python3.6): pyspark.ml.stat
    Finished test(python3.6): pyspark.ml.stat (37s)
    Starting test(python3.6): pyspark.ml.tests
    Finished test(python3.6): pyspark.ml.regression (59s)
    Starting test(python3.6): pyspark.ml.tuning
    Finished test(python3.6): pyspark.ml.feature (93s)
    Starting test(python3.6): pyspark.mllib.classification
    Finished test(python3.6): pyspark.ml.recommendation (83s)
    Starting test(python3.6): pyspark.mllib.clustering
    Finished test(python3.6): pyspark.ml.tuning (29s)
    Starting test(python3.6): pyspark.mllib.evaluation
    Finished test(python3.6): pyspark.mllib.evaluation (26s)
    Starting test(python3.6): pyspark.mllib.feature
    Finished test(python3.6): pyspark.mllib.classification (43s)
    Starting test(python3.6): pyspark.mllib.fpm
    Finished test(python3.6): pyspark.mllib.clustering (81s)
    Starting test(python3.6): pyspark.mllib.linalg.__init__
    Finished test(python3.6): pyspark.mllib.linalg.__init__ (2s)
    Starting test(python3.6): pyspark.mllib.linalg.distributed
    Finished test(python3.6): pyspark.mllib.fpm (48s)
    Starting test(python3.6): pyspark.mllib.random
    Finished test(python3.6): pyspark.mllib.feature (54s)
    Starting test(python3.6): pyspark.mllib.recommendation
    Finished test(python3.6): pyspark.mllib.random (18s)
    Starting test(python3.6): pyspark.mllib.regression
    Finished test(python3.6): pyspark.mllib.linalg.distributed (55s)
    Starting test(python3.6): pyspark.mllib.stat.KernelDensity
    Finished test(python3.6): pyspark.mllib.stat.KernelDensity (1s)
    Starting test(python3.6): pyspark.mllib.stat._statistics
    Finished test(python3.6): pyspark.mllib.recommendation (51s)
    Starting test(python3.6): pyspark.mllib.tree
    Finished test(python3.6): pyspark.mllib.regression (45s)
    Starting test(python3.6): pyspark.mllib.util
    Finished test(python3.6): pyspark.mllib.stat._statistics (21s)
    Finished test(python3.6): pyspark.mllib.tree (27s)
    Finished test(python3.6): pyspark.mllib.util (27s)
    Finished test(python3.6): pyspark.ml.tests (264s)
    ```
    
    Closes #23591 from maropu/BACKPORT-24740.
    
    Authored-by: hyukjinkwon <gurwls...@apache.org>
    Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
---
 python/pyspark/ml/clustering.py            | 6 ++++++
 python/pyspark/ml/linalg/__init__.py       | 5 +++++
 python/pyspark/ml/stat.py                  | 6 ++++++
 python/pyspark/mllib/clustering.py         | 6 ++++++
 python/pyspark/mllib/evaluation.py         | 6 ++++++
 python/pyspark/mllib/linalg/__init__.py    | 6 ++++++
 python/pyspark/mllib/linalg/distributed.py | 6 ++++++
 python/pyspark/mllib/stat/_statistics.py   | 6 ++++++
 8 files changed, 47 insertions(+)

diff --git a/python/pyspark/ml/clustering.py b/python/pyspark/ml/clustering.py
index 66fb005..8bb8fd1 100644
--- a/python/pyspark/ml/clustering.py
+++ b/python/pyspark/ml/clustering.py
@@ -1134,8 +1134,14 @@ class LDA(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed, HasCheckpointInter
 
 if __name__ == "__main__":
     import doctest
+    import numpy
     import pyspark.ml.clustering
     from pyspark.sql import SparkSession
+    try:
+        # NumPy 1.14+ changed its string format.
+        numpy.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
     globs = pyspark.ml.clustering.__dict__.copy()
     # The small batch size here ensures that we see multiple batches,
     # even in these small test examples:
diff --git a/python/pyspark/ml/linalg/__init__.py b/python/pyspark/ml/linalg/__init__.py
index c2fc29d..9ee7368 100644
--- a/python/pyspark/ml/linalg/__init__.py
+++ b/python/pyspark/ml/linalg/__init__.py
@@ -1160,6 +1160,11 @@ class Matrices(object):
 
 def _test():
     import doctest
+    try:
+        # NumPy 1.14+ changed its string format.
+        np.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
     (failure_count, test_count) = doctest.testmod(optionflags=doctest.ELLIPSIS)
     if failure_count:
         exit(-1)
diff --git a/python/pyspark/ml/stat.py b/python/pyspark/ml/stat.py
index 079b083..12a5f22 100644
--- a/python/pyspark/ml/stat.py
+++ b/python/pyspark/ml/stat.py
@@ -134,8 +134,14 @@ class Correlation(object):
 
 if __name__ == "__main__":
     import doctest
+    import numpy
     import pyspark.ml.stat
     from pyspark.sql import SparkSession
+    try:
+        # NumPy 1.14+ changed its string format.
+        numpy.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
 
     globs = pyspark.ml.stat.__dict__.copy()
     # The small batch size here ensures that we see multiple batches,
diff --git a/python/pyspark/mllib/clustering.py b/python/pyspark/mllib/clustering.py
index bb687a7..74d6159 100644
--- a/python/pyspark/mllib/clustering.py
+++ b/python/pyspark/mllib/clustering.py
@@ -1042,7 +1042,13 @@ class LDA(object):
 
 def _test():
     import doctest
+    import numpy
     import pyspark.mllib.clustering
+    try:
+        # NumPy 1.14+ changed its string format.
+        numpy.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
     globs = pyspark.mllib.clustering.__dict__.copy()
     globs['sc'] = SparkContext('local[4]', 'PythonTest', batchSize=2)
    (failure_count, test_count) = doctest.testmod(globs=globs, optionflags=doctest.ELLIPSIS)
diff --git a/python/pyspark/mllib/evaluation.py b/python/pyspark/mllib/evaluation.py
index 2cd1da3..9f8f2a5 100644
--- a/python/pyspark/mllib/evaluation.py
+++ b/python/pyspark/mllib/evaluation.py
@@ -531,8 +531,14 @@ class MultilabelMetrics(JavaModelWrapper):
 
 def _test():
     import doctest
+    import numpy
     from pyspark.sql import SparkSession
     import pyspark.mllib.evaluation
+    try:
+        # NumPy 1.14+ changed its string format.
+        numpy.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
     globs = pyspark.mllib.evaluation.__dict__.copy()
     spark = SparkSession.builder\
         .master("local[4]")\
diff --git a/python/pyspark/mllib/linalg/__init__.py b/python/pyspark/mllib/linalg/__init__.py
index ced1eca..bcca2db 100644
--- a/python/pyspark/mllib/linalg/__init__.py
+++ b/python/pyspark/mllib/linalg/__init__.py
@@ -1372,6 +1372,12 @@ class QRDecomposition(object):
 
 def _test():
     import doctest
+    import numpy
+    try:
+        # NumPy 1.14+ changed its string format.
+        numpy.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
     (failure_count, test_count) = doctest.testmod(optionflags=doctest.ELLIPSIS)
     if failure_count:
         exit(-1)
diff --git a/python/pyspark/mllib/linalg/distributed.py b/python/pyspark/mllib/linalg/distributed.py
index 4cb8025..28b6d74 100644
--- a/python/pyspark/mllib/linalg/distributed.py
+++ b/python/pyspark/mllib/linalg/distributed.py
@@ -1364,9 +1364,15 @@ class BlockMatrix(DistributedMatrix):
 
 def _test():
     import doctest
+    import numpy
     from pyspark.sql import SparkSession
     from pyspark.mllib.linalg import Matrices
     import pyspark.mllib.linalg.distributed
+    try:
+        # NumPy 1.14+ changed its string format.
+        numpy.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
     globs = pyspark.mllib.linalg.distributed.__dict__.copy()
     spark = SparkSession.builder\
         .master("local[2]")\
diff --git a/python/pyspark/mllib/stat/_statistics.py b/python/pyspark/mllib/stat/_statistics.py
index 49b2644..7731dca 100644
--- a/python/pyspark/mllib/stat/_statistics.py
+++ b/python/pyspark/mllib/stat/_statistics.py
@@ -303,7 +303,13 @@ class Statistics(object):
 
 def _test():
     import doctest
+    import numpy
     from pyspark.sql import SparkSession
+    try:
+        # NumPy 1.14+ changed its string format.
+        numpy.set_printoptions(legacy='1.13')
+    except TypeError:
+        pass
     globs = globals().copy()
     spark = SparkSession.builder\
         .master("local[4]")\


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
