zhao bo created SPARK-29205:
-------------------------------

             Summary: PySpark tests fail due to a suspected performance problem on ARM
                 Key: SPARK-29205
                 URL: https://issues.apache.org/jira/browse/SPARK-29205
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.0.0
         Environment: OS: Ubuntu 16.04
Arch: aarch64
Host: Virtual Machine
            Reporter: zhao bo


We are testing PySpark on an ARM VM and found several test failures. Once we modify the 
source code to extend the wait time, so that the test tasks have enough time to finish, 
the tests pass.
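
For context, these tests poll an assertion through the _eventually helper in 
test_streaming_algorithms.py (visible in the tracebacks below) until a timeout expires, 
so on a slow VM the streaming batches are simply not all processed before the deadline. 
A rough, illustrative sketch of that polling pattern (not the actual helper 
implementation):

import time

def eventually(condition, timeout=30.0, catch_assertions=False):
    # Poll `condition` until it returns True or `timeout` seconds elapse.
    # Illustrative sketch only; the real helper is _eventually in
    # pyspark/mllib/tests/test_streaming_algorithms.py.
    start_time = time.time()
    last_value = None
    while time.time() - start_time < timeout:
        if catch_assertions:
            try:
                last_value = condition()
            except AssertionError as e:
                last_value = e
        else:
            last_value = condition()
        if last_value is True:
            return
        time.sleep(0.01)
    if isinstance(last_value, AssertionError):
        raise last_value
    raise AssertionError(
        "Test failed due to timeout after %g sec, "
        "with last condition returning: %s" % (timeout, last_value))

On slower hardware the streaming batches arrive after the window closes, so a condition 
such as len(model_weights) == len(batches) never becomes true within the default timeout.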

 

The affected test cases include:

pyspark.mllib.tests.test_streaming_algorithms:StreamingLinearRegressionWithTests.test_parameter_convergence

pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_convergence

pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

The error messages for the above test failures:
======================================================================
FAIL: test_parameter_convergence (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
Test that the model parameters improve with streaming data.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 429, in test_parameter_convergence
    self._eventually(condition, catch_assertions=True)
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
    raise lastValue
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
    lastValue = condition()
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 425, in condition
    self.assertEqual(len(model_weights), len(batches))
AssertionError: 6 != 10
 
 
======================================================================
FAIL: test_convergence (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 292, in test_convergence
    self._eventually(condition, 60.0, catch_assertions=True)
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
    raise lastValue
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
    lastValue = condition()
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 288, in condition
    self.assertEqual(len(models), len(input_batches))
AssertionError: 19 != 20
 
======================================================================
FAIL: test_parameter_accuracy (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 266, in test_parameter_accuracy
    self._eventually(condition, catch_assertions=True)
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
    raise lastValue
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
    lastValue = condition()
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 263, in condition
    self.assertAlmostEqual(rel, 0.1, 1)
AssertionError: 0.21309223935797794 != 0.1 within 1 places
 
======================================================================
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition, timeout=60.0)
  File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
    % (timeout, lastValue))
AssertionError: Test failed due to timeout after 60 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74, 0.76, 0.78, 0.7, 0.78, 0.8
 
 
Is it possible to extend the test timeouts so the jobs can finish, given that the test 
logic itself does not change? Any help or advice is welcome. Thanks.
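
For example, the 60-second window reported in the last failure above could be widened on 
slower hardware. A sketch of the kind of local change we made in 
StreamingLogisticRegressionWithSGDTests.test_training_and_prediction of 
test_streaming_algorithms.py (the 180.0 value is only an example, not a proposed 
default):

    # current call, times out after 60 sec on our ARM VM:
    self._eventually(condition, timeout=60.0)
    # extended wait used locally so all batches can finish (example value only):
    self._eventually(condition, timeout=180.0)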

 


