Dear Spark developers,

Are there any best practices or guidelines for machine learning unit tests in 
Spark? After taking a brief look at the unit tests in ML and MLlib, I have 
found that each algorithm is tested in a different way. There are a few kinds 
of tests:
1) Partial checks of the algorithm's internal correctness. These vary widely 
from algorithm to algorithm.
2) Generating test data from a distribution specific to the algorithm, running 
the algorithm, and checking the outcome. This is also very algorithm-specific.
3) Comparing the parameters (weights) of the trained model with those produced 
by existing implementations, such as R or SciPy. This looks like the most 
useful kind of test, because it gives users confidence that they will get the 
same result from the algorithm as other people get using other software.

After googling a bit, I've found the following guidelines rather relevant:
http://blog.mpacula.com/2011/02/17/unit-testing-statistical-software/

I am wondering whether we should come up with specific testing guidelines for 
machine learning, e.g. guidelines that ensure the user is guaranteed to get 
the expected result. This could also be seen as an additional benefit of 
Spark - standardized, well-tested ML.

Best regards, Alexander
