Hi Steffen,

Thanks for sharing your results about MLlib; this sounds like a useful tool. However, I wanted to point out that some of the results may be expected for certain machine learning algorithms, so it might be good to design those tests with that in mind. For example:
> - The classification of LogisticRegression, DecisionTree, and RandomForest
>   were not inverted when all binary class labels are flipped.
> - The classification of LogisticRegression, DecisionTree, GBT, and
>   RandomForest sometimes changed when the features are reordered.
> - The classification of LogisticRegression, RandomForest, and LinearSVC
>   sometimes changed when the instances are reordered.

All of these effects might occur simply because the algorithms are nondeterministic. Were the effects large or small? For example, was the final difference in accuracy statistically significant? Many ML algorithms are trained using randomized procedures such as stochastic gradient descent, so you can't expect exactly the same results under these changes.

> - The classification of NaïveBayes and the LinearSVC sometimes changed if one
>   is added to each feature value.

This might be due to nondeterminism as above, but it might also be due to regularization or nonlinear effects in some algorithms. For example, some algorithms look at the relative values of features, in which case adding 1 to each feature value genuinely transforms the data. Other algorithms might require the data to be centered around a mean of 0 to work best.

I haven't read the paper in detail, but basically it would be good to account for randomized algorithms as well as the various model assumptions, and to make sure the differences in results in these tests are statistically significant.

Matei

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
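On the significance question, here is one minimal way the check could be done (plain Python, no Spark; the prediction vectors below are made-up illustrative data, not results from the paper). It uses McNemar's exact test on paired predictions from two runs of a classifier, which asks whether the two runs' disagreements on correctness are more lopsided than chance:

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired classifier predictions."""
    # b: run A correct where run B is wrong; c: the reverse.
    b = sum(1 for t, a, x in zip(y_true, pred_a, pred_b) if a == t and x != t)
    c = sum(1 for t, a, x in zip(y_true, pred_a, pred_b) if a != t and x == t)
    n = b + c
    if n == 0:
        return 1.0  # the two runs never disagree on correctness
    # Under H0 (no real difference), b ~ Binomial(n, 0.5).
    k = min(b, c)
    p = sum(comb(n, i) for i in range(k + 1)) * 2 / 2 ** n
    return min(1.0, p)

# Made-up example: run A is correct on 1 of the disputed cases, run B on 9.
p = mcnemar_exact([1] * 10, [1] + [0] * 9, [0] + [1] * 9)
```

A small p here would suggest the difference between the two runs is real rather than training noise; a large p means the observed flips are consistent with nondeterminism.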
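On the "add one to each feature value" point, a tiny self-contained illustration (plain Python with hypothetical class-conditional probabilities, not Spark's NaiveBayes) of why a constant shift is not a no-op for a multinomial-style model: the log-score is the sum of x_i * log(theta_ci), so adding 1 to every x_i adds the sum of log(theta_ci), a different constant per class, which can flip the predicted class:

```python
from math import log

# Hypothetical per-class feature probabilities (each row sums to 1).
theta = {
    "A": [0.5, 0.3, 0.2],
    "B": [0.2, 0.2, 0.6],
}

def nb_score(x, c):
    # Multinomial-style log-likelihood (uniform priors, so they cancel).
    return sum(xi * log(t) for xi, t in zip(x, theta[c]))

x = [4, 4, 5]
shifted = [xi + 1 for xi in x]

best = max(theta, key=lambda c: nb_score(x, c))                # "B"
best_shifted = max(theta, key=lambda c: nb_score(shifted, c))  # "A"
```

With these numbers the unshifted instance scores higher under class B, but the +1 shift favors class A's feature distribution enough to flip the argmax, so a changed prediction under this transformation is expected model behavior, not necessarily a bug.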