Please try reducing the step size. The native BLAS library is not required. -Xiangrui
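
To illustrate why the step size matters, here is a minimal sketch in plain Scala (no Spark needed). It runs full-batch gradient descent on a toy least-squares problem. The data, the `StepSizeDemo` object, and the two step sizes are assumptions made up for illustration; they are not taken from the original post, and MLlib's own optimizer differs in its details (mini-batches, regularization, a decaying step size). The toy features are unscaled counts up to 8.0, similar in range to the sparse vectors quoted below.

```scala
object StepSizeDemo {
  // Hypothetical toy data: y = 0.5 * x, with unscaled feature values.
  val xs: Array[Double] = Array(1.0, 2.0, 5.0, 8.0)
  val ys: Array[Double] = xs.map(_ * 0.5)

  // Mean squared error of the one-weight model y = w * x.
  def mse(w: Double): Double =
    xs.zip(ys).map { case (x, y) => math.pow(w * x - y, 2) }.sum / xs.length

  // Full-batch gradient descent on the single weight w, starting from 0.
  def fit(stepSize: Double, numIterations: Int): Double = {
    var w = 0.0
    for (_ <- 1 to numIterations) {
      // Gradient of the MSE with respect to w.
      val grad = xs.zip(ys).map { case (x, y) => 2.0 * (w * x - y) * x }.sum / xs.length
      w -= stepSize * grad
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // The mean of x^2 is 23.5 here, so each update multiplies the error
    // (w - 0.5) by (1 - 47 * stepSize). A step size of 1.0 gives a factor
    // of -46 and the error explodes; 0.01 gives 0.53 and it shrinks.
    for (step <- Seq(1.0, 0.01)) {
      val w = fit(step, 50)
      println(s"stepSize=$step  w=$w  MSE=${mse(w)}")
    }
  }
}
```

With the larger step size the weight overshoots further on every iteration, which is the same kind of blow-up as the predictions around 1E112 in the quoted message. In MLlib 1.x the step size of `RidgeRegressionWithSGD` can, as far as I know, be changed through the optimizer, for example `model_gen.optimizer.setStepSize(0.01)` before calling `run`. The value 0.01 is only a starting guess; a suitable value for this data would need to be found by trial.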

On Tue, Apr 21, 2015 at 5:15 AM, Staffan <staffan.arvids...@gmail.com> wrote:
> Hi,
> I've written an application that performs some machine learning on some
> data. I've validated that the data _should_ give a good output with a decent
> RMSE by using Lib-SVM:
> Mean squared error = 0.00922063 (regression)
> Squared correlation coefficient = 0.9987 (regression)
>
> When I try to use Spark ML to do the exact same thing I get:
> Mean Squared Error = 8.466193152067944E224
>
> Which is "somewhat" worse.. I've tried to look at the data before it is
> fed to the model, and printed that data to file (which is actually the data
> used when I got the result from Lib-SVM above). Somewhere there must be a
> huge mistake, but I cannot find it anywhere in my code (see below).
> trainingLP and testLP are the training and test data, in RDD[LabeledPoint].
>
> // Generate model
> val model_gen = new RidgeRegressionWithSGD();
> val model = model_gen.run(trainingLP);
>
> // Predict on the test-data
> val valuesAndPreds = testLP.map { point =>
>   val prediction = model.predict(point.features);
>   println("label: " + point.label + ", pred: " + prediction);
>   (point.label, prediction);
> }
> val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean();
> println("Mean Squared Error = " + MSE)
>
>
> I've printed the label and prediction values for each data point in the test set,
> and the result is something like this:
> label: 5.04, pred: -4.607899000641277E112
> label: 3.59, pred: -3.96787105480399E112
> label: 5.06, pred: -2.8263294374576145E112
> label: 2.85, pred: -1.1536508029072844E112
> label: 2.1, pred: -4.269312783707508E111
> label: 2.75, pred: -3.0072665148591558E112
> label: -0.29, pred: -2.035681731641989E112
> label: 1.98, pred: -3.163404340354783E112
>
> So there is obviously something wrong with the prediction step. I'm using
> the SparseVector representation of the Vector in LabeledPoint, looking
> something like this for reference (shortened for convenience):
> (-1.59,(2080,[29,59,62,74,127,128,131,144,149,175,198,200,239,247,267,293,307,364,374,393,410,424,425,431,448,469,477,485,501,525,532,533,538,560,..],[1.0,1.0,2.0,8.0,1.0,1.0,6.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,8.0,2.0,1.0,1.0,..]))
> (-1.75,(2080,[103,131,149,208,296,335,520,534,603,620,661,694,709,748,859,1053,1116,1156,1186,1207,1208,1223,1256,1278,1356,1375,1399,1480,1569,..],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,4.0,1.0,7.0,1.0,3.0,2.0,1.0]))
>
> I do get one type of warning, but that's about it! (And as I understand it,
> this native code is not required to get the correct results, only to
> improve performance.)
> 6010 [main] WARN com.github.fommil.netlib.BLAS - Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
> 6011 [main] WARN com.github.fommil.netlib.BLAS - Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
>
> So where do I go from here?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-using-Spark-ML-tp22591.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org