Re: Problem with using Spark ML

Xiangrui Meng Wed, 22 Apr 2015 22:47:18 -0700

Please try reducing the step size. The native BLAS library is not
required. -Xiangrui
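
To illustrate why the step size matters here, below is a minimal sketch in plain Scala (no Spark) of full-batch gradient descent on a least-squares objective. The dataset, the object name StepSizeDemo, and the helpers predict, mse and fit are all made up for illustration, and the sketch leaves out the ridge penalty and mini-batching, so it is not MLlib's actual optimizer. It only shows that with count-like features such as those in the post, a step size of 1.0 can make the weights grow at every iteration, while a much smaller one converges.

```scala
object StepSizeDemo {
  type Point = (Double, Array[Double]) // (label, features)

  // Hypothetical toy data: label = 0.5 * x1 + 0.25 * x2, with count-like features.
  val data: Seq[Point] = Seq(
    (2.5,  Array(1.0, 8.0)),
    (1.25, Array(2.0, 1.0)),
    (3.25, Array(6.0, 1.0)),
    (1.0,  Array(1.0, 2.0))
  )

  // Linear prediction: dot product of weights and features.
  def predict(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // Mean squared error of the weights w on the given points.
  def mse(w: Array[Double], points: Seq[Point]): Double =
    points.map { case (y, x) => math.pow(predict(w, x) - y, 2) }.sum / points.size

  // Full-batch gradient descent with a constant step size (a simplification).
  def fit(points: Seq[Point], stepSize: Double, numIterations: Int): Array[Double] = {
    val dim = points.head._2.length
    var w = Array.fill(dim)(0.0)
    for (_ <- 1 to numIterations) {
      val grad = Array.fill(dim)(0.0)
      for ((y, x) <- points) {
        val err = predict(w, x) - y
        // Gradient of the mean squared error with respect to w(j).
        for (j <- 0 until dim) grad(j) += 2.0 * err * x(j) / points.size
      }
      w = w.zip(grad).map { case (wi, gi) => wi - stepSize * gi }
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Step size 1.0 overshoots on every update, so the error explodes.
    println("step 1.0:  MSE = " + mse(fit(data, 1.0, 50), data))
    // A much smaller step size converges on the same data.
    println("step 0.01: MSE = " + mse(fit(data, 0.01, 50), data))
  }
}
```

In MLlib 1.x the step size can be passed explicitly instead of relying on the default constructor, for example RidgeRegressionWithSGD.train(trainingLP, numIterations, stepSize, regParam). Suitable values depend on the data and need some experimentation.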


On Tue, Apr 21, 2015 at 5:15 AM, Staffan <staffan.arvids...@gmail.com> wrote:
> Hi,
> I've written an application that performs some machine learning on some
> data. I've validated that the data _should_ give a good output with a decent
> RMSE by using Lib-SVM:
> Mean squared error = 0.00922063 (regression)
> Squared correlation coefficient = 0.9987 (regression)
>
> When I try to use Spark ML to do the exact same thing I get:
> Mean Squared Error = 8.466193152067944E224
>
> Which is "somewhat" worse. I've looked at the data before it is fed to the
> model and printed it to a file (this is the same data that produced the
> Lib-SVM result above). Somewhere there must be a huge mistake, but I cannot
> find it in my code (see below). trainingLP and testLP are the training and
> test data, each an RDD[LabeledPoint].
>
> // Generate the model (default SGD parameters)
> import org.apache.spark.mllib.regression.RidgeRegressionWithSGD
>
> val model_gen = new RidgeRegressionWithSGD()
> val model = model_gen.run(trainingLP)
>
> // Predict on the test data, keeping (label, prediction) pairs
> val valuesAndPreds = testLP.map { point =>
>   val prediction = model.predict(point.features)
>   println("label: " + point.label + ", pred: " + prediction)
>   (point.label, prediction)
> }
> // Mean of the squared differences between labels and predictions
> val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
> println("Mean Squared Error = " + MSE)
>
>
> I've printed the label and prediction for each data point in the test set,
> and the result looks something like this:
> label: 5.04, pred: -4.607899000641277E112
> label: 3.59, pred: -3.96787105480399E112
> label: 5.06, pred: -2.8263294374576145E112
> label: 2.85, pred: -1.1536508029072844E112
> label: 2.1, pred: -4.269312783707508E111
> label: 2.75, pred: -3.0072665148591558E112
> label: -0.29, pred: -2.035681731641989E112
> label: 1.98, pred: -3.163404340354783E112
>
> So there is obviously something wrong with the prediction step. I'm using
> the SparseVector representation of the Vector in LabeledPoint, looking
> something like this for reference (shortened for convenience);
> (-1.59,(2080,[29,59,62,74,127,128,131,144,149,175,198,200,239,247,267,293,307,364,374,393,410,424,425,431,448,469,477,485,501,525,532,533,538,560,..],[1.0,1.0,2.0,8.0,1.0,1.0,6.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,8.0,2.0,1.0,1.0,..]))
> (-1.75,(2080,[103,131,149,208,296,335,520,534,603,620,661,694,709,748,859,1053,1116,1156,1186,1207,1208,1223,1256,1278,1356,1375,1399,1480,1569,..],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,4.0,1.0,7.0,1.0,3.0,2.0,1.0]))
>
> I do get one type of warning, but that's about it! (As far as I
> understand, this native code is not required to get correct results,
> only to improve performance.)
> 6010 [main] WARN  com.github.fommil.netlib.BLAS  - Failed to load
> implementation from: com.github.fommil.netlib.NativeSystemBLAS
> 6011 [main] WARN  com.github.fommil.netlib.BLAS  - Failed to load
> implementation from: com.github.fommil.netlib.NativeRefBLAS
>
> So where do I go from here?
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-using-Spark-ML-tp22591.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

