Looking at the Java/Scala doc for this class [1], it seems it only supports +1.0 and -1.0 as labels; there's no mention that you can use arbitrary positive integers.
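For example, you could remap your labels to +1.0/-1.0 right after loading the data and before fitting. A rough, untested sketch (the 16/8 labels and the file path are taken from your sample data, so adjust them as needed):

import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.classification.SVM

val env = ExecutionEnvironment.getExecutionEnvironment

// Read the LibSVM file and remap the labels: 16.0 -> +1.0, everything else (8.0) -> -1.0.
val trainingData: DataSet[LabeledVector] =
  MLUtils.readLibSVM(env, "/data/svmguide1.training.txt")
    .map(lv => LabeledVector(if (lv.label == 16.0) 1.0 else -1.0, lv.vector))

val svm = SVM()
  .setBlocks(env.getParallelism)
  .setIterations(100)

svm.fit(trainingData)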
I tried your use case, and using just +1 and -1 actually works fine.

-- Rong

[1] https://github.com/apache/flink/blob/master/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/classification/SVM.scala

On Mon, Jun 25, 2018 at 7:06 AM Mano Swerts <mano.swe...@ixxus.com> wrote:

> Hi all,
>
> This is just getting stranger… After playing around for a while, it seems that
> if I have a vector whose values are all 0 (i.e. all zeros), it is classified as
> -1.0. Any other value in the vector causes it to be classified as 1.0:
>
> ============================
> === Predictions
> ============================
> (DenseVector(0.0, 0.0, 0.0),-1.0)
> (DenseVector(0.0, 0.5, 0.0),1.0)
> (DenseVector(1.0, 1.0, 1.0),1.0)
> (DenseVector(0.0, 0.0, 0.0),-1.0)
> (DenseVector(0.0, 0.5, 1.0),1.0)
>
> So it seems that my values need to be binary for this prediction to work,
> which of course does not make sense and doesn't match the data from the
> example on the Flink website. It gives me the impression that it is using
> the vector as the label instead of the value…
>
> Any insights?
>
> — Mano
>
> On 25 Jun 2018, at 11:40, Mano Swerts <mano.swe...@ixxus.com> wrote:
>
> Hi Rong,
>
> As you can see in my test data example, I did change the labels to
> 8 and 16 instead of 1 and 0.
>
> If SVM always returns +1.0 or -1.0, that would indeed explain where the 1.0
> is coming from. But it never gives me -1.0, so there is still something
> wrong, as it classifies everything under the same label.
>
> Thanks.
>
> — Mano
>
> On 23 Jun 2018, at 20:50, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Mano,
>
> Regarding the always-positive prediction result: I think the standard
> svmguide data [1] labels the data as 0.0 and 1.0 instead of -1.0 and +1.0.
> Maybe correcting that will work for your case.
> Regarding the evaluation pairs: I think SVM in FlinkML will always return
> +1.0 or -1.0 when you use it this way, as a binary classifier.
>
> Thanks,
> Rong
>
> [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/svmguide1
>
> On Fri, Jun 22, 2018 at 6:49 AM Mano Swerts <mano.swe...@ixxus.com> wrote:
>
> Hi guys,
>
> Here I am again. I am playing with Flink ML and was trying to get the
> example from the documentation to work:
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data
> (the one using the astroparticle LibSVM data).
> My code is basically what you see in the example, with some more output
> for verification:
>
> import org.apache.flink.api.scala._
> import org.apache.flink.ml.MLUtils
> import org.apache.flink.ml.common.LabeledVector
> import org.apache.flink.ml.classification.SVM
>
> object LearnDocumentEntityRelationship {
>
>   val trainingDataPath = "/data/svmguide1.training.txt"
>   val testDataPath = "/data/svmguide1.test.txt"
>
>   def main(args: Array[String]) {
>     val env = ExecutionEnvironment.getExecutionEnvironment
>
>     // Load the training data as LabeledVectors from the LibSVM file.
>     val trainingData: DataSet[LabeledVector] = MLUtils.readLibSVM(env, trainingDataPath)
>
>     println("============================")
>     println("=== Training Data")
>     println("============================")
>     trainingData.print()
>
>     // Test data as (vector, label) pairs for evaluation.
>     val testData = MLUtils.readLibSVM(env, testDataPath).map(x => (x.vector, x.label))
>
>     println("============================")
>     println("=== Test Data")
>     println("============================")
>     testData.print()
>
>     val svm = SVM()
>       .setBlocks(env.getParallelism)
>       .setIterations(100)
>       .setRegularization(0.001)
>       .setStepsize(0.1)
>       .setSeed(42)
>
>     svm.fit(trainingData)
>
>     // Evaluation pairs: (true label, predicted label).
>     val evaluationPairs: DataSet[(Double, Double)] = svm.evaluate(testData)
>
>     println("============================")
>     println("=== Evaluation Pairs")
>     println("============================")
>     evaluationPairs.print()
>
>     val realData = MLUtils.readLibSVM(env, testDataPath).map(x => x.vector)
>
>     val predictionDS = svm.predict(realData)
>
>     println("============================")
>     println("=== Predictions")
>     println("============================")
>     predictionDS.print()
>
>     println("=== End")
>
>     env.execute("Learn Document Entity Relationship Job")
>   }
> }
>
> The issue is that the predictions (from both the evaluation pairs and the
> prediction dataset) are always equal to "1.0". When I changed the labels in
> the data files to 16 and 8 (so 1 is not a valid label anymore), it still
> kept predicting "1.0" for every single record. I also tried some other
> custom datasets, but I always get the same result.
>
> This is a concise part of the output (the data contains too many records
> to include all of it here):
>
> ============================
> === Test Data
> ============================
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),8.0)
> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),8.0)
>
> ============================
> === Evaluation Pairs
> ============================
> (16.0,1.0)
> (16.0,1.0)
> (8.0,1.0)
> (8.0,1.0)
>
> ============================
> === Predictions
> ============================
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),1.0)
> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),1.0)
>
> Am I doing something wrong?
>
> Any pointers are greatly appreciated. Thanks!
>
> — Mano
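P.S. If you want to check whether the trained model separates the classes at all, FlinkML's SVM can also output the raw distance to the separating hyperplane instead of the thresholded +1.0/-1.0 label. A rough, untested sketch, reusing the variable names from your program above:

val svmRaw = SVM()
  .setBlocks(env.getParallelism)
  .setIterations(100)
  .setOutputDecisionFunction(true)  // predict/evaluate now return the distance to the hyperplane instead of +/-1.0

svmRaw.fit(trainingData)

// Each result is (vector, decision value); if every value falls on the same side of 0,
// that would explain why the thresholded predictions are always 1.0.
svmRaw.predict(realData).print()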