Looking at the Java/Scala doc for this class [1], it seems it only supports +1.0 and -1.0 as labels; there's no mention that you can use arbitrary positive integers.
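For example, you could remap your labels to +1.0/-1.0 right after loading the data and before fitting. A rough, untested sketch (the 16/8 labels and the file path are taken from your sample data, so adjust them as needed):

import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.classification.SVM

val env = ExecutionEnvironment.getExecutionEnvironment

// Read the LibSVM file and remap the labels: 16.0 -> +1.0, everything else (8.0) -> -1.0.
val trainingData: DataSet[LabeledVector] =
  MLUtils.readLibSVM(env, "/data/svmguide1.training.txt")
    .map(lv => LabeledVector(if (lv.label == 16.0) 1.0 else -1.0, lv.vector))

val svm = SVM()
  .setBlocks(env.getParallelism)
  .setIterations(100)

svm.fit(trainingData)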
I tried your use case, and using just +1 and -1 actually works fine.

-- Rong

[1] https://github.com/apache/flink/blob/master/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/classification/SVM.scala

On Mon, Jun 25, 2018 at 7:06 AM Mano Swerts <mano.swe...@ixxus.com> wrote:

> Hi all,
>
> This is just getting stranger… After playing around for a while, it seems that
> if I have a vector whose values are all 0 (i.e. all zeros), it is classified as
> -1.0. Any other value in the vector causes it to be classified as 1.0:
>
> ============================
> === Predictions
> ============================
> (DenseVector(0.0, 0.0, 0.0),-1.0)
> (DenseVector(0.0, 0.5, 0.0),1.0)
> (DenseVector(1.0, 1.0, 1.0),1.0)
> (DenseVector(0.0, 0.0, 0.0),-1.0)
> (DenseVector(0.0, 0.5, 1.0),1.0)
>
> So it seems that my values need to be binary for this prediction to work,
> which of course does not make sense and doesn't match the data from the
> example on the Flink website. It gives me the impression that it is using
> the vector as the label instead of the value…
>
> Any insights?
>
> — Mano
>
> On 25 Jun 2018, at 11:40, Mano Swerts <mano.swe...@ixxus.com> wrote:
>
> Hi Rong,
>
> As you can see in my test data example, I did change the labels to
> 8 and 16 instead of 1 and 0.
>
> If SVM always returns +1.0 or -1.0, that would indeed explain where the 1.0
> is coming from. But it never gives me -1.0, so there is still something
> wrong, as it classifies everything under the same label.
>
> Thanks.
>
> — Mano
>
> On 23 Jun 2018, at 20:50, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Mano,
>
> Regarding the always-positive prediction result: I think the standard
> svmguide data [1] labels the data as 0.0 and 1.0 instead of -1.0 and +1.0.
> Maybe correcting that will work for your case.
> Regarding the evaluation pairs: I think SVM in FlinkML will always return
> +1.0 or -1.0 when you use it this way, as a binary classifier.
>
> Thanks,
> Rong
>
> [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/svmguide1
>
> On Fri, Jun 22, 2018 at 6:49 AM Mano Swerts <mano.swe...@ixxus.com> wrote:
>
> Hi guys,
>
> Here I am again. I am playing with Flink ML and was trying to get the
> example from the documentation to work:
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data
> (the one using the astroparticle LibSVM data).
> My code is basically what you see in the example, with some more output
> for verification:
>
> import org.apache.flink.api.scala._
> import org.apache.flink.ml.MLUtils
> import org.apache.flink.ml.common.LabeledVector
> import org.apache.flink.ml.classification.SVM
>
> object LearnDocumentEntityRelationship {
>
>   val trainingDataPath = "/data/svmguide1.training.txt"
>   val testDataPath = "/data/svmguide1.test.txt"
>
>   def main(args: Array[String]) {
>     val env = ExecutionEnvironment.getExecutionEnvironment
>
>     // Load the training data as LabeledVectors from the LibSVM file.
>     val trainingData: DataSet[LabeledVector] = MLUtils.readLibSVM(env, trainingDataPath)
>
>     println("============================")
>     println("=== Training Data")
>     println("============================")
>     trainingData.print()
>
>     // Test data as (vector, label) pairs for evaluation.
>     val testData = MLUtils.readLibSVM(env, testDataPath).map(x => (x.vector, x.label))
>
>     println("============================")
>     println("=== Test Data")
>     println("============================")
>     testData.print()
>
>     val svm = SVM()
>       .setBlocks(env.getParallelism)
>       .setIterations(100)
>       .setRegularization(0.001)
>       .setStepsize(0.1)
>       .setSeed(42)
>
>     svm.fit(trainingData)
>
>     // Evaluation pairs: (true label, predicted label).
>     val evaluationPairs: DataSet[(Double, Double)] = svm.evaluate(testData)
>
>     println("============================")
>     println("=== Evaluation Pairs")
>     println("============================")
>     evaluationPairs.print()
>
>     val realData = MLUtils.readLibSVM(env, testDataPath).map(x => x.vector)
>
>     val predictionDS = svm.predict(realData)
>
>     println("============================")
>     println("=== Predictions")
>     println("============================")
>     predictionDS.print()
>
>     println("=== End")
>
>     env.execute("Learn Document Entity Relationship Job")
>   }
> }
>
> The issue is that the predictions (from both the evaluation pairs and the
> prediction dataset) are always equal to "1.0". When I changed the labels in
> the data files to 16 and 8 (so 1 is not a valid label anymore), it still
> kept predicting "1.0" for every single record. I also tried some other
> custom datasets, but I always get the same result.
>
> This is a concise part of the output (the data contains too many records
> to include all of it here):
>
> ============================
> === Test Data
> ============================
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),8.0)
> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),8.0)
>
> ============================
> === Evaluation Pairs
> ============================
> (16.0,1.0)
> (16.0,1.0)
> (8.0,1.0)
> (8.0,1.0)
>
> ============================
> === Predictions
> ============================
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
> (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
> (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),1.0)
> (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),1.0)
>
> Am I doing something wrong?
>
> Any pointers are greatly appreciated. Thanks!
>
> — Mano
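P.S. If you want to check whether the trained model separates the classes at all, FlinkML's SVM can also output the raw distance to the separating hyperplane instead of the thresholded +1.0/-1.0 label. A rough, untested sketch, reusing the variable names from your program above:

val svmRaw = SVM()
  .setBlocks(env.getParallelism)
  .setIterations(100)
  .setOutputDecisionFunction(true)  // predict/evaluate now return the distance to the hyperplane instead of +/-1.0

svmRaw.fit(trainingData)

// Each result is (vector, decision value); if every value falls on the same side of 0,
// that would explain why the thresholded predictions are always 1.0.
svmRaw.predict(realData).print()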