Hi guys,

Here I am again. I am playing with Flink ML and was just trying to get the example from the documentation to work: https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data (the one using the astroparticle LibSVM data).
My code is basically what you see in the example, with some more output for verification:

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.MLUtils
    import org.apache.flink.ml.classification.SVM
    import org.apache.flink.ml.common.LabeledVector

    object LearnDocumentEntityRelationship {

      val trainingDataPath = "/data/svmguide1.training.txt"
      val testDataPath = "/data/svmguide1.test.txt"

      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        // Training data: each LibSVM line becomes a LabeledVector(label, features)
        val trainingData: DataSet[LabeledVector] = MLUtils.readLibSVM(env, trainingDataPath)
        println("============================")
        println("=== Training Data")
        println("============================")
        trainingData.print()

        // Test data as (features, true label) pairs, the input shape evaluate() expects
        val testData = MLUtils.readLibSVM(env, testDataPath).map(x => (x.vector, x.label))
        println("============================")
        println("=== Test Data")
        println("============================")
        testData.print()

        val svm = SVM()
          .setBlocks(env.getParallelism)
          .setIterations(100)
          .setRegularization(0.001)
          .setStepsize(0.1)
          .setSeed(42)

        svm.fit(trainingData)

        // (true label, predicted label) for every test record
        val evaluationPairs: DataSet[(Double, Double)] = svm.evaluate(testData)
        println("============================")
        println("=== Evaluation Pairs")
        println("============================")
        evaluationPairs.print()

        // Features only, to get plain predictions
        val realData = MLUtils.readLibSVM(env, testDataPath).map(x => x.vector)
        val predictionDS = svm.predict(realData)
        println("============================")
        println("=== Predictions")
        println("============================")
        predictionDS.print()
        println("=== End")

        // No env.execute(...) here: in the DataSet API each print() already
        // triggers execution, and calling execute() afterwards with no new
        // sinks defined fails with "No new data sinks have been defined".
      }
    }

The issue is that the predictions (from both the evaluation pairs and the prediction data set) are always equal to 1.0. When I changed the labels in the data files to 16 and 8 (so 1 is not a valid label anymore), it still predicts 1.0 for every single record. I also tried some other custom data sets, but I always get the same result.
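To check the "always 1.0" symptom across the whole test set rather than by scanning printed records, the evaluation pairs could be collected to the client (in the Scala DataSet API, `collect()` returns a `Seq`) and summarized with a small plain-Scala helper. This is only a sketch: `EvaluationSummary`, `Summary` and `summarize` are hypothetical names for illustration, and the sample pairs in `main` are just the four shown in the output excerpt of this post.

```scala
// Hypothetical helper: summarizes (true label, predicted label) pairs such as
// those produced by svm.evaluate(...) once they have been collected locally.
object EvaluationSummary {

  final case class Summary(
      distinctLabels: Set[Double],      // labels present in the data
      distinctPredictions: Set[Double], // labels the model actually outputs
      accuracy: Double                  // fraction of exact matches
  )

  def summarize(pairs: Seq[(Double, Double)]): Summary = {
    require(pairs.nonEmpty, "need at least one (truth, prediction) pair")
    val correct = pairs.count { case (truth, pred) => truth == pred }
    Summary(
      distinctLabels = pairs.map(_._1).toSet,
      distinctPredictions = pairs.map(_._2).toSet,
      accuracy = correct.toDouble / pairs.size
    )
  }

  def main(args: Array[String]): Unit = {
    // The four evaluation pairs from the output excerpt in this post.
    val excerpt = Seq((16.0, 1.0), (16.0, 1.0), (8.0, 1.0), (8.0, 1.0))
    val s = summarize(excerpt)
    println(s"distinct predictions: ${s.distinctPredictions}") // distinct predictions: Set(1.0)
    println(s"accuracy: ${s.accuracy}")                        // accuracy: 0.0
  }
}
```

In the job above this would be something like `EvaluationSummary.summarize(evaluationPairs.collect())`. Note that `collect()` triggers its own job execution and pulls every pair to the client, so it only suits test sets that fit in memory.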
This is a condensed excerpt of the output (the data contains too many records to put here):

    ============================
    === Test Data
    ============================
    (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
    (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
    (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),8.0)
    (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),8.0)
    ============================
    === Evaluation Pairs
    ============================
    (16.0,1.0)
    (16.0,1.0)
    (8.0,1.0)
    (8.0,1.0)
    ============================
    === Predictions
    ============================
    (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
    (SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
    (SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),1.0)
    (SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),1.0)

Am I doing something wrong? Any pointers are greatly appreciated.

Thanks!
Mano