Hi guys,

Here I am again. I am playing with Flink ML and was trying to get the example 
from the documentation to work: 
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data
(the one using the astroparticle LibSVM data).

My code is basically what you see in the example, with some more output for 
verification:


import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils
import org.apache.flink.ml.classification.SVM
import org.apache.flink.ml.common.LabeledVector

object LearnDocumentEntityRelationship {

    val trainingDataPath = "/data/svmguide1.training.txt"
    val testDataPath = "/data/svmguide1.test.txt"

    def main(args: Array[String]) {
        val env = ExecutionEnvironment.getExecutionEnvironment

        val trainingData: DataSet[LabeledVector] =
            MLUtils.readLibSVM(env, trainingDataPath)

        println("============================")
        println("=== Training Data")
        println("============================")
        trainingData.print()

        val testData = MLUtils.readLibSVM(env, testDataPath)
            .map(x => (x.vector, x.label))

        println("============================")
        println("=== Test Data")
        println("============================")
        testData.print()

        val svm = SVM()
            .setBlocks(env.getParallelism)
            .setIterations(100)
            .setRegularization(0.001)
            .setStepsize(0.1)
            .setSeed(42)

        svm.fit(trainingData)

        val evaluationPairs: DataSet[(Double, Double)] = svm.evaluate(testData)

        println("============================")
        println("=== Evaluation Pairs")
        println("============================")
        evaluationPairs.print()

        val realData = MLUtils.readLibSVM(env, testDataPath).map(x => x.vector)

        val predictionDS = svm.predict(realData)

        println("============================")
        println("=== Predictions")
        println("============================")
        predictionDS.print()

        println("=== End")

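        // Note: each DataSet.print() call above already triggers eager execution
        // of the plan, so all of the output is produced before this point.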
        env.execute("Learn Document Entity Relationship Job")
    }
}


The issue is that the predictions (from both the evaluation pairs and the 
prediction dataset) are always "1.0". When I changed the labels in the data 
files to 16 and 8 (so 1 is no longer a valid label), it still predicted "1.0" 
for every single record. I also tried some other custom datasets, but I always 
get the same result.
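
For clarity, I made that label change by editing the LibSVM files by hand; in 
code the remapping would look roughly like this (just a sketch, assuming the 
original labels are 1.0 and 0.0 as in svmguide1):

// Sketch only: remap the 1.0 / 0.0 labels to 16.0 / 8.0 in code instead of
// editing the LibSVM files by hand.
val remappedTraining: DataSet[LabeledVector] =
    MLUtils.readLibSVM(env, trainingDataPath)
        .map(lv => LabeledVector(if (lv.label == 1.0) 16.0 else 8.0, lv.vector))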

Here is a condensed excerpt of the output (the data contains too many records 
to include in full):

============================
=== Test Data
============================
(SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
(SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),16.0)
(SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),8.0)
(SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),8.0)

============================
=== Evaluation Pairs
============================
(16.0,1.0)
(16.0,1.0)
(8.0,1.0)
(8.0,1.0)

============================
=== Predictions
============================
(SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
(SparseVector((0,4.236298), (1,21.9821), (2,-0.3503797), (3,97.52163)),1.0)
(SparseVector((0,77.948), (1,193.678), (2,0.1584834), (3,122.2632)),1.0)
(SparseVector((0,50.24301), (1,312.111), (2,-0.166669), (3,179.9808)),1.0)


Am I doing something wrong?

Any pointers are greatly appreciated. Thanks!

— Mano
