[ https://issues.apache.org/jira/browse/FLINK-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609840#comment-14609840 ]
ASF GitHub Bot commented on FLINK-2297:
---------------------------------------

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/874#discussion_r33663520

--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/SVM.scala ---
@@ -242,8 +275,21 @@ object SVM{
       }
     }

-    override def predict(value: T, model: DenseVector): Double = {
-      value.asBreeze dot model.asBreeze
+    override def predict(value: T, model: DenseVector, predictParameters: ParameterMap):
+      Double = {
+      val thresholdOption = predictParameters.get(Threshold)
+
+      val rawValue = value.asBreeze dot model.asBreeze
+      // If the Threshold option has been reset, we will get back a Some(None) thresholdOption
+      // causing the exception when we try to get the value. In that case we just return the
+      // raw value
+      try {
+        val thresOptionValue = thresholdOption.get
+        if (rawValue > thresOptionValue) 1.0 else -1.0
+      }
+      catch {
+        case e: java.lang.ClassCastException => rawValue
+      }
--- End diff --

This relates to the previous discussion: I do believe we want this turned on by default. When you train a binary classifier, you expect that `predict` will return binary labels, not the decision function values.

So if we have `None` as the default, the user could write:

```scala
val svm = SVM().
  setBlocks(env.getParallelism)
svm.fit(train)
val eval = svm.evaluate(test)
```

and the eval output would not make sense, but if they wrote

```scala
val svm = SVM().
  setBlocks(env.getParallelism).
  setThreshold(0.0)
svm.fit(train)
val eval = svm.evaluate(test)
```

it would.
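For illustration, the threshold behaviour discussed in this comment can be written by pattern matching on an `Option` rather than by catching a `ClassCastException`. This is only a minimal sketch under assumptions: `ThresholdSketch` and `decide` are hypothetical names that are not part of Flink ML, and the threshold is assumed to arrive as a plain `Option[Double]`, which may not match how `ParameterMap` stores a reset value.

```scala
// Hypothetical helper, not part of Flink ML.
// Assumption: the threshold is available as an Option[Double].
object ThresholdSketch {

  // With a threshold set, map the raw decision value to a binary label;
  // without one, return the raw decision value unchanged.
  def decide(rawValue: Double, threshold: Option[Double]): Double =
    threshold match {
      case Some(t) => if (rawValue > t) 1.0 else -1.0
      case None    => rawValue
    }

  def main(args: Array[String]): Unit = {
    println(decide(0.7, Some(0.0)))  // prints 1.0
    println(decide(-0.3, Some(0.0))) // prints -1.0
    println(decide(-0.3, None))      // prints -0.3
  }
}
```

With a default of `Some(0.0)`, the first usage example in the comment would yield binary labels without an explicit `setThreshold` call, while `None` would keep the raw decision values.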
> Add threshold setting for SVM binary predictions
> ------------------------------------------------
>
>                 Key: FLINK-2297
>                 URL: https://issues.apache.org/jira/browse/FLINK-2297
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Theodore Vasiloudis
>            Priority: Minor
>              Labels: ML
>             Fix For: 0.10
>
>
> Currently SVM outputs the raw decision function values when using the predict
> function.
> We should have instead the ability to set a threshold above which examples
> are labeled as positive (1.0) and below negative (-1.0). Then the prediction
> function can be directly used for evaluation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)