[jira] [Commented] (SPARK-6349) Add probability estimates in SVMModel predict result
[ https://issues.apache.org/jira/browse/SPARK-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014043#comment-16014043 ]

Nick Pentreath commented on SPARK-6349:
---------------------------------------

This is now covered by {{ml}}'s {{LinearSVC}}. Shall we close?

> Add probability estimates in SVMModel predict result
> ----------------------------------------------------
>
>                 Key: SPARK-6349
>                 URL: https://issues.apache.org/jira/browse/SPARK-6349
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.2.1
>            Reporter: tanyinyan
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In SVMModel, the predictPoint method outputs either the raw margin (threshold
> not set) or a 1/0 label (threshold set).
> When SVM is used as a classifier, it is hard to find a good threshold, and the
> raw margin is hard to interpret.
> When I use SVM on the dataset
> (https://www.kaggle.com/c/avazu-ctr-prediction/data), training on the first
> day's data (ignoring the fields id/device_id/device_ip, treating all remaining
> fields as categorical variables, and making them sparse before SVM) and
> predicting on the same data with the threshold cleared, the predicted results
> are all negative. I have to set the threshold to -1 to get a reasonable
> confusion matrix.
> So I suggest providing probability predictions in SVMModel, as in libSVM
> (Platt's binary SVM probabilistic output).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6349) Add probability estimates in SVMModel predict result
[ https://issues.apache.org/jira/browse/SPARK-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364888#comment-14364888 ]

Sean Owen commented on SPARK-6349:
----------------------------------

One is just a monotonic function of the other. It doesn't change the problem at all in that respect.
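The monotonicity point can be checked with a small sketch. It assumes a Platt-style sigmoid of the form P(y=1 | margin) = 1 / (1 + exp(a * margin + b)); the coefficients `A` and `B` and the margin values below are made up for illustration and are not taken from the issue or from Spark.

```python
import math

def platt_probability(margin, a, b):
    """Platt-style sigmoid: P(y=1 | margin) = 1 / (1 + exp(a * margin + b))."""
    return 1.0 / (1.0 + math.exp(a * margin + b))

def margin_threshold_for(prob_threshold, a, b):
    """Invert the sigmoid: the margin at which the probability equals prob_threshold."""
    return (math.log(1.0 / prob_threshold - 1.0) - b) / a

# Hypothetical coefficients; a < 0 so the probability increases with the margin.
A, B = -2.0, -1.5
# Made-up raw margins, mostly negative as in the reporter's experiment.
margins = [-3.0, -1.2, -0.9, -0.5, 0.0, 0.4]

prob_cut = 0.5
margin_cut = margin_threshold_for(prob_cut, A, B)

# Labelling by probability threshold and by the equivalent margin threshold
# gives the same result, because the sigmoid is strictly monotonic.
labels_by_prob = [int(platt_probability(m, A, B) > prob_cut) for m in margins]
labels_by_margin = [int(m > margin_cut) for m in margins]
```

Under these assumptions, every probability cut-off corresponds to exactly one margin cut-off, so moving to probabilities rescales the threshold but does not change which labelings are reachable.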
[jira] [Commented] (SPARK-6349) Add probability estimates in SVMModel predict result
[ https://issues.apache.org/jira/browse/SPARK-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364361#comment-14364361 ]

tanyinyan commented on SPARK-6349:
----------------------------------

Yes, this doesn't solve the problem of picking a threshold. But a raw margin usually has no fixed range (as I tested above, the output margins are all negative), whereas a probability does. So it's more convenient to pick a good threshold, right?
[jira] [Commented] (SPARK-6349) Add probability estimates in SVMModel predict result
[ https://issues.apache.org/jira/browse/SPARK-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363124#comment-14363124 ]

Sean Owen commented on SPARK-6349:
----------------------------------

Yeah, the SVM model does not by nature compute a probability. Platt's method doesn't really make it compute a probability, but tacks on a logistic regression model to fit the label as a function of the margin. I think it's a little problematic but a fair bit better than nothing.

However, this doesn't have much to do with picking a threshold, right? That is more a function of what gives the best precision/recall for your problem, or whatever else matters for the output. You'd still have a similar problem even with a probability threshold to pick.
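As a rough illustration of what "tacking on a logistic regression" means, here is a minimal sketch in plain Python. It is a simplification, not libSVM's or Spark's implementation: it fits the two sigmoid coefficients by plain gradient descent on the log loss with 0/1 targets, whereas Platt's original procedure uses smoothed targets and a more careful optimizer. The function names, the learning rate, and the margins and labels are all hypothetical.

```python
import math

def sigmoid_prob(margin, a, b):
    """P(y=1 | margin) = 1 / (1 + exp(a * margin + b)), evaluated stably."""
    z = a * margin + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(margins, labels, lr=0.1, steps=5000):
    """Fit a and b by gradient descent on the mean log loss (simplified Platt scaling)."""
    a, b = 0.0, 0.0
    n = len(margins)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for f, y in zip(margins, labels):
            p = sigmoid_prob(f, a, b)
            # With z = a*f + b and p = 1/(1+exp(z)), d(log loss)/dz = y - p.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Made-up raw SVM margins (all negative, as in the report) with made-up labels.
margins = [-2.5, -2.0, -1.8, -1.4, -1.1, -0.9, -0.6, -0.3]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(margins, labels)
probs = [sigmoid_prob(f, a, b) for f in margins]
```

The fitted sigmoid maps each margin into (0, 1), which gives the scores a fixed range. The cut-off on that range still has to be chosen from precision/recall or whatever else matters for the problem, as noted above.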