[jira] [Commented] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets

Sean Owen (JIRA) Fri, 18 Jul 2014 04:27:16 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066283#comment-14066283
 ]


Sean Owen commented on SPARK-2341:
----------------------------------

[~mengxr] Here is an example of changing the argument:
https://github.com/srowen/spark/commit/4a584ff9c0ada3d035d4668ecf22ec0e65ed16b6

I won't open a PR yet. I think this is a better API at this point, but the 
question is more whether the weight of deprecated methods are worth it or not. 
Another data point to keep in mind regarding how APIs can evolve.

> loadLibSVMFile doesn't handle regression datasets
> -------------------------------------------------
>
>                 Key: SPARK-2341
>                 URL: https://issues.apache.org/jira/browse/SPARK-2341
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Eustache
>            Priority: Minor
>              Labels: easyfix
>
> Many datasets exist in LibSVM format for regression tasks [1] but currently 
> the loadLibSVMFile primitive doesn't handle regression datasets.
> More precisely, the LabelParser is either a MulticlassLabelParser or a 
> BinaryLabelParser. What happens then is that the file is loaded but in 
> multiclass mode : each target value is interpreted as a class name !
> The fix would be to write a RegressionLabelParser which converts target 
> values to Double and plug it into the loadLibSVMFile routine.
> [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets

Reply via email to