[ 
https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-15957:
--------------------------------
    Description: 
Currently, RFormula indexes the label only when it is of string type. If the 
label is numeric and we use RFormula for a classification model, the label 
column metadata contains no label attributes. The label attributes are useful 
when making predictions for classification, so for classification we can force 
the label to be indexed by {{StringIndexer}} whether it is numeric or string 
type. SparkR wrappers can then extract the label attributes from the label 
column metadata successfully. This feature can help us fix bugs similar to 
SPARK-15153.
For regression, we will still keep the label as a numeric type.
In this PR, we add a param indexLabel to control whether to force indexing of 
the label for RFormula.

  was:
RFormula indexes the label only when it is of string type. If the label is 
numeric and we use RFormula for a classification model, we cannot extract the 
label attributes from the label column metadata successfully. The label 
attributes are useful when making predictions for classification, so for 
classification we can force the label to be indexed by {{StringIndexer}} 
whether it is numeric or string type. SparkR wrappers can then extract the 
label attributes from the column metadata successfully. This feature can help 
us fix bugs similar to SPARK-15153.
For regression, we will still keep the label as a numeric type.
We should add a param to control whether to force indexing of the label for 
RFormula.


> RFormula supports forcing to index label
> ----------------------------------------
>
>                 Key: SPARK-15957
>                 URL: https://issues.apache.org/jira/browse/SPARK-15957
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>
> Currently, RFormula indexes the label only when it is of string type. If the 
> label is numeric and we use RFormula for a classification model, the label 
> column metadata contains no label attributes. The label attributes are 
> useful when making predictions for classification, so for classification we 
> can force the label to be indexed by {{StringIndexer}} whether it is numeric 
> or string type. SparkR wrappers can then extract the label attributes from 
> the label column metadata successfully. This feature can help us fix bugs 
> similar to SPARK-15153.
> For regression, we will still keep the label as a numeric type.
> In this PR, we add a param indexLabel to control whether to force indexing 
> of the label for RFormula.
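
To illustrate why label attributes matter for classification, here is a minimal 
plain-Python sketch of the idea behind indexing a numeric label. It is not 
Spark code and not the implementation in this PR. The function names 
(fit_label_index, index_labels), the sample data, and the tie-breaking rule are 
all assumptions made for illustration; the only behavior borrowed from 
{{StringIndexer}} is that labels are ordered by descending frequency, so the 
most frequent label gets index 0.

```python
from collections import Counter


def fit_label_index(labels):
    """Return the distinct labels (as strings) ordered by descending frequency.

    Ties are broken by string value; that tie-breaking rule is an assumption
    of this sketch. The returned list plays the role of the "label attributes"
    that would be stored in the label column metadata.
    """
    counts = Counter(str(v) for v in labels)
    return sorted(counts, key=lambda lab: (-counts[lab], lab))


def index_labels(labels, ordered):
    """Map each raw label to its index in the ordered label list."""
    lookup = {lab: float(i) for i, lab in enumerate(ordered)}
    return [lookup[str(v)] for v in labels]


# Hypothetical numeric label column for a three-class problem.
raw = [1.0, 0.0, 1.0, 2.0, 1.0, 0.0]

label_attrs = fit_label_index(raw)       # kept alongside the column
indexed = index_labels(raw, label_attrs)

# Hypothetical model output: predictions are indices, not original labels.
predictions = [0.0, 2.0]

# With the label attributes available, a wrapper can map predicted indices
# back to the original label values.
decoded = [label_attrs[int(p)] for p in predictions]
```

Without the stored label list, the final decoding step has nothing to look up, 
which is the situation described above when a numeric label is left unindexed.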



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
