[jira] [Commented] (SPARK-14623) add label binarizer

2016-04-16 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244061#comment-15244061
 ] 

hujiayin commented on SPARK-14623:
--

Hi Joseph, I think it is similar to combining StringIndexer + OneHotEncoder 
into one class, but the difference is that LabelBinarizer collects all 
occurrences of the same element into one vector and remembers the positions 
where that element appears in the input. 

For example, 
Input is "yellow,green,red,green,0"
LabelBinarizer extracts the labels from the input; the labels are "0, green, 
red, yellow"
Output is
0, 0, 0, 1
0, 1, 0, 0
0, 0, 1, 0
0, 1, 0, 0
1, 0, 0, 0
The second column shows that element "green" appears at positions 1 and 3 
(counting from 0) in the input. The 4 columns correspond to the 4 labels: 
column 0 represents label "0", column 1 represents label "green", and so on. 
If I understand correctly, StringIndexer returns the category index of a label 
and OneHotEncoder returns a one-hot (0/1) vector for that category index.
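A minimal plain-Python sketch of the behaviour described in this comment, assuming the labels are the distinct input values in sorted (lexicographic) order, which matches the label list "0, green, red, yellow" in the example. `label_binarize` is a hypothetical helper for illustration, not a Spark API.

```python
def label_binarize(values):
    """Hypothetical helper (not a Spark API).

    Assumption: labels are the distinct input values in sorted order.
    Each input element becomes a 0/1 row with a single 1 in the column
    of its label, so one row per element and one column per label.
    """
    labels = sorted(set(values))
    column = {label: i for i, label in enumerate(labels)}
    rows = []
    for value in values:
        row = [0] * len(labels)
        row[column[value]] = 1  # mark the column of this element's label
        rows.append(row)
    return labels, rows


labels, rows = label_binarize("yellow,green,red,green,0".split(","))
print(labels)  # ['0', 'green', 'red', 'yellow']
for row in rows:
    print(row)
```

Reading down column 1 (`[row[1] for row in rows]`) gives `[0, 1, 0, 1, 0]`, i.e. "green" at positions 1 and 3, which is the per-label vector the comment refers to.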

> add label binarizer 
> 
>
> Key: SPARK-14623
> URL: https://issues.apache.org/jira/browse/SPARK-14623
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: hujiayin
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It relates to https://issues.apache.org/jira/browse/SPARK-7445
> Map the labels to 0/1. 
> For example,
> Input:
> "yellow,green,red,green,0"
> The labels: "0, green, red, yellow"
> Output:
> 0, 0, 0, 1
> 0, 1, 0, 0
> 0, 0, 1, 0
> 0, 1, 0, 0
> 1, 0, 0, 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14623) add label binarizer

2016-04-15 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243621#comment-15243621
 ] 

Joseph K. Bradley commented on SPARK-14623:
---

[~hujiayin] Thanks for this.  However, this looks like it duplicates the 
functionality of StringIndexer + OneHotEncoder.  How is this different, other 
than putting them into one class?
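To make the comparison concrete, here is a plain-Python sketch of the two-step "index, then one-hot encode" approach. It is not Spark code: `string_index` and `one_hot` are hypothetical stand-ins. Spark's StringIndexer orders labels by frequency by default and OneHotEncoder drops the last category by default, so this sketch instead assumes alphabetical indexing and keeps every category in order to line up with the example in this issue.

```python
def string_index(values):
    """Hypothetical stand-in for an index step (not Spark's StringIndexer).

    Assumption: distinct values are indexed in sorted (alphabetical) order.
    """
    labels = sorted(set(values))
    index = {label: i for i, label in enumerate(labels)}
    return labels, [index[v] for v in values]


def one_hot(indices, size):
    """Hypothetical stand-in for a one-hot step that keeps every category
    (no dropped last column)."""
    return [[1 if j == i else 0 for j in range(size)] for i in indices]


values = "yellow,green,red,green,0".split(",")
labels, indices = string_index(values)
encoded = one_hot(indices, len(labels))
print(indices)  # [3, 1, 2, 1, 0]
for row in encoded:
    print(row)
```

Under these assumptions the encoded rows equal the "Output" matrix in the description, so the two steps chained together reproduce the proposed result.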

> add label binarizer 
> 
>
> Key: SPARK-14623
> URL: https://issues.apache.org/jira/browse/SPARK-14623
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: hujiayin
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It relates to https://issues.apache.org/jira/browse/SPARK-7445
> Map the labels to 0/1. 
> For example,
> Input:
> "yellow,green,red,green,0"
> The labels: "0, green, red, yellow"
> Output:
> 0, 0, 0, 1
> 0, 1, 0, 0
> 0, 0, 1, 0
> 0, 1, 0, 0
> 1, 0, 0, 0



[jira] [Commented] (SPARK-14623) add label binarizer

2016-04-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240543#comment-15240543
 ] 

Apache Spark commented on SPARK-14623:
--

User 'hujy' has created a pull request for this issue:
https://github.com/apache/spark/pull/12380

> add label binarizer 
> 
>
> Key: SPARK-14623
> URL: https://issues.apache.org/jira/browse/SPARK-14623
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.6.1
>Reporter: hujiayin
>Priority: Minor
> Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It relates to https://issues.apache.org/jira/browse/SPARK-7445
> Map the labels to 0/1. 
> For example,
> Input:
> "yellow,green,red,green,0"
> The labels: "0, green, red, yellow"
> Output:
> 0, 0, 0, 0, 1
> 0, 1, 0, 1, 0
> 0, 0, 1, 0, 0
> 1, 0, 0, 0, 0
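This earlier version of the description lays the output out label-major: one row per label, one column per input position. A minimal plain-Python sketch of that layout, assuming labels in sorted order as in the list "0, green, red, yellow"; `label_major_binarize` is a hypothetical helper, not a Spark API.

```python
def label_major_binarize(values):
    """Hypothetical helper (not a Spark API).

    Assumption: labels are the distinct input values in sorted order.
    Returns one 0/1 row per label with a 1 at every input position
    where that label occurs.
    """
    labels = sorted(set(values))
    matrix = [[1 if v == label else 0 for v in values] for label in labels]
    return labels, matrix


labels, matrix = label_major_binarize("yellow,green,red,green,0".split(","))
for label, row in zip(labels, matrix):
    print(label, row)
```

This matrix is the transpose of the row-per-element output in the later revision of the description; both carry the same information.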


