[jira] [Comment Edited] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'

2016-09-12 Thread miroslav Balaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484129#comment-15484129
 ] 

miroslav Balaz edited comment on SPARK-17498 at 9/12/16 1:46 PM:
-

No, I meant that it should return 3 and 3 for "d" and "e"; this corresponds to 
mapping unseen labels to a single 'unknown' class. 

[~sowen] I see it as a problem that you have to ensure that the training set 
contains all the labels that also appear in the test set. The assumption is 
that the model will perform poorly if the training set does not contain the 
same labels, but it would be good if it were possible to run it easily to see 
that. Or the performance might be good anyway.
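
The requested behaviour can be sketched in plain Python, without Spark. This 
is only an illustration of the proposal, not Spark's implementation: the 
helper names (fit_string_indexer, transform_with_new, index_to_string) are 
hypothetical, the known labels "a", "b", "c" are assumed from the "d"/"e" 
example above, and the alphabetical tie-break is an assumption made to keep 
the output deterministic.

```python
from collections import Counter


def fit_string_indexer(values):
    """Build a label -> index map, most frequent label first.

    Ties are broken alphabetically (an assumption for this sketch).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def transform_with_new(label_to_index, values):
    """Index values; every unseen label gets the maximum known index + 1."""
    unknown_index = len(label_to_index)  # indices are 0..n-1, so n is max + 1
    return [label_to_index.get(value, unknown_index) for value in values]


def index_to_string(label_to_index, indices, unknown_label=""):
    """Reverse the mapping; the extra 'unknown' index maps to unknown_label."""
    index_to_label = {index: label for label, index in label_to_index.items()}
    return [index_to_label.get(index, unknown_label) for index in indices]


# Training data contains only "a", "b" and "c".
mapping = fit_string_indexer(["a", "a", "a", "b", "b", "c"])

# "d" and "e" were never seen during fitting, so both land in one class.
indexed = transform_with_new(mapping, ["a", "d", "c", "e"])

# Mapping back yields "" for the shared 'unknown' index.
restored = index_to_string(mapping, indexed)
```

With three known labels, both "d" and "e" receive index 3, so a model trained 
on the indexed column can still be evaluated on a test set that contains 
labels missing from the training set.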


was (Author: mirob):
No, I meant that it should return 3 and 3 for "d" and "e"; this corresponds to 
mapping unseen labels to a single 'unknown' class. 

> StringIndexer.setHandleInvalid should have another option 'new'
> ---
>
> Key: SPARK-17498
> URL: https://issues.apache.org/jira/browse/SPARK-17498
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Miroslav Balaz
>
> That would map an unseen label to the maximum known label index + 1; 
> IndexToString would map that back to "" or NA, if there is something like 
> that in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'

2016-09-12 Thread miroslav Balaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484129#comment-15484129
 ] 

miroslav Balaz commented on SPARK-17498:


No, I meant that it should return 3 and 3 for "d" and "e"; this corresponds to 
mapping unseen labels to a single 'unknown' class. 

> StringIndexer.setHandleInvalid should have another option 'new'
> ---
>
> Key: SPARK-17498
> URL: https://issues.apache.org/jira/browse/SPARK-17498
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Miroslav Balaz
>
> That would map an unseen label to the maximum known label index + 1; 
> IndexToString would map that back to "" or NA, if there is something like 
> that in Spark.






[jira] [Created] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'

2016-09-11 Thread Miroslav Balaz (JIRA)
Miroslav Balaz created SPARK-17498:
--

 Summary: StringIndexer.setHandleInvalid should have another option 
'new'
 Key: SPARK-17498
 URL: https://issues.apache.org/jira/browse/SPARK-17498
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Miroslav Balaz


That would map an unseen label to the maximum known label index + 1; 
IndexToString would map that back to "" or NA, if there is something like 
that in Spark.


