[jira] [Comment Edited] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'
[ https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484129#comment-15484129 ]

Miroslav Balaz edited comment on SPARK-17498 at 9/12/16 1:46 PM:
-

No, I meant that it should return 3 and 3 for "d" and "e"; this corresponds to mapping unseen labels to a single 'unknown' class.

[~sowen] I see it as a problem that you have to ensure the training set contains every label that also appears in the test set. The assumption is that the model will perform poorly if the training set does not contain the same labels, but it would be good to be able to run it easily and see that. Or the performance might be good anyway.

was (Author: mirob):
No, I meant that it should return 3 and 3 for "d" and "e"; this corresponds to mapping unseen labels to a single 'unknown' class.

> StringIndexer.setHandleInvalid should have another option 'new'
> ---
>
> Key: SPARK-17498
> URL: https://issues.apache.org/jira/browse/SPARK-17498
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Miroslav Balaz
>
> That will map an unseen label to the maximum known label + 1; IndexToString would map
> that back to "" or NA, if there is something like that in Spark.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'
[ https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484129#comment-15484129 ]

Miroslav Balaz commented on SPARK-17498:

No, I meant that it should return 3 and 3 for "d" and "e"; this corresponds to mapping unseen labels to a single 'unknown' class.
[jira] [Created] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'
Miroslav Balaz created SPARK-17498:
--

Summary: StringIndexer.setHandleInvalid should have another option 'new'
Key: SPARK-17498
URL: https://issues.apache.org/jira/browse/SPARK-17498
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Miroslav Balaz

That will map an unseen label to the maximum known label + 1; IndexToString would map that back to "" or NA, if there is something like that in Spark.
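
The behaviour requested in this issue can be sketched in plain Python, without Spark. This is only an illustration of the proposal, not Spark's implementation: the class name UnseenAwareIndexer, its methods, the "NA" placeholder and the alphabetical tie-breaking are all assumptions made for the example. Known labels are indexed by descending frequency, every unseen label maps to the maximum known index + 1, and the reverse mapping turns that extra index back into the placeholder.

```python
from collections import Counter


class UnseenAwareIndexer:
    """Hypothetical stand-in for the proposed 'new' option; not a Spark class."""

    def __init__(self, unknown_label="NA"):
        # Placeholder returned by the reverse mapping (an assumption here).
        self.unknown_label = unknown_label
        self.labels = []
        self._index = {}

    def fit(self, values):
        counts = Counter(values)
        # Most frequent label gets index 0; ties are broken alphabetically
        # (the tie-breaking rule is an assumption of this sketch).
        self.labels = sorted(counts, key=lambda v: (-counts[v], v))
        self._index = {label: i for i, label in enumerate(self.labels)}
        return self

    def transform(self, values):
        # Every unseen label shares one index: maximum known index + 1.
        unseen_index = len(self.labels)
        return [self._index.get(v, unseen_index) for v in values]

    def inverse_transform(self, indices):
        # Indices outside the known range map back to the placeholder.
        return [
            self.labels[i] if 0 <= i < len(self.labels) else self.unknown_label
            for i in indices
        ]


indexer = UnseenAwareIndexer().fit(["a", "a", "a", "b", "b", "c"])
train_indices = indexer.transform(["a", "b", "c"])
test_indices = indexer.transform(["a", "d", "e"])
restored = indexer.inverse_transform(test_indices)
print(train_indices, test_indices, restored)
```

With training labels "a", "b" and "c", the unseen labels "d" and "e" both receive index 3, which matches the "3 and 3" case described in the comment on this issue.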