[jira] [Commented] (SPARK-19957) Inconsist KMeans initialization mode behavior between ML and MLlib
[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926404#comment-15926404 ]

yuhao yang commented on SPARK-19957:

Thanks for the response.

> Inconsist KMeans initialization mode behavior between ML and MLlib
> --
>
>                 Key: SPARK-19957
>                 URL: https://issues.apache.org/jira/browse/SPARK-19957
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: yuhao yang
>            Priority: Minor
>
> When users set the initialization mode to "random", KMeans in ML and MLlib
> behaves inconsistently across multiple runs:
> MLlib will basically use a new Random for each run.
> ML KMeans, however, will use the default random seed, which is
> {code}this.getClass.getName.hashCode.toLong{code}, and keeps using the same
> number across multiple fits.
> I would expect the "random" initialization mode to be literally random.
> There are different solutions with different scopes of impact. Adjusting the
> hasSeed trait may have a broader impact (but may be worth discussing). We can
> always just set a random default seed in KMeans.
> Appreciate your feedback.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
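To make the difference described in the issue concrete, here is a minimal sketch in plain Scala. It does not use Spark: `SeedDemo`, `randomInit`, `fitWithDefaultSeed` and `fitWithFreshSeed` are hypothetical names invented for illustration, and "initialization" is reduced to picking k point indices. Only the default-seed expression is taken from the issue text.

```scala
import scala.util.Random

// Hypothetical stand-in for the two behaviors described in the issue.
// This is NOT Spark's KMeans; it only models how the seed is chosen.
object SeedDemo {
  // "random" initialization reduced to its essence:
  // pick k distinct point indices out of n, driven by a seed.
  def randomInit(n: Int, k: Int, seed: Long): Seq[Int] =
    new Random(seed).shuffle((0 until n).toList).take(k)

  // Fixed default seed in the style quoted in the issue:
  // a hash of the class name, identical on every call.
  val defaultSeed: Long = this.getClass.getName.hashCode.toLong

  // ML-style behavior as described: every fit reuses the same default
  // seed unless the caller sets one, so the initial centers repeat.
  def fitWithDefaultSeed(n: Int, k: Int): Seq[Int] =
    randomInit(n, k, defaultSeed)

  // MLlib-style behavior as described: a fresh seed for each run,
  // so the initial centers usually differ between runs.
  def fitWithFreshSeed(n: Int, k: Int): Seq[Int] =
    randomInit(n, k, new Random().nextLong())
}
```

Calling `SeedDemo.fitWithDefaultSeed(100, 3)` twice returns the same indices, while two calls to `SeedDemo.fitWithFreshSeed(100, 3)` will normally return different ones. In this model, a caller who wants run-to-run variation with a fixed default would have to pass a varying seed explicitly.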
[jira] [Commented] (SPARK-19957) Inconsist KMeans initialization mode behavior between ML and MLlib
[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925753#comment-15925753 ]

Sean Owen commented on SPARK-19957:
---

Yeah I think this might be "working as intended".
[jira] [Commented] (SPARK-19957) Inconsist KMeans initialization mode behavior between ML and MLlib
[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925723#comment-15925723 ]

Nick Pentreath commented on SPARK-19957:

See https://issues.apache.org/jira/browse/SPARK-16832