[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193991#comment-15193991 ]

Xusen Yin commented on SPARK-11136:
---

I agree. Will add it in the new commit. Thanks!

> Warm-start support for ML estimator
> ---
>
> Key: SPARK-11136
> URL: https://issues.apache.org/jira/browse/SPARK-11136
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Xusen Yin
> Priority: Minor
>
> The current implementation of Estimator does not support warm-start fitting,
> i.e. estimator.fit(data, params, partialModel). But first we need to add
> warm-start for all ML estimators. This is an umbrella JIRA to add support for
> the warm-start estimator.
> Treat model as a special parameter, passing it through ParamMap. e.g. val
> partialModel: Param[Option[M]] = new Param(...). In the case of model
> existing, we use it to warm-start, else we start the training process from
> the beginning.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193916#comment-15193916 ]

Nick Pentreath commented on SPARK-11136:

I would say the initial model params should take precedence over defaults in the general case. The most common use of an initial model is to warm-start training given new data, so usually the same model params would be trained (except for a cross-validated pipeline etc). Of course, this can be adjusted slightly for each algorithm depending on the details.

The user-defined params should definitely take precedence over the others.
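The precedence described in this comment (defaults, then the initial model's params, then user-set params) can be sketched in plain Scala. This is only an illustration: `resolveParams`, the string-keyed maps, and the parameter values are hypothetical and are not part of Spark's `Params` API.

```scala
// Hypothetical helper, not a Spark API: later layers override earlier ones.
// `++` on immutable maps keeps the right-hand value when keys collide.
def resolveParams(
    defaults: Map[String, Any],
    initialModelParams: Map[String, Any],
    userSet: Map[String, Any]): Map[String, Any] =
  defaults ++ initialModelParams ++ userSet

// Illustrative values only; these are not claimed to be Spark's defaults.
val defaults = Map[String, Any]("k" -> 2, "tol" -> 1e-4, "maxIter" -> 20)
val fromInitialModel = Map[String, Any]("k" -> 10, "tol" -> 1e-3)
val userSet = Map[String, Any]("tol" -> 1e-2)

val resolved = resolveParams(defaults, fromInitialModel, userSet)
println(resolved)
```

Here `k` comes from the initial model, `tol` from the user, and `maxIter` falls back to the default. An algorithm-specific rule would be applied as one more layer on top.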
[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193886#comment-15193886 ]

Xusen Yin commented on SPARK-11136:
---

This is a good point. In our current implementation, the new KMeans uses only the model itself (i.e. the array of cluster centers), not its parameters. E.g.

{code}
if (isSet(initialModel)) {
  // The initial model must have exactly k cluster centers.
  require($(initialModel).parentModel.clusterCenters.length == $(k), "mismatched cluster count")
  // Its centers must have the same dimension as the input vectors.
  require(rdd.first().size == $(initialModel).clusterCenters.head.size, "mismatched dimension")
  algo.setInitialModel($(initialModel).parentModel)
}
{code}

But I think you're right. We should also inherit the model's parameters in some scenarios. IMHO, the parameter overriding order should be (initialModel parameter < default parameter < user-set parameter). What do you think?
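For readers without the patch at hand, the same two checks can be shown in a self-contained sketch. It uses plain Lloyd iterations over `Vector[Double]` rather than Spark's KMeans, and `warmStartKMeans` and its cold-start fallback (the first `k` points) are assumptions made for illustration.

```scala
// Sketch only: plain-Scala Lloyd iterations, not Spark's implementation.
def sqDist(a: Vector[Double], b: Vector[Double]): Double =
  a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

def warmStartKMeans(
    data: Seq[Vector[Double]],
    k: Int,
    maxIter: Int,
    initialCenters: Option[Seq[Vector[Double]]]): Seq[Vector[Double]] = {
  require(k > 0, "k must be positive")
  require(data.nonEmpty, "empty dataset")
  val dim = data.head.length
  initialCenters.foreach { init =>
    // The same two checks as in the snippet above.
    require(init.length == k, "mismatched cluster count")
    require(init.head.length == dim, "mismatched dimension")
  }
  // Warm start from the given centers, else fall back to the first k points.
  var centers: Seq[Vector[Double]] = initialCenters.getOrElse(data.take(k))
  for (_ <- 0 until maxIter) {
    // Assign each point to the index of its nearest center.
    val assigned = data.groupBy(p => centers.indices.minBy(i => sqDist(p, centers(i))))
    // Move each center to the mean of its points; keep it if it has none.
    centers = centers.indices.map { i =>
      assigned.get(i) match {
        case Some(ps) => Vector.tabulate(dim)(d => ps.map(_(d)).sum / ps.size)
        case None => centers(i)
      }
    }
  }
  centers
}

val points = Seq(Vector(0.0), Vector(1.0), Vector(10.0), Vector(11.0))
val warm = warmStartKMeans(points, 2, 5, Some(Seq(Vector(0.0), Vector(10.0))))
println(warm)
```

Passing an initial model whose center count differs from `k` fails the first `require`, which mirrors the "mismatched cluster count" error in the snippet.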
[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192975#comment-15192975 ]

Nick Pentreath commented on SPARK-11136:

A question about the API design: it seems to me that it would be good to have the initial model (if it exists) set up the default params, e.g.

{code}
val model1 = new KMeans()
  .setK(10)
  .setInitSteps(5)
  .setTol(1e-3)
  .setInitMode("random")
  .fit(dataset)

val model2 = new KMeans()
  .setInitialModel(model1)
  .fit(dataset)
{code}

Here {{model2}} is automatically trained with the same {{k}}, {{tol}} and {{initMode}} as {{model1}}, but in this case {{initSteps}} would be overridden to {{1}}. If the user wants to adjust those, they can of course set the params.

Thoughts?
[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052718#comment-15052718 ]

Xusen Yin commented on SPARK-11136:
---

I have added a [design doc|https://docs.google.com/document/d/1LSRQDXOepVsOsCRT_PFwuiS9qmbgCzEskVPKXdqHoX0/edit?usp=sharing] here so that we can discuss the different implementations easily.
[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960116#comment-14960116 ]

Xusen Yin commented on SPARK-11136:
---

Sure. I will also add more subtasks to this JIRA to indicate other possible warm-start estimators.

> Warm-start support for ML estimator
> ---
>
> Key: SPARK-11136
> URL: https://issues.apache.org/jira/browse/SPARK-11136
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Xusen Yin
> Priority: Minor
>
> The current implementation of Estimator does not support warm-start fitting,
> i.e. estimator.fit(data, params, partialModel). But first we need to add
> warm-start for all ML estimators. This is an umbrella JIRA to add support for
> the warm-start estimator.
> Possible solutions:
> 1. Add warm-start fitting interface like def fit(dataset: DataFrame,
> initModel: M, paramMap: ParamMap): M
> 2. Treat model as a special parameter, passing it through ParamMap. e.g. val
> partialModel: Param[Option[M]] = new Param(...). In the case of model
> existing, we use it to warm-start, else we start the training process from
> the beginning.
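The two options in the quoted description can be compared side by side in a small sketch. Everything here is a simplified stand-in rather than Spark's API: `Param`, `ParamMap`, `MeanModel` and `MeanEstimator` are hypothetical, and the "training" is a running mean chosen only so the example runs without Spark.

```scala
// Hypothetical, simplified stand-ins for Spark ML's Param and ParamMap.
final case class Param[T](name: String)

final case class ParamMap(values: Map[Param[_], Any] = Map.empty) {
  def put[T](param: Param[T], value: T): ParamMap = ParamMap(values.updated(param, value))
  def get[T](param: Param[T]): Option[T] = values.get(param).map(_.asInstanceOf[T])
}

// Toy model: the mean of all values seen so far, plus how many were seen.
final case class MeanModel(mean: Double, count: Long)

object MeanEstimator {
  // Option 2: the model is an ordinary parameter carried by the ParamMap.
  val partialModel: Param[Option[MeanModel]] = Param("partialModel")

  // Option 1: a dedicated warm-start overload of fit.
  def fit(data: Seq[Double], initModel: MeanModel, paramMap: ParamMap): MeanModel =
    update(initModel, data)

  // Option 2: a single fit; warm-start only if the parameter holds a model.
  def fit(data: Seq[Double], paramMap: ParamMap): MeanModel = {
    val start = paramMap.get(partialModel).flatten.getOrElse(MeanModel(0.0, 0L))
    update(start, data)
  }

  // Continue from `start` instead of recomputing from scratch.
  private def update(start: MeanModel, data: Seq[Double]): MeanModel = {
    val n = start.count + data.size
    if (n == 0) start
    else MeanModel((start.mean * start.count + data.sum) / n, n)
  }
}

val cold = MeanEstimator.fit(Seq(1.0, 3.0), ParamMap())
val viaOverload = MeanEstimator.fit(Seq(5.0, 7.0), cold, ParamMap())
val viaParam = MeanEstimator.fit(
  Seq(5.0, 7.0), ParamMap().put(MeanEstimator.partialModel, Option(cold)))
println((cold, viaOverload, viaParam))
```

Both routes give the same model. With option 2 the `fit` signature stays unchanged, and an unset parameter simply means training from the beginning.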
[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960073#comment-14960073 ]

Joseph K. Bradley commented on SPARK-11136:
---

We should definitely have it be a Param. I just commented on the KMeans JIRA about that. Thanks for pointing out that issue.

Would you mind updating this JIRA's description to specify that as the chosen option?
[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960063#comment-14960063 ]

Xusen Yin commented on SPARK-11136:
---

I have already linked all related issues.

[~josephkb] Which method of supporting warm-start do you prefer? Or do you have other feasible suggestions? [~jayants]'s KMeans warm-start code shows a third implementation.