[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Xusen Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193991#comment-15193991
 ] 

Xusen Yin commented on SPARK-11136:
---

I agree. Will add it in the new commit. Thanks!

> Warm-start support for ML estimator
> ---
>
> Key: SPARK-11136
> URL: https://issues.apache.org/jira/browse/SPARK-11136
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Xusen Yin
> Priority: Minor
>
> The current implementation of Estimator does not support warm-start fitting, 
> i.e. estimator.fit(data, params, partialModel). But first we need to add 
> warm-start for all ML estimators. This is an umbrella JIRA to add support for 
> the warm-start estimator. 
> Treat the model as a special parameter and pass it through ParamMap, e.g.
> val partialModel: Param[Option[M]] = new Param(...). If a model is
> present, we use it to warm-start; otherwise we start the training process
> from the beginning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193916#comment-15193916
 ] 

Nick Pentreath commented on SPARK-11136:


I would say the initial model's params should take precedence over the
defaults in the general case. The most common use of an initial model is to
warm-start training on new data, so usually the same model params would be
used for training (except in cases such as a cross-validated pipeline).

Of course, this can be adjusted for each algorithm depending on the
details.

The user-defined params should definitely take precedence over the others.
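
As a rough illustration of the precedence described in this comment (defaults lowest, then the initial model's params, then user-set params), here is a minimal plain-Scala sketch. It does not use Spark's ParamMap; the object name, the resolve helper, and the use of plain Maps are all assumptions made for illustration.

```scala
object ParamPrecedenceSketch {
  // Hypothetical helper: merges three layers of parameters.
  // Map's ++ is right-biased, so later layers override earlier ones:
  // defaults < initial-model params < user-set params.
  def resolve(
      defaults: Map[String, Any],
      initialModelParams: Map[String, Any],
      userSet: Map[String, Any]): Map[String, Any] =
    defaults ++ initialModelParams ++ userSet
}
```

With this ordering, a value the user sets explicitly always wins, and a value carried by the initial model only replaces a default.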







[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Xusen Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193886#comment-15193886
 ] 

Xusen Yin commented on SPARK-11136:
---

This is a good point. In the current implementation, the new KMeans uses only
the model itself (i.e. the array of cluster centers), not its parameters.
E.g.

{code}
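if (isSet(initialModel)) {
  // The initial model must supply exactly k cluster centers.
  require($(initialModel).parentModel.clusterCenters.length == $(k),
    "mismatched cluster count")
  // Each center must have the same dimension as the input vectors.
  require(rdd.first().size == $(initialModel).clusterCenters.head.size,
    "mismatched dimension")
  // Seed the underlying algorithm with the initial model's centers.
  algo.setInitialModel($(initialModel).parentModel)
}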
{code}

But I think you're right. We should also carry over the parameters in some
scenarios. IMHO, the parameter precedence, from lowest to highest, should be
(initialModel parameter < default parameter < user-set parameter). What do
you think?
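
The two checks in the snippet above can also be tried outside Spark. The following is a standalone sketch under stated assumptions: CentersModel is a hypothetical stand-in for a fitted model that holds only cluster centers, and validate is an illustrative helper, not part of the patch. Unlike the snippet, it checks the dimension of every center rather than only the first.

```scala
object InitialModelChecksSketch {
  // Hypothetical stand-in for a fitted model: just its cluster centers.
  final case class CentersModel(clusterCenters: Array[Array[Double]])

  // Throws IllegalArgumentException (via require) if the model cannot seed
  // a run with k clusters on featureDim-dimensional data.
  def validate(initial: CentersModel, k: Int, featureDim: Int): Unit = {
    require(initial.clusterCenters.length == k, "mismatched cluster count")
    require(initial.clusterCenters.forall(_.length == featureDim),
      "mismatched dimension")
  }
}
```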




[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192975#comment-15192975
 ] 

Nick Pentreath commented on SPARK-11136:


A question about the API design - it seems to me that it would be good to have 
the initial model (if it exists) set up the default params. e.g.
{code}
val model1 = new KMeans()
  .setK(10)
  .setInitSteps(5)
  .setTol(1e-3)
  .setInitMode("random")
  .fit(dataset)

val model2 = new KMeans()
  .setInitialModel(model1)
  .fit(dataset)
{code}
Here {{model2}} is automatically trained with the same {{k}}, {{tol}} and
{{initMode}} as {{model1}}, but in this case {{initSteps}} would be
overridden to {{1}}. If the user wants to adjust those, they can of course
set the params explicitly. Thoughts?
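
To make the idea of warm-starting concrete, here is a toy 1-D Lloyd iteration in plain Scala in which the initial centers are simply passed in. This is only a sketch for intuition and is not Spark's KMeans; the object name, the step and fit helpers, and the tol and maxIter defaults are assumptions.

```scala
object WarmStartKMeansSketch {
  // One Lloyd iteration on 1-D points: assign each point to its nearest
  // center, then move each center to the mean of its assigned points.
  // A center with no assigned points stays where it is.
  def step(points: Seq[Double], centers: Seq[Double]): Seq[Double] = {
    val assigned =
      points.groupBy(p => centers.indices.minBy(i => math.abs(p - centers(i))))
    centers.indices.map { i =>
      assigned.get(i).map(ps => ps.sum / ps.size).getOrElse(centers(i))
    }
  }

  // Iterates from the given initial centers until no center moves by more
  // than tol (or maxIter is reached). Returns the centers and the number of
  // iterations actually run.
  def fit(
      points: Seq[Double],
      init: Seq[Double],
      tol: Double = 1e-9,
      maxIter: Int = 100): (Seq[Double], Int) = {
    var centers = init
    var iter = 0
    var moved = Double.MaxValue
    while (iter < maxIter && moved > tol) {
      val next = step(points, centers)
      moved = centers.zip(next).map { case (a, b) => math.abs(a - b) }.max
      centers = next
      iter += 1
    }
    (centers, iter)
  }
}
```

Passing the centers of a previous fit as init is the warm-start case: no separate initialization step is needed, and on the same data the loop stops after a single confirming iteration.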





[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2015-12-11 Thread Xusen Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052718#comment-15052718
 ] 

Xusen Yin commented on SPARK-11136:
---

I added a [design doc|https://docs.google.com/document/d/1LSRQDXOepVsOsCRT_PFwuiS9qmbgCzEskVPKXdqHoX0/edit?usp=sharing]
here so that we can discuss the different implementations more easily.




[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2015-10-15 Thread Xusen Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960116#comment-14960116
 ] 

Xusen Yin commented on SPARK-11136:
---

Sure. I will also add more subtasks to this JIRA for other estimators that
could support warm-start.

> Warm-start support for ML estimator
> ---
>
> Key: SPARK-11136
> URL: https://issues.apache.org/jira/browse/SPARK-11136
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Xusen Yin
> Priority: Minor
>
> The current implementation of Estimator does not support warm-start fitting, 
> i.e. estimator.fit(data, params, partialModel). But first we need to add 
> warm-start for all ML estimators. This is an umbrella JIRA to add support for 
> the warm-start estimator. 
> Possible solutions:
> 1. Add a warm-start fitting interface like def fit(dataset: DataFrame,
> initModel: M, paramMap: ParamMap): M
> 2. Treat the model as a special parameter and pass it through ParamMap, e.g.
> val partialModel: Param[Option[M]] = new Param(...). If a model is
> present, we use it to warm-start; otherwise we start the training process
> from the beginning.
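
The two options listed in the description can be contrasted with a small plain-Scala sketch. It deliberately avoids Spark's Estimator and Param classes; Model, OverloadStyle, ParamStyle, setInitialModel and startingPoint are hypothetical names used only to show the shape of each design.

```scala
object WarmStartApiSketch {
  // Minimal stand-in for a fitted model; not a Spark class.
  final case class Model(weights: Vector[Double])

  // Option 1: a dedicated fit overload that takes the initial model.
  trait OverloadStyle[D] {
    def fit(data: D, initModel: Model): Model
  }

  // Option 2: the initial model is an optional parameter on the estimator.
  final class ParamStyle(initialModel: Option[Model] = None) {
    // Returns a copy of the estimator with the initial model set.
    def setInitialModel(m: Model): ParamStyle = new ParamStyle(Some(m))

    // The point training would start from: the supplied model if present
    // (warm start), otherwise all zeros (start from the beginning).
    def startingPoint(dim: Int): Model =
      initialModel.getOrElse(Model(Vector.fill(dim)(0.0)))
  }
}
```

In the second shape the fit signature stays unchanged, and the presence or absence of the optional model decides whether training is warm-started.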






[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2015-10-15 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960073#comment-14960073
 ] 

Joseph K. Bradley commented on SPARK-11136:
---

We should definitely make it a Param. I just commented on the KMeans JIRA
about that. Thanks for pointing out that issue. Would you mind updating this
JIRA's description to specify that as the chosen option?




[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2015-10-15 Thread Xusen Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960063#comment-14960063
 ] 

Xusen Yin commented on SPARK-11136:
---

I have already linked all related issues. [~josephkb], which method of
supporting warm-start do you prefer? Or do you have other feasible
suggestions? [~jayants]'s KMeans warm-start code shows a third
implementation.
