Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23263#discussion_r240004006
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
    @@ -65,7 +65,19 @@ abstract class Estimator[M <: Model[M]] extends 
PipelineStage {
        * Fits a model to the input data.
        */
       @Since("2.0.0")
    -  def fit(dataset: Dataset[_]): M
    +  def fit(dataset: Dataset[_]): M = MLEvents.withFitEvent(this, dataset) {
    +    fitImpl(dataset)
    +  }
    +
    +  /**
    +   * `fit()` handles events and then calls this method. Subclasses should 
override this
    +   * method to implement the actual fiting a model to the input data.
    +   */
    +  @Since("3.0.0")
    +  protected def fitImpl(dataset: Dataset[_]): M = {
    +    // Keep this default body for backward compatibility.
    +    throw new UnsupportedOperationException("fitImpl is not implemented.")
    --- End diff --
    
    Yes, that was my intention. I wanted to force to implement `fitImpl` but 
was thinking that might be too breaking change (it's going to at least break 
source compatibility). I am willing to follow other suggestions - I am pretty 
sure you or other guys are more familiar with ML side.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to