Ok, thank you.

2016-04-27 11:37 GMT-03:00 Nick Pentreath <nick.pentre...@gmail.com>:
> You should find that the first set of fits is run on the training set,
> and the resulting models are evaluated on the validation set. The final
> best model is then retrained on the entire dataset. This is standard
> practice: usually the dataset passed to the train-validation split is
> itself further split into a training and test set, and the final best
> model is evaluated against the test set.
>
> On Wed, 27 Apr 2016 at 14:30, Dirceu Semighini Filho <dirceu.semigh...@gmail.com> wrote:
>
>> Hi guys, I was testing a pipeline here and found a possibly duplicated
>> call to the fit method in the
>> org.apache.spark.ml.tuning.TrainValidationSplit
>> <https://github.com/apache/spark/blob/18c2c92580bdc27aa5129d9e7abda418a3633ea6/mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala>
>> class.
>> On line 110 there is a call to est.fit that fits all the parameter
>> combinations we have set up.
>> Further down, on line 128, after determining the best model, we call
>> fit again using the bestIndex. Wouldn't it be better to just reuse the
>> results of the earlier fit call, which are stored in the models val?
>>
>> Kind regards,
>> Dirceu
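To make the distinction in this thread concrete, here is a minimal, Spark-free Scala sketch of the flow described above: one fit per parameter value on the training split, scoring on the validation split, then a single refit of the winning parameter on the full dataset. Everything in it (the `Model` case class, `fit`, `evaluate`, the scalar "scale" parameter, lower-is-better MSE) is a hypothetical stand-in and not Spark's API; it only illustrates why the second fit is not a duplicate of the first.

```scala
import scala.util.Random

// Hypothetical, Spark-free illustration of a train-validation split.
// None of these names are Spark's API.
object TrainValidationSketch {
  // Toy "model": predicts a constant, the training mean times a scale parameter.
  final case class Model(param: Double, mean: Double) {
    def predict: Double = mean * param
  }

  // Toy "estimator": fits one model for one parameter value.
  def fit(data: Seq[Double], param: Double): Model =
    Model(param, data.sum / data.size)

  // Mean squared error of the model's constant prediction (lower is better).
  def evaluate(model: Model, data: Seq[Double]): Double =
    data.map(x => math.pow(x - model.predict, 2)).sum / data.size

  def trainValidationSplit(
      data: Seq[Double],
      params: Seq[Double],
      trainRatio: Double,
      seed: Long): (Model, Seq[Double]) = {
    val nTrain = (data.size * trainRatio).toInt
    require(nTrain > 0 && nTrain < data.size, "both splits must be non-empty")
    val shuffled = new Random(seed).shuffle(data)
    val (training, validation) = shuffled.splitAt(nTrain)

    // 1. One fit per parameter value, on the TRAINING split only.
    val models = params.map(p => fit(training, p))
    // 2. Score every candidate on the held-out VALIDATION split.
    val metrics = models.map(m => evaluate(m, validation))
    val bestIndex = metrics.indexOf(metrics.min)
    // 3. Refit the winning parameter on the FULL dataset. models(bestIndex)
    //    only saw the training split, so reusing it would discard the
    //    validation rows from the final model.
    val bestModel = fit(data, params(bestIndex))
    (bestModel, metrics)
  }
}
```

For example, `TrainValidationSketch.trainValidationSplit((1 to 100).map(_.toDouble), Seq(0.5, 1.0, 1.5), 0.75, 42L)` returns a best model whose `mean` is 50.5, the mean of all 100 points, rather than the mean of the 75-point training split that the candidate models were fitted on. Reusing the stored model would skip a fit, at the cost of a final model trained on less data.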