Re: Duplicated fit into TrainValidationSplit

Dirceu Semighini Filho Wed, 27 Apr 2016 08:02:39 -0700

Ok, thank you.

2016-04-27 11:37 GMT-03:00 Nick Pentreath <nick.pentre...@gmail.com>:


> You should find that the first set of fits are called on the training set,
> and the resulting models evaluated on the validation set. The final best
> model is then retrained on the entire dataset. This is standard practice -
> usually the dataset passed to the train validation split is itself further
> split into a training and test set, where the final best model is evaluated
> against the test set.
>
> On Wed, 27 Apr 2016 at 14:30, Dirceu Semighini Filho <
> dirceu.semigh...@gmail.com> wrote:
>
>> Hi guys, I was testing a pipeline here and found a possibly duplicated
>> call to the fit method in the
>> org.apache.spark.ml.tuning.TrainValidationSplit
>> <https://github.com/apache/spark/blob/18c2c92580bdc27aa5129d9e7abda418a3633ea6/mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala>
>> class.
>> On line 110 there is a call to est.fit that fits every parameter
>> combination we have set up.
>> Further down, on line 128, after discovering which is the best model, we
>> call fit again using bestIndex. Wouldn't it be better to just access the
>> result of the earlier fit call stored in the models val?
>>
>> Kind regards,
>> Dirceu
>>
>
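
To make the quoted explanation concrete, here is a minimal sketch in plain Python of the procedure Nick describes. It is not Spark's code and uses no Spark API: the toy estimator (a constant predictor scaled by a "shrink" parameter), the fixed split point, and every name in it are assumptions made for illustration only.

```python
# Toy stand-in for the train/validation procedure described above.
# Nothing here is Spark API; all names are hypothetical.

fit_calls = 0

def fit(rows, shrink):
    """Toy estimator: a constant predictor equal to shrink * mean(rows)."""
    global fit_calls
    fit_calls += 1
    return shrink * sum(rows) / len(rows)

def mse(prediction, rows):
    """Mean squared error of a constant prediction against the given rows."""
    return sum((y - prediction) ** 2 for y in rows) / len(rows)

def train_validation_split(dataset, param_grid, train_ratio=0.75):
    # Spark splits randomly; a fixed cut keeps this sketch deterministic.
    cut = int(len(dataset) * train_ratio)
    training, validation = dataset[:cut], dataset[cut:]
    # First set of fits: one model per parameter, trained on the training part only.
    models = [fit(training, p) for p in param_grid]
    # Each candidate is scored on the held-out validation part.
    metrics = [mse(m, validation) for m in models]
    best_index = metrics.index(min(metrics))
    # Final fit: the best parameter, retrained on the entire dataset.
    best_model = fit(dataset, param_grid[best_index])
    return param_grid[best_index], models[best_index], best_model

dataset = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 10.0]
best_param, candidate_model, final_model = train_validation_split(
    dataset, [0.5, 1.0, 1.5]
)
```

In this toy run the candidate stored in models was fitted on six rows and predicts 10.5, while the final model was fitted on all eight rows and predicts 10.625. The two are different models, which is why reusing the entry from models would not be equivalent to the second fit.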
