Currently, fit for many (most, I think) models will cache the input data internally. For LogisticRegression this is definitely the case, so you won't get any benefit from caching it yourself.
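For reference, the relevant logic inside LogisticRegression's training code looks roughly like the sketch below. This is a paraphrase of the Spark 2.x source, not runnable on its own (it needs a SparkSession and the real feature-extraction code); the method name trainSketch and the placeholder body are mine.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Paraphrased sketch of LogisticRegression.train in Spark 2.x: if the caller
// has not already persisted the input, Spark persists the extracted training
// instances itself and unpersists them once training finishes.
def trainSketch(dataset: Dataset[_]): Unit = {
  // true only when the input Dataset is NOT already cached by the caller
  val handlePersistence = dataset.storageLevel == StorageLevel.NONE

  // in the real code this extracts (label, weight, features) instances
  val instances = dataset.rdd
  if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)

  // ... iterative optimization (L-BFGS / OWL-QN) runs over `instances` ...

  if (handlePersistence) instances.unpersist()
}
```

So whether you cache the input yourself or let fit do it, the data used by the iterative optimizer ends up persisted either way, which is why an explicit trainingData.cache brings no extra speedup here.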
On Tue, 27 Feb 2018 at 21:25 Gevorg Hari <gevorgh...@gmail.com> wrote:
> Imagine that I am training a Spark MLlib model as follows:
>
>     val trainingData = loadTrainingData(...)
>     val logisticRegression = new LogisticRegression()
>
>     trainingData.cache
>     val logisticRegressionModel = logisticRegression.fit(trainingData)
>
> Does the call trainingData.cache improve performance at training time, or
> is it not needed?
>
> Does the .fit(...) method for an ML algorithm call cache/unpersist
> internally?