Re: Spark MLlib: Should I call .cache before fitting a model?

2018-02-27 Thread Nick Pentreath
Currently, fit for many (most, I think) models will cache the input data. For LogisticRegression this is definitely the case, so you won't get any benefit from caching it yourself.

On Tue, 27 Feb 2018 at 21:25 Gevorg Hari wrote:
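For reference, many MLlib estimators follow a pattern like the sketch below: they check whether the input is already persisted and, if not, persist it themselves for the duration of training. This is a simplified, non-authoritative sketch (loosely modeled on LogisticRegression.train in Spark 2.x; `dataset` stands for the input passed to fit()):

    import org.apache.spark.storage.StorageLevel

    // Simplified sketch of the persistence handling inside many MLlib
    // estimators. `dataset` is the input Dataset passed to fit().
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    if (handlePersistence) {
      // The estimator caches the input itself before the iterative solver runs...
      dataset.persist(StorageLevel.MEMORY_AND_DISK)
    }
    // ... iterative optimization over `dataset` happens here ...
    if (handlePersistence) {
      // ... and unpersists it once training finishes.
      dataset.unpersist()
    }

So if you have already called .cache(), the storage level is no longer NONE and the estimator skips its own persist/unpersist, which is why caching yourself buys nothing here.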

Spark MLlib: Should I call .cache before fitting a model?

2018-02-27 Thread Gevorg Hari
Imagine that I am training a Spark MLlib model as follows:

    val trainingData = loadTrainingData(...)
    val logisticRegression = new LogisticRegression()
    trainingData.cache()
    val logisticRegressionModel = logisticRegression.fit(trainingData)

Does the call trainingData.cache() improve performance at training time?
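One way to check whether a Dataset is already persisted (and therefore whether your own .cache() call changes anything) is to inspect its storage level. A minimal sketch, assuming trainingData is a DataFrame and that Dataset.storageLevel is available (it was added in Spark 2.1):

    import org.apache.spark.storage.StorageLevel

    // Before caching: no storage level is set.
    println(trainingData.storageLevel == StorageLevel.NONE)

    // cache() on a Dataset persists it at MEMORY_AND_DISK by default.
    trainingData.cache()
    println(trainingData.storageLevel == StorageLevel.MEMORY_AND_DISK)

Note that cache() is lazy: the data is only materialized when an action (such as fit(), which triggers jobs internally) first scans it.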