Re: Spark MLlib: Should I call .cache before fitting a model?

2018-02-27 Thread Nick Pentreath
Currently, .fit() for many (most, I think) models will cache the input data
internally. For LogisticRegression this is definitely the case, so you won't
get any additional benefit from caching it yourself.
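
For reference, the persistence handling inside the estimators looks roughly
like the sketch below (a paraphrase, not the exact Spark source; fitLike is a
hypothetical stand-in for the estimator's train method, and it assumes Spark
2.1+ where Dataset.storageLevel is available):

import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical stand-in for the internal train(...) of a spark.ml estimator.
def fitLike(dataset: DataFrame): Unit = {
  // Only take over caching if the caller has not cached the input already.
  val handlePersistence = dataset.storageLevel == StorageLevel.NONE
  if (handlePersistence) dataset.persist(StorageLevel.MEMORY_AND_DISK)

  // ... the iterative optimizer would make repeated passes over `dataset` ...

  // Release only what we persisted; a user-cached input is left untouched.
  if (handlePersistence) dataset.unpersist()
}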

On Tue, 27 Feb 2018 at 21:25 Gevorg Hari wrote:

> Imagine that I am training a Spark MLlib model as follows:
>
> val trainingData = loadTrainingData(...)
> val logisticRegression = new LogisticRegression()
>
> trainingData.cache()
> val logisticRegressionModel = logisticRegression.fit(trainingData)
>
> Does the call trainingData.cache() improve performance at training time, or
> is it not needed?
>
> Does the .fit(...) method for an ML algorithm call cache/unpersist
> internally?
>
>
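
You can also confirm this empirically: the input's storage level is unchanged
by fit, because the estimator persists (and unpersists) its own internal
representation rather than your DataFrame. A minimal sketch, assuming Spark
2.1+ and a local session (the toy data stands in for loadTrainingData(...)):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-check").master("local[*]").getOrCreate()

// Toy stand-in for loadTrainingData(...)
val trainingData = spark.createDataFrame(Seq(
  (0.0, Vectors.dense(0.0, 1.1)),
  (1.0, Vectors.dense(2.0, 1.0)),
  (0.0, Vectors.dense(0.1, 1.2)),
  (1.0, Vectors.dense(1.9, 0.9))
)).toDF("label", "features")

println(trainingData.storageLevel) // StorageLevel.NONE: we never cached it

val model = new LogisticRegression().fit(trainingData)

println(trainingData.storageLevel) // still StorageLevel.NONE after fit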


Spark MLlib: Should I call .cache before fitting a model?

2018-02-27 Thread Gevorg Hari
Imagine that I am training a Spark MLlib model as follows:

val trainingData = loadTrainingData(...)
val logisticRegression = new LogisticRegression()

trainingData.cache()
val logisticRegressionModel = logisticRegression.fit(trainingData)

Does the call trainingData.cache() improve performance at training time, or
is it not needed?

Does the .fit(...) method for an ML algorithm call cache/unpersist
internally?