Spark MLlib: Should I call .cache before fitting a model?

2018-02-27 Thread Gevorg Hari
Imagine that I am training a Spark MLlib model as follows:

val trainingData = loadTrainingData(...)
val logisticRegression = new LogisticRegression()

trainingData.cache()
val logisticRegressionModel = logisticRegression.fit(trainingData)

Does the call trainingData.cache() improve performance at training time, or is
it not needed?

Does the .fit(...) method of an ML algorithm call cache/unpersist
internally?
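
For reference, here is a minimal, self-contained sketch of the explicit cache-then-unpersist pattern around fit(), using the libsvm sample data that ships with the Spark distribution in place of the original loadTrainingData(...); the storageLevel check and the unpersist() call are the parts the original snippet leaves implicit. As far as I can tell from the 2.x sources, LogisticRegression persists an internal RDD itself when the input's storage level is NONE, so the explicit cache() mainly saves recomputing whatever produced trainingData.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-before-fit").getOrCreate()

// Stand-in for loadTrainingData(...): the libsvm sample bundled with Spark.
val trainingData = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

// Cache explicitly; storageLevel reports what Spark has recorded for this plan.
trainingData.cache()
println(s"storage level before fit: ${trainingData.storageLevel}")

val logisticRegression = new LogisticRegression().setMaxIter(10)
// The optimizer is iterative and reads the training data many times.
val logisticRegressionModel = logisticRegression.fit(trainingData)

// Release the cached blocks once training is done.
trainingData.unpersist()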


Spark MLlib Question - Online Scoring of PipelineModel

2018-01-05 Thread Gevorg Hari
Is Spark planning to support *online scoring* (without any Spark
dependencies) of a PipelineModel trained offline? Not being able to do so
is a huge barrier to entry for using Spark in production at my company...

For online scoring, I found MLeap (https://github.com/combust/mleap).
Any feedback on production use of *MLeap*? Will it ever be integrated
into the main Spark project, and if so, when?
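
For concreteness, this is roughly what exporting a fitted PipelineModel to a Spark-free MLeap bundle looks like per the project's README for the 0.x releases; the imports, the scala-arm managed block, and the jar:file URI follow those docs and may differ in other MLeap versions, and pipelineModel / transformedDf are assumed to come from an offline training job.

import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

// Assumed to exist from offline training: pipelineModel (a fitted PipelineModel)
// and transformedDf (the training data after pipelineModel.transform), which the
// bundle context uses to capture the output schema.
val sbc = SparkBundleContext().withDataset(transformedDf)

// Serialize the fitted PipelineModel to an MLeap bundle on local disk; the bundle
// can then be loaded by mleap-runtime for scoring without any Spark dependencies.
for (bundle <- managed(BundleFile("jar:file:/tmp/pipeline-model.zip"))) {
  pipelineModel.writeBundle.save(bundle)(sbc).get
}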

Thanks a lot!