Spark MLlib: Should I call .cache before fitting a model?
Imagine that I am training a Spark MLlib model as follows:

```scala
val trainingData = loadTrainingData(...)
val logisticRegression = new LogisticRegression()

trainingData.cache()
val logisticRegressionModel = logisticRegression.fit(trainingData)
```

Does the call `trainingData.cache()` improve performance at training time, or is it not needed? Does the `.fit(...)` method of an ML algorithm call cache/unpersist internally?
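For context, here is a minimal sketch of what managing the cache explicitly around `fit` would look like; the session setup and input path are hypothetical, and whether this helps depends on whether the algorithm already persists its input internally:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

// Hypothetical setup: a SparkSession and a DataFrame with
// "label" and "features" columns, loaded from a made-up path.
val spark = SparkSession.builder().appName("cache-before-fit").getOrCreate()
val trainingData = spark.read.parquet("training.parquet")

trainingData.cache()  // marks the DataFrame for in-memory storage (lazy)
trainingData.count()  // optional: forces materialization before training starts

// Iterative algorithms like logistic regression scan the data many times,
// which is the scenario where caching the input can pay off.
val model = new LogisticRegression().fit(trainingData)

trainingData.unpersist()  // release the cached blocks once training is done
```

Note that `cache()` is lazy: nothing is stored until an action (such as `count()` or the first training pass) actually evaluates the DataFrame.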
Spark MLlib Question - Online Scoring of PipelineModel
Is Spark planning to support *online scoring* (without any Spark dependencies) of a PipelineModel trained offline? Not being able to do so is a huge barrier to entry for using Spark in production at my company... For online support, I found *MLeap*: https://github.com/combust/mleap Any feedback on using MLeap in production? Will this ever be integrated into the main Spark project, and if so, when? Thanks a lot!