I think you are looking for the ability to predict on a single instance. That feature is still under development; please refer to SPARK-10413.
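To make "prediction on a single instance" concrete, here is a rough plain-Java sketch of the idea: once the trees are available in the local JVM, scoring one row is just a majority vote and needs no job or broadcast. Everything in it (the class `LocalForestSketch`, its `predict` method, the toy decision stumps) is hypothetical and made up for illustration. It is not Spark's API and not what SPARK-10413 will necessarily look like.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical illustration only: none of these names are Spark APIs.
final class LocalForestSketch {

    // A "tree" here is simply a function from a feature vector to a class index.
    private final List<ToIntFunction<double[]>> trees;
    private final int numClasses;

    LocalForestSketch(List<ToIntFunction<double[]>> trees, int numClasses) {
        this.trees = trees;
        this.numClasses = numClasses;
    }

    // Predict one instance by majority vote, entirely in the calling JVM.
    // Ties are resolved in favour of the lowest class index.
    int predict(double[] features) {
        int[] votes = new int[numClasses];
        for (ToIntFunction<double[]> tree : trees) {
            votes[tree.applyAsInt(features)]++;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (votes[c] > votes[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three toy decision stumps standing in for trained trees (assumed, not learned).
        List<ToIntFunction<double[]>> stumps = List.of(
            f -> f[0] > 0.5 ? 1 : 0,
            f -> f[1] > 0.5 ? 1 : 0,
            f -> f[0] + f[1] > 1.0 ? 1 : 0
        );
        LocalForestSketch forest = new LocalForestSketch(stumps, 2);
        System.out.println(forest.predict(new double[] {0.9, 0.8}));
    }
}
```

The point of the sketch is only that the per-row work is tiny; the seconds you are seeing come from running a distributed job for one row, not from the forest itself.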
2015-12-10 4:37 GMT+08:00 Eugene Morozov <evgeny.a.moro...@gmail.com>:

> Hello,
>
> I'm using RandomForest pipeline (ml package). Everything is working fine
> (learning models, prediction, etc), but I'd like to tune it for the case,
> when I predict with small dataset.
> My issue is that when I apply
>
> (PipelineModel)model.transform(dataset)
>
> The model consists of the following stages:
>
> StringIndexerModel labelIndexer = new StringIndexer()...
> RandomForestClassifier classifier = new RandomForestClassifier()...
> IndexToString labelConverter = new IndexToString()...
> Pipeline pipeline = new Pipeline().setStages(new
> PipelineStage[]{labelIndexer, classifier, labelConverter});
>
> it obviously takes some time to predict, but when my dataset consists of
> just 1 (record) I'd expect it to be really fast.
>
> My observations are even though I use small dataset Spark broadcasts
> something over and over again. That's fine, when I load my (serialized)
> model from disk and use it just once for prediction, but when I use the
> same model in a loop for the same! dataset, I'd say that everything should
> already be on a worker nodes, thus I'd expect prediction to be fast.
> It takes 20 seconds to predict dataset once (with one input row) and all
> subsequent predictions over the same dataset with the same model takes
> roughly 10 seconds.
> My goal is to have 0.5 - 1 second response.
>
> My intention was to keep learned model on a driver (that's stay online
> with created SparkContext) to use it for any subsequent predictions, but
> these 10 seconds predictions basically kill the whole idea.
>
> Is it possible somehow to distribute the model over the cluster upfront so
> that the prediction is really fast?
> Are there any specific params to apply to the PipelineModel to stay
> resident on a worker nodes? Anything to keep and reuse broadcasted data?
>
> Thanks in advance.
> --
> Be well!
> Jean Morozov
>