Lewis,

Many pipeline stages implement save/load methods, which you can use if you instantiate the stages yourself and call each stage's `transform` method individually (instead of using the Pipeline.setStages API). See the associated JIRAs under <https://issues.apache.org/jira/browse/SPARK-4587>.
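Driving the stages by hand also gives you a natural place to reuse an expensive intermediate result during parameter search. Below is a minimal sketch of that idea in plain Python, not Spark code: `build_features`, `fit_and_score`, `grid_search`, the toy series, and the `window`/`reg` parameters are all hypothetical stand-ins for your expensive matrix-building stage, a cheap downstream stage, and your two kinds of hyper-parameters.

```python
from functools import lru_cache
from itertools import product

build_calls = []  # records each time the expensive step actually runs


@lru_cache(maxsize=None)
def build_features(window):
    """Hypothetical stand-in for the expensive stage (e.g. building the
    moving-average matrix); `window` plays the "structure parameter"."""
    build_calls.append(window)
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy data, assumption only
    # Simple trailing moving average over `window` points.
    return tuple(
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    )


def fit_and_score(features, reg):
    """Hypothetical stand-in for the cheap downstream stage."""
    return sum(features) / (1.0 + reg)


def grid_search(windows, regs):
    # Every (window, reg) pair is evaluated, but build_features only does
    # real work the first time it sees a given window; later calls with
    # the same window are served from the cache.
    return {
        (w, r): fit_and_score(build_features(w), r)
        for w, r in product(windows, regs)
    }


scores = grid_search(windows=[2, 3], regs=[0.0, 0.1, 1.0])
```

Here six parameter combinations are scored while the expensive step runs only once per distinct `window`. With Spark DataFrames, the rough analogue would be to call `cache()` on the DataFrame returned by the expensive stage's `transform` and feed that cached DataFrame to the later stages for each remaining parameter setting. Whether that pays off depends on whether the intermediate data fits in memory.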
Pipeline persistence is on the 1.6 roadmap; the JIRA is here: <https://issues.apache.org/jira/browse/SPARK-6725>.

Feynman

On Mon, Sep 14, 2015 at 9:20 PM, Jingchu Liu <liujing...@gmail.com> wrote:

> Hi all,
>
> I have a question regarding the ability of the ML pipeline to cache
> intermediate results. I've posted this question on Stack Overflow
> <http://stackoverflow.com/questions/32561687/caching-intermediate-results-in-spark-ml-pipeline>
> but got no answer; I hope someone here can help me out.
>
> ===========
> Lately I've been planning to migrate my standalone Python ML code to Spark.
> The ML pipeline in spark.ml turns out to be quite handy, with a streamlined
> API for chaining algorithm stages and running hyper-parameter grid search.
>
> Still, I found its support for one important feature unclear in the existing
> documentation: caching of intermediate results. This feature becomes
> important when the pipeline involves computation-intensive stages.
>
> For example, in my case I use a huge sparse matrix to perform multiple
> moving averages on time series data in order to form input features. The
> structure of the matrix is determined by a hyper-parameter. This step
> turns out to be a bottleneck for the entire pipeline because I have to
> construct the matrix at runtime.
>
> During parameter search, I usually have other parameters to examine in
> addition to this "structure parameter". So if I can reuse the huge matrix
> when the "structure parameter" is unchanged, I can save a lot of time. For
> this reason, I deliberately wrote my code to cache and reuse these
> intermediate results.
>
> So my question is: can Spark's ML pipeline handle intermediate caching
> automatically, or do I have to write code to do so manually? If the latter,
> is there any best practice to learn from?
>
> P.S. I have looked into the official documentation and some other material,
> but none of it seems to discuss this topic.
>
> Best,
> Lewis