Lewis,

Many pipeline stages implement save/load methods, which you can use if you
instantiate the underlying pipeline stages and call their `transform` methods
individually (instead of using the Pipeline.setStages API). See the associated
JIRAs <https://issues.apache.org/jira/browse/SPARK-4587>.
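To illustrate the manual approach of running stages individually and reusing
an expensive intermediate result, here is a minimal sketch in plain Python. It
is not Spark API: `build_features`, `fit_and_score`, and `grid_search` are
hypothetical names, and the toy series stands in for your real data. With
Spark DataFrames, the analogous step would be to call `.cache()` on the output
of the expensive stage's `transform` before feeding it to the later stages.

```python
from functools import lru_cache
from itertools import product

build_calls = 0  # counts how often the expensive step actually runs


@lru_cache(maxsize=None)
def build_features(window):
    """Hypothetical expensive stage: moving averages for a given window."""
    global build_calls
    build_calls += 1
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy stand-in for real data
    return tuple(
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    )


def fit_and_score(features, reg):
    """Hypothetical cheap downstream stage; returns a toy score."""
    return sum(features) / (1.0 + reg)


def grid_search(windows, regs):
    """Evaluate every (window, reg) pair, reusing features per window."""
    scores = {}
    for window, reg in product(windows, regs):
        features = build_features(window)  # served from cache on repeats
        scores[(window, reg)] = fit_and_score(features, reg)
    return scores


scores = grid_search(windows=[2, 3], regs=[0.0, 0.1, 1.0])
print(build_calls)
```

In this sketch the grid has six points, but the expensive step runs only once
per distinct "structure parameter" (`window`), which is the reuse pattern
described in the question.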

Pipeline persistence is on the 1.6 roadmap; the JIRA is here
<https://issues.apache.org/jira/browse/SPARK-6725>.

Feynman

On Mon, Sep 14, 2015 at 9:20 PM, Jingchu Liu <liujing...@gmail.com> wrote:

> Hi all,
>
> I have a question regarding the ability of the ML pipeline to cache
> intermediate results. I've posted this question on Stack Overflow
> <http://stackoverflow.com/questions/32561687/caching-intermediate-results-in-spark-ml-pipeline>
> but got no answer. I hope someone here can help me out.
>
> ===========
> Lately I've been planning to migrate my standalone Python ML code to Spark.
> The ML pipeline in spark.ml turns out to be quite handy, with a streamlined
> API for chaining algorithm stages and hyper-parameter grid search.
>
> Still, I found its support for one important feature unclear in the existing
> documentation: caching of intermediate results. This feature becomes
> important when the pipeline involves computation-intensive stages.
>
> For example, in my case I use a huge sparse matrix to perform multiple
> moving averages on time series data in order to form input features. The
> structure of the matrix is determined by some hyper-parameter. This step
> turns out to be a bottleneck for the entire pipeline because I have to
> construct the matrix at runtime.
>
> During parameter search, I usually have other parameters to examine in
> addition to this "structure parameter". So if I can reuse the huge matrix
> when the "structure parameter" is unchanged, I can save tons of time. For
> this reason, I intentionally structured my code to cache and reuse these
> intermediate results.
>
> So my question is: can Spark's ML pipeline handle intermediate caching
> automatically? Or do I have to write code to do so manually? If so, is there
> any best practice to learn from?
>
> P.S. I have looked into the official documentation and some other material,
> but none of it seems to discuss this topic.
>
>
>
> Best,
> Lewis
>
