Persisting PySpark ML Pipelines that include custom Transformers

Nicholas Chammas Fri, 19 Aug 2016 11:29:59 -0700

I understand persistence for PySpark ML pipelines is already present in
2.0, and further improvements are being made for 2.1 (e.g. SPARK-13786
<https://issues.apache.org/jira/browse/SPARK-13786>).


I’m having trouble, though, persisting a pipeline that includes a custom
Transformer (see SPARK-17025
<https://issues.apache.org/jira/browse/SPARK-17025>). It appears that there
is a magic _to_java() method that I need to implement.
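
A simplified, Spark-free sketch of the failure mode, assuming that saving a pipeline converts each stage through its _to_java() method. The class and function names below are hypothetical stand-ins, not PySpark's actual classes, and the real writer does more than this:

```python
# Hypothetical stand-ins that mimic the shape of the problem.
# These are NOT PySpark classes.


class BuiltInStage:
    """Stands in for a stage that wraps a JVM object."""

    def _to_java(self):
        # A real stage would return its JVM counterpart; a string is used here.
        return "java-object-for-BuiltInStage"


class CustomStage:
    """Stands in for a pure-Python custom Transformer with no _to_java()."""

    def transform(self, rows):
        return [r.upper() for r in rows]


def save_pipeline(stages):
    """Assumed behaviour: persistence converts every stage via _to_java()."""
    return [stage._to_java() for stage in stages]


def try_save(stages):
    """Attempt to save and report a missing _to_java() instead of raising."""
    try:
        return save_pipeline(stages)
    except AttributeError as exc:
        return "cannot persist: {}".format(exc)
```

Under that assumption, a pipeline made only of built-in stages saves, while one containing the custom stage fails on the missing attribute.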

Is the intention that developers implementing custom Transformers would
also specify how they should be persisted, or are there ideas about how to
make this automatic? I searched JIRA, but I may have missed an issue that
already addresses this problem.

Nick
