I am trying to understand how Spark ML persists pipelines. It seems a
SparkSession or SparkContext is needed for this, to write to HDFS.
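
For context, this is the kind of round trip I mean. A minimal sketch
(trainingDf, the stages, and the paths are just placeholders):

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.HashingTF
import org.apache.spark.sql.DataFrame

def roundTrip(trainingDf: DataFrame): PipelineModel = {
  // fit some pipeline; the exact stages don't matter for this question
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
  val lr = new LogisticRegression().setMaxIter(10)
  val pipeline = new Pipeline().setStages(Array(hashingTF, lr))
  val model = pipeline.fit(trainingDf)

  // both save and load need to talk to HDFS, hence the session question
  model.write.overwrite().save("hdfs:///tmp/my-model")
  PipelineModel.load("hdfs:///tmp/my-model")
}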

MLWriter and MLReader both extend BaseReadWrite to have access to a
SparkSession. But this is where it gets confusing... the only way to set
the SparkSession seems to be this method on BaseReadWrite:

def session(sparkSession: SparkSession): this.type
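
Since it returns this.type, I'd expect it to be chained onto a writer or
reader, something like this (my assumption about the intended usage; I
haven't actually seen it called this way anywhere):

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// model is a fitted PipelineModel, as in the sketch above
model.write.session(spark).overwrite().save("hdfs:///tmp/my-model")
val restored = PipelineModel.read.session(spark).load("hdfs:///tmp/my-model")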

and I can find no place where this is actually called, except for a single
unit test: org.apache.spark.ml.util.JavaDefaultReadWriteSuite

I confirmed it is not used by simply adding a line inside that method that
throws an error: all unit tests pass except JavaDefaultReadWriteSuite.

So how does the SparkSession actually get set?
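
My best guess is that BaseReadWrite lazily falls back to the active session
when nobody calls session() explicitly, something like the sketch below. To
be clear, this is my speculation, not the actual Spark source; the trait
and field names here are made up:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

trait BaseReadWriteSketch {
  private var optionSparkSession: Option[SparkSession] = None

  // explicit, chainable setter, mirroring the signature above
  def session(sparkSession: SparkSession): this.type = {
    optionSparkSession = Option(sparkSession)
    this
  }

  // hypothetical lazy fallback: if session() was never called,
  // pick up the active/default session instead
  protected final def sparkSession: SparkSession = {
    if (optionSparkSession.isEmpty) {
      optionSparkSession = Some(SparkSession.builder().getOrCreate())
    }
    optionSparkSession.get
  }

  protected final def sc: SparkContext = sparkSession.sparkContext
}

A fallback like that would explain what I'm seeing: every unit test already
runs inside a SparkSession, so getOrCreate() finds one and nothing ever
needs to call session(). Can anyone confirm that this is what happens?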
thanks!

koert
