I am trying to understand how Spark ML persists pipelines. It seems a
SparkSession or SparkContext is needed for this, to write to HDFS.
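
For context, this is the kind of round trip I mean. A minimal sketch
(trainingDf, the stages, and the paths are just placeholders):

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.HashingTF
import org.apache.spark.sql.DataFrame

def roundTrip(trainingDf: DataFrame): PipelineModel = {
  // fit some pipeline; the exact stages don't matter for this question
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
  val lr = new LogisticRegression().setMaxIter(10)
  val pipeline = new Pipeline().setStages(Array(hashingTF, lr))
  val model = pipeline.fit(trainingDf)

  // both save and load need to talk to HDFS, hence the session question
  model.write.overwrite().save("hdfs:///tmp/my-model")
  PipelineModel.load("hdfs:///tmp/my-model")
}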

MLWriter and MLReader both extend BaseReadWrite to have access to a
SparkSession. But this is where it gets confusing... the only way to set
the SparkSession seems to be this method on BaseReadWrite:

def session(sparkSession: SparkSession): this.type
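
Since it returns this.type, I'd expect it to be chained onto a writer or
reader, something like this (my assumption about the intended usage; I
haven't actually seen it called this way anywhere):

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// model is a fitted PipelineModel, as in the sketch above
model.write.session(spark).overwrite().save("hdfs:///tmp/my-model")
val restored = PipelineModel.read.session(spark).load("hdfs:///tmp/my-model")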

and I can find no place where this is actually called, except for a single
unit test: org.apache.spark.ml.util.JavaDefaultReadWriteSuite

I confirmed it is not used by simply adding a line inside that method that
throws an error: all unit tests pass except JavaDefaultReadWriteSuite.

So how does the SparkSession actually get set?
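
My best guess is that BaseReadWrite lazily falls back to the active session
when nobody calls session() explicitly, something like the sketch below. To
be clear, this is my speculation, not the actual Spark source; the trait
and field names here are made up:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

trait BaseReadWriteSketch {
  private var optionSparkSession: Option[SparkSession] = None

  // explicit, chainable setter, mirroring the signature above
  def session(sparkSession: SparkSession): this.type = {
    optionSparkSession = Option(sparkSession)
    this
  }

  // hypothetical lazy fallback: if session() was never called,
  // pick up the active/default session instead
  protected final def sparkSession: SparkSession = {
    if (optionSparkSession.isEmpty) {
      optionSparkSession = Some(SparkSession.builder().getOrCreate())
    }
    optionSparkSession.get
  }

  protected final def sc: SparkContext = sparkSession.sparkContext
}

A fallback like that would explain what I'm seeing: every unit test already
runs inside a SparkSession, so getOrCreate() finds one and nothing ever
needs to call session(). Can anyone confirm that this is what happens?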
thanks!

koert
