Ruifeng Zheng created SPARK-50812:
-------------------------------------

             Summary: [Connect] Support spark.ml on Connect
                 Key: SPARK-50812
                 URL: https://issues.apache.org/jira/browse/SPARK-50812
             Project: Spark
          Issue Type: Improvement
          Components: Connect, PySpark
    Affects Versions: 4.0.0
            Reporter: Ruifeng Zheng
Starting from Apache Spark 3.4, Spark has supported Connect, which introduced a decoupled client-server architecture that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere: it can be embedded in modern data applications, in IDEs, notebooks and other programming languages.

However, Spark Connect currently only supports Spark SQL, which means Spark ML cannot run training or inference via Spark Connect. This will probably result in losing some ML users. So I would like to propose a way to support Spark ML on Connect. Users won't need to change their code to leverage Connect to run Spark ML workloads.

Here are some links:

Design doc: [Support spark.ml on Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing]

Draft PR: [https://github.com/wbo4958/spark/pull/5]

Example code (imports added for completeness):

{code:python}
from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

# Connect to a remote Spark cluster via Spark Connect
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

df = spark.createDataFrame([
    (Vectors.dense([1.0, 2.0]), 1),
    (Vectors.dense([2.0, -1.0]), 1),
    (Vectors.dense([-3.0, -2.0]), 0),
    (Vectors.dense([-1.0, -2.0]), 0),
], schema=['features', 'label'])

lr = LogisticRegression()
lr.setMaxIter(30)
model: LogisticRegressionModel = lr.fit(df)

z = model.summary
x = model.predictRaw(Vectors.dense([1.0, 2.0]))
print(f"predictRaw {x}")
assert model.getMaxIter() == 30
model.summary.roc.show()
print(model.summary.weightedRecall)
print(model.summary.recallByLabel)
print(model.coefficients)
print(model.intercept)
model.transform(df).show()
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)