[ https://issues.apache.org/jira/browse/SPARK-32053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177073#comment-17177073 ]
Kayal commented on SPARK-32053:
-------------------------------

The code to reproduce the issue in a Jupyter notebook on Windows:

import pyspark
from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext("local", "First App")
sess = SparkSession(sc)

training = sess.createDataFrame([
    ("0L", "a b c d e WML", 1.0),
    ("1L", "b d", 0.0),
    ("2L", "WML f g h", 1.0),
    ("3L", "hadoop mapreduce", 0.0)], ["id", "text", "label"])

evaluation = sess.createDataFrame([
    ("4L", "a b c WML", 1.0),
    ("5L", "l m n o p", 0.0),
    ("6L", "WML g h i k", 1.0),
    ("7L", "apache hadoop zuzu", 0.0)], ["id", "text", "label"])

testing = sess.createDataFrame([
    ("4L", "a b c z WML"),
    ("5L", "l m n"),
    ("6L", "WML g h i j k"),
    ("7L", "apache hadoop")], ["id", "text"])

from pyspark.ml.pipeline import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

# Build a simple tokenize -> hash -> logistic-regression pipeline.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.01)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
model = pipeline.fit(training)
test_result = model.transform(testing)

# This save fails on Windows with the error reported below.
pipeline.write().overwrite().save("tempfile")

The write operation fails with the error I mentioned above. This is blocking our product delivery, so please consider treating this as a high-priority blocker. Is there a workaround? Is Spark ML supported in PySpark on Windows? I also see the same error with the pipeline.save() method.

> pyspark save of serialized model is failing for windows.
> ---------------------------------------------------------
>
>                 Key: SPARK-32053
>                 URL: https://issues.apache.org/jira/browse/SPARK-32053
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Kayal
>            Priority: Major
>         Attachments: image-2020-06-22-18-19-32-236.png
>
> Hi,
> We are using Spark functionality to save the serialized model to disk. On the Windows platform, saving the serialized model fails with the error: o288.save() failed.
>
> !image-2020-06-22-18-19-32-236.png!
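A possible workaround, offered as an assumption rather than a fix confirmed in this thread: Spark's save operations go through the Hadoop FileSystem layer, which on Windows requires winutils.exe and a HADOOP_HOME set before the JVM starts. A minimal sketch, assuming winutils.exe has been placed under the hypothetical directory C:\hadoop\bin:

import os

# Assumption: a winutils.exe matching your Hadoop build sits under C:\hadoop\bin
# (both the path and the layout are hypothetical; adjust to your installation).
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]

# The environment must be set before the JVM launches, i.e. before the first
# SparkContext/SparkSession is created in this Python process.
from pyspark.sql import SparkSession
from pyspark.ml.pipeline import Pipeline
from pyspark.ml.feature import Tokenizer

spark = SparkSession.builder.master("local").appName("First App").getOrCreate()

# A trivial one-stage pipeline, standing in for the full repro above.
pipeline = Pipeline(stages=[Tokenizer(inputCol="text", outputCol="words")])

# An explicit file:/// URI pins the save target to the local filesystem.
pipeline.write().overwrite().save("file:///C:/tmp/tempfile")

Note that setting the variables inside the notebook only takes effect if no SparkContext has been created yet in that kernel; restart the kernel first if one has.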