Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

via GitHub Sat, 13 Jul 2024 17:59:23 -0700


HyukjinKwon commented on code in PR #47341:
URL: https://github.com/apache/spark/pull/47341#discussion_r1676985526



##########
mllib/src/main/scala/org/apache/spark/ml/r/AFTSurvivalRegressionWrapper.scala:
##########
@@ -129,7 +129,9 @@ private[r] object AFTSurvivalRegressionWrapper extends MLReadable[AFTSurvivalReg
       val rMetadata = ("class" -> instance.getClass.getName) ~
         ("features" -> instance.features.toImmutableArraySeq)
       val rMetadataJson: String = compact(render(rMetadata))
-      sc.parallelize(Seq(rMetadataJson), 1).saveAsTextFile(rMetadataPath)
+      // Note that we should write single file. If there are more than one row

Review Comment:
   We discussed this somewhere and decided against having it (because we want
to hide the concept of partitions in DataFrame in general). But thinking about
it again, I think it's probably good to have. SparkR has it, FWIW.
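
   For context, a minimal sketch of what replacing the removed
`sc.parallelize(Seq(rMetadataJson), 1).saveAsTextFile(rMetadataPath)` call with
the DataFrame writer could look like. This is an illustration only and not
necessarily the exact code in the PR: the object and helper names
(`SingleFileMetadataSketch`, `writeSingleTextFile`, `readSingleLine`) are
hypothetical, and using `repartition(1)` to force a single output file is an
assumption based on the "write single file" note in the diff.

```scala
import org.apache.spark.sql.SparkSession

object SingleFileMetadataSketch {
  // Hypothetical helper: writes one JSON string as a single-partition text
  // dataset, mirroring the single-partition RDD write it stands in for.
  def writeSingleTextFile(spark: SparkSession, json: String, path: String): Unit = {
    spark.createDataFrame(Seq(Tuple1(json))) // one row, one string column (_1)
      .repartition(1)                        // assumption: force exactly one part file
      .write
      .text(path)
  }

  // Hypothetical helper: reads the single line back through the DataFrame reader.
  def readSingleLine(spark: SparkSession, path: String): String =
    spark.read.text(path).first().getString(0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-file-metadata-sketch")
      .getOrCreate()
    try {
      // Made-up metadata and a throwaway output location for the demo.
      val rMetadataJson = """{"class":"example.Wrapper","features":["a","b"]}"""
      val rMetadataPath = java.nio.file.Files
        .createTempDirectory("rMetadata").resolve("out").toString

      writeSingleTextFile(spark, rMetadataJson, rMetadataPath)
      assert(readSingleLine(spark, rMetadataPath) == rMetadataJson)
    } finally {
      spark.stop()
    }
  }
}
```

   The explicit `repartition(1)` is the part that exposes partitioning at the
DataFrame level, which is the trade-off the comment above is weighing.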





Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]
