[ https://issues.apache.org/jira/browse/SPARK-50631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907511#comment-17907511 ]
Costas Piliotis commented on SPARK-50631:
-----------------------------------------

So a colleague of mine got to the source. This works:

{code}
x.write
  .format("org.apache.spark.sql.parquet")
  .mode("overwrite")
  .option("path", "/tmp/sandbox/testoutput/")
  .save()
{code}

But this does not:

{code}
x.write
  .format("parquet")
  .mode("overwrite")
  .option("path", "/tmp/sandbox/testoutput/")
  .save()

// OR THE SHORTHAND:
x.write
  .mode("overwrite")
  .parquet("/tmp/sandbox/testoutput/")
{code}

The scaladoc on the .format method in DataFrameWriter.scala says "parquet" should be supported out of the box:

{code}
/**
 * Specifies the underlying output data source. Built-in options include "parquet", "json", etc.
 *
 * @since 1.4.0
 */
def format(source: String): DataFrameWriter[T] = {
  this.source = source
  this
}
{code}

> Local spark under scalatest hangs writing to local disk
> --------------------------------------------------------
>
>                 Key: SPARK-50631
>                 URL: https://issues.apache.org/jira/browse/SPARK-50631
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Costas Piliotis
>            Priority: Minor
>
> This may be nothing. I apologize if it is.
> Testing out Spark 4.0.0 preview 2 and finding an issue with a local integration test writing to local disk.
> Env:
> scala 2.13.15
> jdk 17.0.11 (Azul)
> sbt 1.10.6
> I tried jdk 21 as well; no love either.
> build.sbt dependencies:
> {code:scala}
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
>   "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
>   "org.scalatest" %% "scalatest" % scalatestVersion % Test
> )
> {code}
> Here's my test case:
> {code:scala}
> test("this is a test") {
>   val spark: SparkSession = SparkSession
>     .builder()
>     .master("local[*]")
>     .config("spark.executor.memory", "1g") // Adjust as needed
>     .config("spark.driver.memory", "1g")
>     .getOrCreate()
>   import spark.implicits._
>   spark.sparkContext.setLogLevel("TRACE")
>   val x = List(
>     (1, 2L, "a", "2022-01-01"),
>     (1, 2L, "b", "2022-01-02"),
>     (1, 2L, "c", "2022-01-03")
>   ).toDF("a", "b", "c", "dt")
>   x.show(10)
>   x.write
>     .mode("append")
>     .csv("/tmp/sandbox")
> }
> {code}
> It hangs. No job or stage in the UI. No activity in the console. TRACE loglevel just gives me this:
> {code:java}
> {"ts":"2024-12-19T23:49:48.764Z","level":"TRACE","msg":"Checking for hosts with no recent heartbeats in HeartbeatReceiver.","logger":"HeartbeatReceiver"}
> {code}
> I've confirmed the path is writeable, and even at TRACE log level I can't get it to write, and I have no clue why.
> When I ctrl-c I get some spam afterwards from scalatest.
> Maybe I'm daft, it's probable. Spark 3.5 and lower I've had no issues with this, so it's quite a bit more likely that it's me.
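A quick way to sanity-check the short-name failure described in the comment above: Spark resolves names like "parquet" by scanning the classpath with the JVM ServiceLoader for implementations of the public org.apache.spark.sql.sources.DataSourceRegister trait, so listing those registrations shows what the test JVM can actually see. Below is a minimal diagnostic sketch (the ListDataSources object name is just for illustration; run it on the same classpath as the failing test):

{code:scala}
import java.util.ServiceLoader

import scala.jdk.CollectionConverters._

import org.apache.spark.sql.sources.DataSourceRegister

// Prints every data source registered on the current classpath together
// with the short name it claims. If "parquet" does not appear in the
// output, resolution of .format("parquet") cannot succeed.
object ListDataSources {
  def main(args: Array[String]): Unit = {
    val loader = Thread.currentThread().getContextClassLoader
    ServiceLoader
      .load(classOf[DataSourceRegister], loader)
      .asScala
      .foreach(ds => println(s"${ds.shortName()} -> ${ds.getClass.getName}"))
  }
}
{code}

An empty or incomplete list would point at a classpath or service-registration problem (e.g. META-INF/services entries lost during shading or dependency filtering) rather than at the writer API itself.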