[ https://issues.apache.org/jira/browse/SPARK-50631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907511#comment-17907511 ]

Costas Piliotis commented on SPARK-50631:
-----------------------------------------

A colleague of mine tracked down the source of the problem.

This works:
{code}
x.write
  .format("org.apache.spark.sql.parquet")
  .mode("overwrite")
  .option("path", "/tmp/sandbox/testoutput/")
  .save()
{code}


But this does not:
{code}
x.write
  .format("parquet")
  .mode("overwrite")
  .option("path", "/tmp/sandbox/testoutput/")
  .save()

// or the shorthand:
x.write
  .mode("overwrite")
  .parquet("/tmp/sandbox/testoutput/")
{code}
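If it helps narrow this down, one way to check whether the short name even resolves on the test classpath is to query the ServiceLoader registry that Spark uses for built-in short names (just a sketch, assuming the standard DataSourceRegister mechanism; the variable names are mine):

{code}
// Sketch: list the data source short names registered on the current
// (test) classloader. If "parquet" is missing here, short-name lookup
// for .format("parquet") has nothing to resolve against.
import java.util.ServiceLoader
import scala.jdk.CollectionConverters._
import org.apache.spark.sql.sources.DataSourceRegister

val registered = ServiceLoader
  .load(classOf[DataSourceRegister], Thread.currentThread().getContextClassLoader)
  .asScala
  .map(r => r.shortName() -> r.getClass.getName)
  .toMap

println(registered.get("parquet")) // Some(...) if registered, None if not
{code}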


The scaladoc for the .format method in DataFrameWriter.scala says "parquet" should be
supported out of the box:

{code}
  /**
   * Specifies the underlying output data source. Built-in options include
   * "parquet", "json", etc.
   *
   * @since 1.4.0
   */
  def format(source: String): DataFrameWriter[T] = {
    this.source = source
    this
  }
{code}
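
Applying the same workaround to the csv repro in the original report would look like this (a sketch only; the fully-qualified class name for the built-in csv source is my assumption, and I haven't confirmed it sidesteps the hang the way the parquet one does):

{code}
// Hypothetical: fully-qualified name for the built-in csv source,
// by analogy with the parquet workaround above.
x.write
  .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
  .mode("append")
  .save("/tmp/sandbox")
{code}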



> Local spark under scalatest hangs writing to local disk
> -------------------------------------------------------
>
>                 Key: SPARK-50631
>                 URL: https://issues.apache.org/jira/browse/SPARK-50631
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Costas Piliotis
>            Priority: Minor
>
> This may be nothing. I apologize if it is.
> I'm testing out Spark 4.0.0 preview 2 and hit an issue with a local
> integration test writing to local disk.
> Env:
> scala 2.13.15
> jdk 17.0.11 (Azul) 
> sbt 1.10.6
> I tried jdk 21 as well; no luck either.
> build.sbt dependencies:
> {code:scala}
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
>   "org.apache.spark" %% "spark-sql" % sparkVersion  % Provided,
>   "org.scalatest" %% "scalatest" % scalatestVersion % Test
> )
> {code}
> Here's my test case:
> {code:scala}
> test("this is a test") {
>     import org.apache.spark.sql.SparkSession
>     val spark: SparkSession = SparkSession
>         .builder()
>         .master("local[*]")
>         .config("spark.executor.memory", "1g") // Adjust as needed
>         .config("spark.driver.memory", "1g")
>         .getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("TRACE")
>     val x = List(
>       (1, 2L, "a", "2022-01-01"),
>       (1, 2L, "b", "2022-01-02"),
>       (1, 2L, "c", "2022-01-03")
>     ).toDF("a", "b", "c", "dt")
>     x.show(10)
>     x.write
>       .mode("append")
>       .csv("/tmp/sandbox")
>   }
> {code}
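> For reference, the .csv(...) shorthand above is just sugar for spelling the format out, so the long form below should behave identically (sketch of the equivalence, per DataFrameWriter's documented behavior):
> {code:scala}
> // Long form of the shorthand write above; .csv(path) delegates to
> // format("csv").save(path) in DataFrameWriter.
> x.write
>   .format("csv")
>   .mode("append")
>   .save("/tmp/sandbox")
> {code}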
> It hangs. No job or stage appears in the UI, and there is no activity in the
> console. TRACE log level just gives me this:
> {code:java}
> {"ts":"2024-12-19T23:49:48.764Z","level":"TRACE","msg":"Checking for hosts 
> with no recent heartbeats in HeartbeatReceiver.","logger":"HeartbeatReceiver"}
> {code}
> I've confirmed the path is writable, but even at TRACE log level I can't get
> it to write, and I have no clue why.
> When I Ctrl-C the run, I get some spam from scalatest afterwards.
> Maybe I'm daft; it's probable. I've had no issues with this on Spark 3.5 and
> lower, so it's quite a bit more likely that it's me.
>  


