[ https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867743#comment-17867743 ]
Steve Loughran commented on SPARK-44884:
----------------------------------------

FWIW the new manifest committer, written for performance on abfs and for correctness plus performance on gcs, generates exactly the same JSON file as the s3a committers, and can be executed against local file:// URLs as well as hdfs. If the base Hadoop version Spark uses includes this committer (MAPREDUCE-7341), then you could write a test to verify the copied file is JSON; the class org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.ManifestSuccessData will actually load the manifest and let you access and print its internals.

bq. Also, ensure that when "spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs"="false", the _SUCCESS marker file will not be created by the Hadoop output committers in stagingDir itself.

That's in the Hadoop MapReduce codebase, so it should all be good there -but tests are welcome.

> Spark doesn't create SUCCESS file in Spark 3.3.0+ when partitionOverwriteMode is dynamic
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-44884
>                 URL: https://issues.apache.org/jira/browse/SPARK-44884
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Dipayan Dev
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-08-20-18-46-53-342.png, image-2023-08-25-13-01-42-137.png
>
> The issue does not occur in Spark 2.x (tested with 2.4.0), only in 3.3.0 (also reproduced with 3.4.1).
>
> Code to reproduce the issue:
>
> {code:java}
> scala> spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path", "gs://test_bucket/table").mode("overwrite").partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
>
> The above code succeeds and creates the external Hive table, but *there is no SUCCESS file generated*.
> Content of the bucket after table creation:
> !image-2023-08-25-13-01-42-137.png|width=500,height=130!
>
> The same code, when run with Spark 2.4.0 (with or without an external path), generates the SUCCESS file:
> {code:java}
> scala> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
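To make the suggested test concrete: before reaching for ManifestSuccessData, a test can first do a cheap structural sniff on the marker, since the classic FileOutputCommitter writes a zero-byte _SUCCESS while the manifest and s3a committers write a JSON document. The sketch below is a self-contained illustration, not committer code: the `SuccessMarkerCheck` class and the sample manifest body are hypothetical stand-ins for a real job output directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SuccessMarkerCheck {

    // Structural sniff: non-empty and brace-delimited suggests a JSON
    // manifest (manifest/s3a committers); a zero-byte file is the classic
    // FileOutputCommitter marker. This does not validate the schema; for
    // that, load the file with ManifestSuccessData from hadoop-mapreduce.
    static boolean looksLikeJsonManifest(Path marker) throws IOException {
        String text = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
        return !text.isEmpty() && text.startsWith("{") && text.endsWith("}");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a job output directory; the manifest
        // body below is illustrative, not real committer output.
        Path dir = Files.createTempDirectory("success-check");
        Path marker = dir.resolve("_SUCCESS");
        Files.write(marker, "{\"committer\": \"manifest\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println("JSON manifest: " + looksLikeJsonManifest(marker));
    }
}
```

Run against the sample directory this prints `JSON manifest: true`; pointed at a directory written by the classic committer it would print `false`, which is exactly the distinction a regression test for this issue needs.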