[ https://issues.apache.org/jira/browse/SPARK-33739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-33739:
------------------------------------

    Assignee: Apache Spark

> Jobs committed through the S3A Magic committer don't report the bytes written
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-33739
>                 URL: https://issues.apache.org/jira/browse/SPARK-33739
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Steve Loughran
>            Assignee: Apache Spark
>            Priority: Minor
>
> The Spark statistics tracking doesn't correctly assess the size of the
> uploaded files, as it only calls getFileStatus on the zero-byte objects, not
> the yet-to-manifest files. Given that those files don't exist yet, that
> isn't easy to do.
> HADOOP-17414 will attach the final length as a custom header to the marker
> object, and implement getXAttr in the S3A FS to probe for it.
> BasicWriteStatsTracker can probe for this custom XAttr if the size of the
> generated file is 0 bytes; if found and parseable, use that as the declared
> length of the output.
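
The fallback described in the issue (trust the reported file length unless it is 0, then look for a declared length in an extended attribute and use it only if it parses) can be sketched roughly as below. This is a minimal illustration in plain Java, not the actual BasicWriteStatsTracker change: the class name, the method names, and the lookup function standing in for the filesystem's getXAttr call are hypothetical, and the attribute name is an assumption, since the issue text does not spell it out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper; not part of Spark or Hadoop.
final class DeclaredLengthProbe {

    // Assumed attribute name; the issue only says "a custom header".
    static final String LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length";

    // Parse an attribute value as a non-negative decimal length.
    // Returns empty if the value is missing, blank, negative, or not a number.
    static Optional<Long> parseLength(byte[] raw) {
        if (raw == null || raw.length == 0) {
            return Optional.empty();
        }
        try {
            long len = Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
            return len >= 0 ? Optional.of(len) : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // statusLength: the length reported by getFileStatus for the written path.
    // xattrLookup: stand-in for a getXAttr call; returns null when the
    // attribute is absent.
    static long effectiveLength(long statusLength, Function<String, byte[]> xattrLookup) {
        if (statusLength != 0) {
            return statusLength;          // a real length was reported; use it
        }
        // Zero-byte object: probe for a declared length, fall back to 0.
        return parseLength(xattrLookup.apply(LENGTH_XATTR)).orElse(0L);
    }

    public static void main(String[] args) {
        // Zero-byte marker carrying a declared length of 1234 bytes.
        Function<String, byte[]> marker =
            name -> "1234".getBytes(StandardCharsets.UTF_8);
        System.out.println(effectiveLength(0L, marker));   // prints 1234
    }
}
```

Only probing when the reported length is 0 keeps the extra lookup off the common path, and treating an unparseable value the same as a missing one means a bad header cannot make the statistics worse than they already are.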