[jira] [Commented] (SPARK-33739) Jobs committed through the S3A Magic committer don't report the bytes written
[ https://issues.apache.org/jira/browse/SPARK-33739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247497#comment-17247497 ]

Apache Spark commented on SPARK-33739:
--

User 'steveloughran' has created a pull request for this issue:
https://github.com/apache/spark/pull/30714

> Jobs committed through the S3A Magic committer don't report the bytes written
> -
>
>                 Key: SPARK-33739
>                 URL: https://issues.apache.org/jira/browse/SPARK-33739
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Spark's statistics tracking doesn't correctly assess the size of the
> uploaded files, because it only calls getFileStatus on the zero-byte marker
> objects, not on the yet-to-manifest files. Since those files don't exist
> yet, probing them directly isn't easy.
>
> HADOOP-17414 will attach the final length as a custom header to the marker
> object, and implement getXAttr in the S3A FS to probe for it.
> BasicWriteStatsTracker can probe for this custom XAttr when the generated
> file is 0 bytes long; if the attribute is found and parseable, it can be
> used as the declared length of the output.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
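The fallback logic proposed for BasicWriteStatsTracker can be sketched as follows. This is a minimal illustration, not the code in the pull request: `FakeStore`, `declared_length` and `get_xattr` are hypothetical stand-ins for a Hadoop FileSystem, and the attribute name `MAGIC_LENGTH_XATTR` is an assumption about the header HADOOP-17414 attaches. The sketch reports the object's own length unless it is zero; only then does it consult the custom attribute, falling back to 0 if the attribute is missing or unparseable.

```python
from typing import Dict, Optional

# ASSUMPTION: the xattr name under which the magic committer exposes the
# final file length; verify against HADOOP-17414 before relying on it.
MAGIC_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length"


class FakeStore:
    """Hypothetical in-memory stand-in for a filesystem with xattr support."""

    def __init__(self) -> None:
        self.lengths: Dict[str, int] = {}
        self.xattrs: Dict[str, Dict[str, bytes]] = {}

    def get_len(self, path: str) -> int:
        # Analogous to getFileStatus(path).getLen().
        return self.lengths[path]

    def get_xattr(self, path: str, name: str) -> Optional[bytes]:
        # Analogous to getXAttr(path, name); None if the attribute is absent.
        return self.xattrs.get(path, {}).get(name)


def declared_length(store: FakeStore, path: str) -> int:
    """Return the length to report for `path` in the write statistics."""
    length = store.get_len(path)
    if length != 0:
        # A real, non-empty file: its own length is authoritative.
        return length
    raw = store.get_xattr(path, MAGIC_LENGTH_XATTR)
    if not raw:
        # No header: keep reporting the zero-byte length.
        return 0
    try:
        parsed = int(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Header present but not parseable as an integer.
        return 0
    return parsed if parsed >= 0 else 0
```

For example, a zero-byte marker whose attribute holds `b"1048576"` would be reported as 1048576 bytes, while a file that already has a non-zero length is reported unchanged. Checking the attribute only when the length is 0 avoids an extra metadata request for every file written by other committers.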