GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/18979
[SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

## What changes were proposed in this pull request?

`BasicWriteTaskStatsTracker.getFileSize()` now catches `FileNotFoundException`, logs it at info level, and returns 0 as the file size. This ensures that if a newly created file is not yet visible, because the store does not always provide create consistency, metric collection does not cause a failure.

## How was this patch tested?

New test suite included, `BasicWriteTaskStatsTrackerSuite`. It checks resilience to missing files and also verifies the existing logic for gathering file statistics.

Note that in the current implementation:

1. If you call `Tracker.getFinalStats()` more than once, the file size count increases by the size of the last file. This could be fixed by clearing the filename field inside `getFinalStats()` itself.
2. If you pass an empty or null string to `Tracker.newFile(path)`, an `IllegalArgumentException` is raised, but only in `getFinalStats()` rather than in `newFile()`. There is a test for this behaviour in the new suite, because it verifies that only FNFEs get swallowed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark cloud/SPARK-21762-missing-files-in-metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18979.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18979

----
commit 8ad28b9bcd6a56b963ab57a5b4937d10f492de33
Author: Steve Loughran <ste...@hortonworks.com>
Date:   2017-08-17T19:35:35Z

    SPARK-21762 handle FNFE events in BasicWriteStatsTracker; add a suite of tests for various file states.
    Change-Id: I3269cb901a38b33e399ebef10b2dbcd51ccf9b75

commit 2a113fde1653743a3543df8ada395f320b826a3e
Author: Steve Loughran <ste...@hortonworks.com>
Date:   2017-08-17T20:01:50Z

    SPARK-21762 add tests for "" and null filenames

    Change-Id: I38ac11c808849e2fd91f4931f4cb5cdfad43e2af

----
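For readers without the diff to hand, the behaviour described in the pull request text can be sketched roughly as below. This is a minimal illustration in plain Java using only the standard library, not the Spark code. The class name `FileSizeSketch`, the use of `RandomAccessFile` in place of the filesystem client, the explicit empty/null check, and the accumulation logic are all assumptions made for the example. The sketch also raises `IllegalArgumentException` from inside `getFileSize()`, which is a simplification of where the description says it surfaces.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical stand-in for the tracker described in this PR; not the Spark class. */
final class FileSizeSketch {
    private long numFiles = 0;
    private long numBytes = 0;
    private String currentFile = null;

    /**
     * Returns the size of the file, or 0 if the file is not (yet) visible.
     * Only FileNotFoundException is swallowed; anything else propagates.
     */
    static long getFileSize(String path) throws IOException {
        if (path == null || path.isEmpty()) {
            // Assumption: models the IllegalArgumentException mentioned in the description.
            throw new IllegalArgumentException("Empty or null path");
        }
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            return file.length();
        } catch (FileNotFoundException e) {
            // The description logs at info level; stderr stands in for a logger here.
            System.err.println("File " + path + " is not yet visible: " + e);
            return 0L;
        }
    }

    /** Records a new output file, first accounting for the previous one (assumed logic). */
    void newFile(String path) throws IOException {
        if (currentFile != null) {
            numBytes += getFileSize(currentFile);
        }
        currentFile = path;
        numFiles++;
    }

    /** Returns {numFiles, numBytes}; clears the filename so a repeat call adds nothing. */
    long[] getFinalStats() throws IOException {
        if (currentFile != null) {
            numBytes += getFileSize(currentFile);
            currentFile = null; // the fix suggested in note 1 of the description
        }
        return new long[] {numFiles, numBytes};
    }
}
```

With the filename cleared inside `getFinalStats()`, calling it twice returns the same byte count, and a file that is missing when it is measured contributes 0 bytes rather than raising an exception.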