GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/22324

    [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in FileScanRDD

    ## What changes were proposed in this pull request?
    This PR removes the method `updateBytesReadWithFileSize` from `FileScanRDD`. The method computes input metrics from the file size, which applies only to Hadoop 2.5 and earlier. Spark no longer supports those versions, so the method now produces wrong input metric numbers.
    This is a rework of #22232.

    Closes #22232

    ## How was this patch tested?
    Added `FileSourceSuite` to test this case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark pr22232-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22324

----
commit 0f75257b50a611e069d406da8d72225bb4e73b51
Author: dujunling <dujunling@...>
Date:   2018-08-25T06:20:35Z

    remove updateBytesReadWithFileSize because we use Hadoop FileSystem statistics to update the inputMetrics

commit 53dd42c1facebf97044afb22b1f0894ec209f3bb
Author: dujunling <dujunling@...>
Date:   2018-08-27T03:26:30Z

    add ut

commit 1c326466fbd24c432184be6e53afec93369970c1
Author: dujunling <dujunling@...>
Date:   2018-08-27T03:33:46Z

    ut

commit 510d729b0ed6f83b05a3b0f06c2631163d62ef1a
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-09-04T01:47:59Z

    fix

----
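
To illustrate why a file-size-based update can produce wrong input metric numbers once Hadoop FileSystem statistics already feed the metric, here is a minimal, self-contained sketch. It is not Spark's code: `InputMetricsSketch`, `ScannedFile`, and `reportedBytesRead` are hypothetical names, and the model assumes (as a simplification) that the statistics-based count and the file-size increment are both applied to the same metric.

```scala
object InputMetricsSketch {
  // Hypothetical record: a file's length on disk and the bytes that the
  // file-system statistics report as actually read from it.
  final case class ScannedFile(length: Long, bytesReadPerStats: Long)

  // Simplified model (an assumption, not Spark's implementation):
  // the statistics-based count is always applied, and the file-size
  // increment is added on top of it when addFileSize is true.
  def reportedBytesRead(files: Seq[ScannedFile], addFileSize: Boolean): Long = {
    val fromStatistics = files.map(_.bytesReadPerStats).sum
    val fromFileSize = if (addFileSize) files.map(_.length).sum else 0L
    fromStatistics + fromFileSize
  }

  def main(args: Array[String]): Unit = {
    // The second file is only partially read, e.g. by a query that stops early.
    val files = Seq(ScannedFile(100L, 100L), ScannedFile(200L, 50L))
    println(reportedBytesRead(files, addFileSize = true))  // statistics plus file sizes
    println(reportedBytesRead(files, addFileSize = false)) // statistics only
  }
}
```

In this model, the statistics-only figure matches the bytes actually read, while adding file sizes inflates it. That inflation is the kind of discrepancy the PR description attributes to `updateBytesReadWithFileSize`.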