GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/22324

    [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in FileScanRDD

    ## What changes were proposed in this pull request?
    This PR removes the method `updateBytesReadWithFileSize` from `FileScanRDD` 
because it computes input metrics from file sizes, a fallback needed only for 
Hadoop 2.5 and earlier. Spark no longer supports those versions, so the method 
now only produces incorrect input metric numbers.
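
A toy model of the problem, not Spark's actual code: the names `InputMetrics` 
and `scan_file` are hypothetical, and the sketch assumes that the bytes read are 
already reported from filesystem statistics and that the removed method added 
the full file length on top of that when the file iterator was closed.

```python
class InputMetrics:
    """Toy stand-in for a task's input metrics (hypothetical, not Spark's class)."""

    def __init__(self):
        self.bytes_read = 0

    def set_bytes_read(self, n):
        self.bytes_read = n

    def inc_bytes_read(self, n):
        self.bytes_read += n


def scan_file(file_size, bytes_actually_read, add_file_size_on_close):
    """Simulate scanning one file and return the reported bytes_read."""
    metrics = InputMetrics()
    # Assumption: filesystem statistics already report what was really read.
    metrics.set_bytes_read(bytes_actually_read)
    if add_file_size_on_close:
        # Assumed behavior of the removed fallback: add the whole file length too.
        metrics.inc_bytes_read(file_size)
    return metrics.bytes_read


# A scan that stops early (for example because of a LIMIT) reads 100 of 1000 bytes.
with_fallback = scan_file(1000, 100, add_file_size_on_close=True)
without_fallback = scan_file(1000, 100, add_file_size_on_close=False)
print(with_fallback, without_fallback)
```

Under these assumptions the fallback reports more bytes than the file even 
contains, while the statistics-only path reports the bytes that were read.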
    
    This is a rework of #22232.
    
    Closes #22232
    
    ## How was this patch tested?
    Added `FileSourceSuite` to test this case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark pr22232-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22324
    
----
commit 0f75257b50a611e069d406da8d72225bb4e73b51
Author: dujunling <dujunling@...>
Date:   2018-08-25T06:20:35Z

    remove updateBytesReadWithFileSize because we use Hadoop FileSystem 
statistics to update the inputMetrics

commit 53dd42c1facebf97044afb22b1f0894ec209f3bb
Author: dujunling <dujunling@...>
Date:   2018-08-27T03:26:30Z

    add ut

commit 1c326466fbd24c432184be6e53afec93369970c1
Author: dujunling <dujunling@...>
Date:   2018-08-27T03:33:46Z

    ut

commit 510d729b0ed6f83b05a3b0f06c2631163d62ef1a
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-09-04T01:47:59Z

    fix

----


