GitHub user devaraj-kavali opened a pull request:

    https://github.com/apache/spark/pull/22752

    [SPARK-24787][CORE] Revert hsync in EventLoggingListener and make 
FsHistoryProvider to read lastBlockBeingWritten data for logs

    ## What changes were proposed in this pull request?
    
    `hsync` was added as part of SPARK-19531 so that the history server UI 
shows the latest data, but it causes performance overhead and leads to many 
history log events being dropped. `hsync` uses `FileChannel.force` to sync the 
data to disk, and it does so along the data pipeline; it is a costly operation 
that slows down the application and causes it to drop events.
    
    I think the history server can get the latest data in a different way, 
with no impact on the application while it writes events. The API 
`DFSInputStream.getFileLength()` returns the file length including the 
`lastBlockBeingWrittenLength` (unlike `FileStatus.getLen()`). When the file 
status length and the previously cached length are equal, this API can be used 
to check whether any new data has been written; if the data length has changed, 
the history server can update the in-progress history log. I also made this 
behavior configurable, with a default value of false; it can be enabled for 
the history server if users want to see the updated data in the UI.
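
    The length check described above can be sketched roughly as follows. This 
is only an illustration of the decision logic, not the code in the patch: the 
class `InProgressLengthCheck`, the method `shouldReload`, and its parameters 
are hypothetical names, and the `LongSupplier` stands in for a call to 
`DFSInputStream.getFileLength()` so that the example runs without Hadoop on 
the classpath.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: names are illustrative and do not come from the patch.
final class InProgressLengthCheck {

    /**
     * Decides whether an in-progress event log should be re-read.
     *
     * @param statusLen    length reported by the file status
     *                     (does not include the block being written)
     * @param cachedLen    length recorded at the previous scan
     * @param absoluteLen  lazily fetches the length including the last block
     *                     being written; assumed here to wrap
     *                     DFSInputStream.getFileLength() on HDFS
     * @param checkEnabled value of the opt-in setting (false by default)
     */
    static boolean shouldReload(long statusLen, long cachedLen,
                                LongSupplier absoluteLen, boolean checkEnabled) {
        // The file status already shows growth: no extra call is needed.
        if (statusLen > cachedLen) {
            return true;
        }
        // Lengths are equal: ask for the absolute length only if enabled.
        if (checkEnabled && statusLen == cachedLen) {
            return absoluteLen.getAsLong() > cachedLen;
        }
        return false;
    }

    public static void main(String[] args) {
        // Status length unchanged, but the last block has grown to 150 bytes.
        LongSupplier grown = () -> 150L;
        System.out.println(shouldReload(100L, 100L, grown, true));
        System.out.println(shouldReload(100L, 100L, grown, false));
    }
}
```

    Passing the absolute length as a supplier keeps the extra lookup off the 
common path: it is evaluated only when the check is enabled and the cheaper 
file status comparison shows no change.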
    
    ## How was this patch tested?
    
    Added a new test and verified manually: with the added conf 
`spark.history.fs.inProgressAbsoluteLengthCheck.enabled=true`, the history 
server reads the logs including the last block data that is being written and 
updates the Web UI with the latest data.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-24787

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22752
    
----
commit a3f53c41879e28d71d4dbd79d80a51e50d82ecee
Author: Devaraj K <devaraj@...>
Date:   2018-10-16T23:50:20Z

    [SPARK-24787][CORE] Revert hsync in EventLoggingListener and make
    FsHistoryProvider to read lastBlockBeingWritten data for logs

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
