[ https://issues.apache.org/jira/browse/HDFS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655128#comment-13655128 ]

Hadoop QA commented on HDFS-4817:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12582755/HDFS-4817.001.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4382//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4382//console

This message is automatically generated.

> make HDFS advisory caching configurable on a per-file basis
> -----------------------------------------------------------
>
>                 Key: HDFS-4817
>                 URL: https://issues.apache.org/jira/browse/HDFS-4817
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4817.001.patch
>
>
> HADOOP-7753 and related JIRAs introduced some performance optimizations for
> the DataNode. One of them was readahead.
> When readahead is enabled, the DataNode starts reading the next bytes it
> thinks it will need in the block file, before the client requests them.
> This helps hide the latency of rotational media and send larger reads down
> to the device. Another optimization was "drop-behind." Using this
> optimization, we could remove files from the Linux page cache after they
> were no longer needed.
> Using {{dfs.datanode.drop.cache.behind.writes}} and
> {{dfs.datanode.drop.cache.behind.reads}} can improve performance
> substantially on many MapReduce jobs. In our internal benchmarks, we have
> seen speedups of 40% on certain workloads. The reason is that if we know
> the block data will not be read again any time soon, keeping it out of
> memory leaves more memory for the other processes on the system. See
> HADOOP-7714 for more benchmarks.
> We would like to turn on these configurations on a per-file or per-client
> basis, rather than on the DataNode as a whole. This will allow more users to
> actually make use of them. It would also be good to add unit tests for the
> drop-cache code path, to ensure that it is functioning as we expect.
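
The per-file request in the description can be pictured as a simple override rule: a client may attach a hint to one file, and the DataNode-wide setting applies only when no hint is given. The sketch below is a minimal illustration of that rule, not code from the attached patch. Every class, field, and method name in it is hypothetical, and the 4 MiB readahead length is an arbitrary value chosen for the example; only the two drop-behind configuration keys come from the description.

```java
import java.util.Optional;

/** Illustrative only: every name in this class is hypothetical, not an HDFS API. */
public class CachingPolicySketch {

    /** DataNode-wide settings, as they might be loaded from the DataNode configuration. */
    static final class DataNodeDefaults {
        final boolean dropCacheBehindReads;  // models dfs.datanode.drop.cache.behind.reads
        final boolean dropCacheBehindWrites; // models dfs.datanode.drop.cache.behind.writes
        final long readaheadBytes;           // assumed DataNode-wide readahead length

        DataNodeDefaults(boolean dropCacheBehindReads,
                         boolean dropCacheBehindWrites,
                         long readaheadBytes) {
            this.dropCacheBehindReads = dropCacheBehindReads;
            this.dropCacheBehindWrites = dropCacheBehindWrites;
            this.readaheadBytes = readaheadBytes;
        }
    }

    /** Optional per-file hints from a client; an empty value means "no preference". */
    static final class FileHints {
        final Optional<Boolean> dropBehind;
        final Optional<Long> readaheadBytes;

        FileHints(Optional<Boolean> dropBehind, Optional<Long> readaheadBytes) {
            this.dropBehind = dropBehind;
            this.readaheadBytes = readaheadBytes;
        }

        static FileHints none() {
            return new FileHints(Optional.empty(), Optional.empty());
        }
    }

    /** A per-file hint wins; otherwise fall back to the DataNode-wide read setting. */
    static boolean dropBehindReads(DataNodeDefaults defaults, FileHints hints) {
        return hints.dropBehind.orElse(defaults.dropCacheBehindReads);
    }

    /** Same rule for the write path. */
    static boolean dropBehindWrites(DataNodeDefaults defaults, FileHints hints) {
        return hints.dropBehind.orElse(defaults.dropCacheBehindWrites);
    }

    /** A per-file readahead length wins; otherwise use the DataNode-wide length. */
    static long readahead(DataNodeDefaults defaults, FileHints hints) {
        return hints.readaheadBytes.orElse(defaults.readaheadBytes);
    }

    public static void main(String[] args) {
        // Drop-behind is off for the DataNode as a whole in this example.
        DataNodeDefaults defaults = new DataNodeDefaults(false, false, 4L * 1024 * 1024);
        // A job that scans a file once asks for drop-behind on that file only.
        FileHints scanOnce = new FileHints(Optional.of(true), Optional.empty());

        System.out.println("drop-behind (hinted file): " + dropBehindReads(defaults, scanOnce));
        System.out.println("drop-behind (other files): " + dropBehindReads(defaults, FileHints.none()));
        System.out.println("readahead bytes:           " + readahead(defaults, scanOnce));
    }
}
```

Using an empty `Optional` for "no preference" keeps three states distinct (force on, force off, use the default), which a plain boolean flag could not express. A client could then turn drop-behind off for a file it expects to re-read even when the DataNode default is on.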