[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049734#comment-14049734 ]
Mickael Olivier commented on HDFS-6515:
---------------------------------------

As said on https://issues.apache.org/jira/browse/HDFS-6608, the bug might be related to the hard-coded limit maxBytes = 65536 bytes, assigned at the beginning of TestFsDatasetCache.java as follows:

  // Most Linux installs allow a default of 64KB locked memory
  private static final long CACHE_CAPACITY = 64 * 1024;

  conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY);

Then in FsDatasetCache we have:

  this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();

This call retrieves that value. So I tried raising the limit with something like:

  private static final long CACHE_CAPACITY = 16 * 64 * 1024;

After forking the test run, the logs indeed show maxBytes : 1048576. But the count value is now capped at 4096, which is weird. Each time the supplier checks whether the cache is being used as expected, I now get:

  verifyExpectedCacheUsage: have 20480/327680 bytes cached; 5/5 blocks cached. memlock limit = 1125899906842624. Waiting...

So although all 5 blocks appear to be cached, the osPageSize used to round the cache reservations is still 4096, and that is what should be changed. The rounding code is:

  public long round(long count) {
    long newCount = (count + (osPageSize - 1)) / osPageSize;
    return newCount * osPageSize;
  }

  private final long osPageSize =
      NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();

On PPC64 that should give 65536 again, so when reserving 512 bytes we should have newCount = 1 and 65536 bytes reserved. Why is that not the case? (The first step is @@reserve:: count : 4096 | next : 4096 | maxBytes : 1048576.) A worked example of this rounding arithmetic is sketched at the end of this message.

> testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-6515
>                 URL: https://issues.apache.org/jira/browse/HDFS-6515
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.4.0
>         Environment: Linux on PPC64
>            Reporter: Tony Reix
>            Priority: Blocker
>              Labels: test
>
> I have an issue with the test:
>   testPageRounder
>   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> on Linux/PowerPC.
> On Linux/Intel, the test runs fine.
> On Linux/PowerPC, I have:
>   testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
>   Time elapsed: 64.037 sec  <<< ERROR!
>   java.lang.Exception: test timed out after 60000 milliseconds
> Looking at the details, I see that some "Failed to cache" messages appear in the traces: only 10 on Intel, but 186 on PPC64.
> On PPC64, it looks like some thread is waiting for something that never happens, generating a timeout.
> I'm using the IBM JVM, but I've checked that the issue also appears with OpenJDK.
> I'm using the latest Hadoop; however, the issue already appeared with Hadoop 2.4.0.
> I need help understanding what the test is doing and what traces are expected, in order to understand what/where the root cause is.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
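For reference, a small standalone sketch of the rounding arithmetic discussed in the comment above. It only reuses the round() logic, the osPageSize values (4096 and 65536) and the default CACHE_CAPACITY quoted there; the RoundingSketch class and its main() driver are made up for illustration and are not part of Hadoop:

  // Standalone sketch (not Hadoop code): reproduces the round(long count)
  // arithmetic quoted from FsDatasetCache with the two page sizes discussed.
  public class RoundingSketch {

    // Same arithmetic as the round(long count) method quoted in the comment.
    static long round(long count, long osPageSize) {
      long newCount = (count + (osPageSize - 1)) / osPageSize;
      return newCount * osPageSize;
    }

    public static void main(String[] args) {
      long cacheCapacity = 64 * 1024;  // default CACHE_CAPACITY in TestFsDatasetCache

      // 4 KB page size (typical x86 Linux): a 512-byte reservation rounds to 4096,
      // matching the observed "count : 4096" log line.
      System.out.println(round(512, 4096));     // 4096

      // 64 KB page size (PPC64): a single 512-byte reservation already rounds up
      // to 65536, i.e. the whole default 64 KB cache capacity.
      long ppc64Reservation = round(512, 65536);
      System.out.println(ppc64Reservation);                  // 65536
      System.out.println(ppc64Reservation >= cacheCapacity); // true
    }
  }

With a 4 KB page size each 512-byte reservation costs 4096 bytes, whereas with the 64 KB PPC64 page size a single reservation already consumes the entire default CACHE_CAPACITY, which is consistent with the behaviour described above when the capacity is left at 64 * 1024.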