[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813397#comment-13813397 ]
Colin Patrick McCabe commented on HDFS-5394:
--------------------------------------------

Looks like this is an environment issue on the build machines. They only have 64k of available mlock space:

{code}
2013-11-04 20:34:51,387 WARN impl.FsDatasetCache (FsDatasetCache.java:run(329)) - Failed to cache block 1073741842 in BP-1183768563-67.195.138.24-1383597287811
ENOMEM: Cannot allocate memory
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.mlock_native(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.mlock(NativeIO.java:255)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlock$PosixMlocker.mlock(MappableBlock.java:54)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlock.load(MappableBlock.java:99)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetCache$CachingTask.run(FsDatasetCache.java:321)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
2013-11-04 20:34:51,388 WARN impl.FsDatasetCache (FsDatasetCache.java:run(329)) - Failed to cache block 1073741841 in BP-1183768563-67.195.138.24-1383597287811
ENOMEM: Cannot allocate memory
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.mlock_native(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.mlock(NativeIO.java:255)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlock$PosixMlocker.mlock(MappableBlock.java:54)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlock.load(MappableBlock.java:99)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetCache$CachingTask.run(FsDatasetCache.java:321)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
2013-11-04 20:34:51,543 INFO datanode.TestFsDatasetCache (TestFsDatasetCache.java:get(190)) -
{code}

and then later...

{code}
verifyExpectedCacheUsage: expected 65535, got 60074; memlock limit = 65536. Waiting...
{code}

I'm not sure why we can't seem to get up to 65535, considering the ulimit is supposed to be just higher than that. I'll see if I can reproduce locally.

> fix race conditions in DN caching and uncaching
> -----------------------------------------------
>
>                 Key: HDFS-5394
>                 URL: https://issues.apache.org/jira/browse/HDFS-5394
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch
>
>
> The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
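The "memlock limit = 65536" in the log is the process's RLIMIT_MEMLOCK soft limit, i.e. the value reported by `ulimit -l` (in KiB, so a 64k limit shows as 64). A minimal sketch of inspecting that limit on a POSIX system, assuming Python's standard `resource` module is available (Linux/BSD only; nothing here is Hadoop-specific):

```python
import resource

# RLIMIT_MEMLOCK caps the total bytes a process may mlock(); a 64 KiB
# soft limit matches the "memlock limit = 65536" seen in the test log,
# and any mlock() beyond it fails with ENOMEM, as in the stack traces above.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

def fmt(limit):
    # RLIM_INFINITY means no cap is enforced for this resource.
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

print("memlock soft limit (bytes):", fmt(soft))
print("memlock hard limit (bytes):", fmt(hard))
```

On a build machine configured like the one above, the soft limit would print as 65536; raising it (e.g. via `ulimit -l` or `/etc/security/limits.conf`) is an administrative action, not something the test itself can do.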