scalability issue with filecache in large clusters
--------------------------------------------------

                 Key: HADOOP-1182
                 URL: https://issues.apache.org/jira/browse/HADOOP-1182
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.1
            Reporter: Christian Kunz


When using the filecache (DistributedCache) to distribute supporting files for
map/reduce applications in a 1000-node cluster, many map tasks fail because of
RPC timeouts. The same applications with comparable input data ran without this
problem on a 200-node cluster. Either the whole job fails because too many map
tasks fail, or, worse, some map tasks hang indefinitely.


java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:473)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
        at org.apache.hadoop.dfs.$Proxy1.exists(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient.exists(DFSClient.java:320)
        at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.exists(DistributedFileSystem.java:170)
        at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.open(DistributedFileSystem.java:125)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:110)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245)
        at org.apache.hadoop.filecache.DistributedCache.createMD5(DistributedCache.java:327)
        at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:253)
        at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:169)
        at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:86)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:117)

-- 
This message is automatically generated by JIRA.