Hi,

I have a job that benefits from many mappers, but the output of each mapper needs 
no further processing and can be written directly to HDFS as sequence files. I've 
set up a job to do this in Java, specifying my mapper and setting the number of 
reducers to 0 with:

job.setNumReduceTasks(0);
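
For context, my driver is set up roughly like this (a simplified sketch: the class 
names, key type and argument handling here are illustrative, not my real code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "map-only sequence file job");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(MyMapper.class);           // stand-in for my real mapper (sketched further down)
        job.setNumReduceTasks(0);                     // map-only: mapper output is written straight to HDFS
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);            // key type here is illustrative
        job.setOutputValueClass(BytesWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}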

The mapper I have written works correctly when run locally through Eclipse. 
However, when I submit the job to my Hadoop cluster using:

hadoop jar <some memory increase arguments> my.jar

I am finding problems. The following exception is thrown whenever I emit from 
one of my map tasks with:

context.write(key, new BytesWritable(baos.toByteArray()));

org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/data/quantised_features/ascii-sift-ukbench/_temporary/_attempt_201010211037_0140_m_000000_0/part-m-00000
 File does not exist. Holder DFSClient_attempt_201010211037_0140_m_000000_0 
does not have any open files.
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1378)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1369)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1290)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
        at sun.reflect.GeneratedMethodAccessor549.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)

        at org.apache.hadoop.ipc.Client.call(Client.java:817)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2881)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
        at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
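
For reference, the relevant part of my mapper looks roughly like this (heavily 
simplified; the key/value types and the payload are placeholders, and the real 
work of building the byte buffer is omitted):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Text, BytesWritable, Text, BytesWritable> {
    @Override
    protected void map(Text key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // Serialise the processed record into a byte buffer (placeholder for the real work)...
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.write(value.getBytes(), 0, value.getLength());
        dos.flush();

        // ...and emit it; this is the write that triggers the exception on the cluster.
        context.write(key, new BytesWritable(baos.toByteArray()));
    }
}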

This seemed quite strange in itself, so I did some testing. In the mapper's 
setup() method, I can confirm that the output directory does not exist on HDFS 
using:
Path destination = FileOutputFormat.getWorkOutputPath(context);
boolean exists = destination.getFileSystem(context.getConfiguration()).exists(destination);

Therefore, for testing purposes, I create the output directory in the setup() 
phase with:

destination.getFileSystem(context.getConfiguration()).mkdirs(destination);

The output location then does exist, but only until the end of the setup() call. 
By the time the map() function is reached, the output directory is gone again!
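
Putting those two pieces together, the setup() in my mapper (again simplified, 
with the prints only there for testing) looks like:

// inside the mapper class; this also needs org.apache.hadoop.fs.FileSystem,
// org.apache.hadoop.fs.Path and org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    Path destination = FileOutputFormat.getWorkOutputPath(context);
    FileSystem fs = destination.getFileSystem(context.getConfiguration());

    // the task's work directory does not exist yet at this point
    System.out.println("exists at start of setup(): " + fs.exists(destination));   // false

    // create it by hand, purely as a test
    fs.mkdirs(destination);
    System.out.println("exists at end of setup():   " + fs.exists(destination));   // true
    // ...yet by the time map() runs, the directory has vanished again
}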

If I set the number of reducers to 1 (without setting a reducer class, i.e. using 
the default reducer), the job works absolutely fine. The issue arises only with 
0 reducers.

Can anyone shed some light on this problem?

Thanks

- Sina
