Using MultipleOutputFormat and setting reducers to 0 causes
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException and job to fail
----------------------------------------------------------------------------------------------------------------------------------------
Key: HADOOP-5268
URL: https://issues.apache.org/jira/browse/HADOOP-5268
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.19.0
Reporter: Thibaut
Hi,
I'm trying to skip the sorting step by running only the map phase (setting the
number of reducers to 0), but the job then fails.
The job runs fine when the reduce phase is enabled.
I'm using MultipleInputFormat and MultipleOutputFormat. My output format class
is shown first, followed by the exception.
public class MultipleSequenceFileOutputFormat<K extends WritableComparable, V extends Writable>
        extends MultipleOutputFormat<K, V> {

    private SequenceFileOutputFormat<K, V> sequencefileoutputformat = null;
    private String uniqueprefix = "";
    private boolean set = false;
    private static Random r = new Random();

    @Override
    protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs, JobConf job,
            String name, Progressable arg3) throws IOException {
        if (sequencefileoutputformat == null) {
            sequencefileoutputformat = new SequenceFileOutputFormat<K, V>();
        }
        return sequencefileoutputformat.getRecordWriter(fs, job, name, arg3);
    }

    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        if (!set) {
            synchronized (r) {
                uniqueprefix = new Long(System.currentTimeMillis()).toString()
                        + "_" + r.nextInt();
                set = true;
            }
        }
        return "prefix....." + uniqueprefix + "_" + name;
    }

    @Override
    public void checkOutputSpecs(FileSystem fs, JobConf conf) {
        // Empty override: no output spec check is performed.
    }
}
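A side note on generateFileNameForKeyValue above: the "set" flag is read outside the synchronized block, so the lazy initialisation is not strictly thread-safe if one instance is ever shared between threads. This is not claimed to be related to the failure reported here. A minimal sketch of an alternative that computes the prefix once at construction time follows. The class name UniquePrefixNamer, the "base" parameter, and the "out" value are hypothetical, and the sketch deliberately avoids Hadoop classes so that it runs standalone.

```java
import java.util.Random;

// Hypothetical stand-in for the naming logic only; it does not extend
// MultipleOutputFormat and uses no Hadoop classes.
class UniquePrefixNamer {
    private static final Random RANDOM = new Random();

    // Leading label for every generated name (an assumed value, supplied by the caller).
    private final String base;

    // Computed once per instance at construction, so no flag or lock is needed.
    private final String uniquePrefix =
            System.currentTimeMillis() + "_" + RANDOM.nextInt();

    UniquePrefixNamer(String base) {
        this.base = base;
    }

    // Mirrors the shape of generateFileNameForKeyValue: label, prefix, then the task's name.
    String generateFileName(String name) {
        return base + uniquePrefix + "_" + name;
    }

    public static void main(String[] args) {
        UniquePrefixNamer namer = new UniquePrefixNamer("out");
        // Both calls print the same file name because the prefix is fixed per instance.
        System.out.println(namer.generateFileName("part-00000"));
        System.out.println(namer.generateFileName("part-00000"));
    }
}
```

As in the original class, the prefix is unique per instance, so two instances will usually produce different names for the same part file.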
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file ......1234809836818_-1723031414_part-00000 for DFSClient_attempt_200902111714_0492_m_000000_0 on client 192.168.0.6 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1052)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:995)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
    at org.apache.hadoop.ipc.Client.call(Client.java:696)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy1.create(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2587)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:169)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
    at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:427)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:57)
    at MultipleSequenceFileOutputFormat.getBaseRecordWriter(MultipleSequenceFileOutputFormat.java:33)
    at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:385)
    at ...
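
One reading of the message above is that the same client asked to create a path it already holds open for writing. The following is a toy model of that single rule, written only to make the wording of the message concrete. It is not HDFS code, the class ToyLeaseTable and the paths and client names are hypothetical, and it does not explain why the map-only job ends up issuing the second create.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not HDFS code: tracks which client holds each path open for writing.
class ToyLeaseTable {
    // path -> client currently holding the file open for writing
    private final Map<String, String> holderByPath = new HashMap<>();

    void create(String path, String client) {
        String holder = holderByPath.get(path);
        if (holder == null) {
            // Nobody is writing this path yet: record the new holder.
            holderByPath.put(path, client);
        } else if (holder.equals(client)) {
            // The case named in the exception message: same client, same path, still open.
            throw new IllegalStateException("failed to create file " + path + " for " + client
                    + " because current leaseholder is trying to recreate file.");
        } else {
            // A different client holding the path is outside the scope of this sketch.
            throw new UnsupportedOperationException("different holder: not modelled in this sketch");
        }
    }

    void close(String path) {
        // Closing the file releases the path so it can be created again.
        holderByPath.remove(path);
    }

    public static void main(String[] args) {
        ToyLeaseTable table = new ToyLeaseTable();
        table.create("/out/part-00000", "client-A");
        try {
            table.create("/out/part-00000", "client-A");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, a writer for a given output path would need to be opened once per task and reused, or closed before the same path is created again.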