Thanks for your reply.
If I increase the number of machines in the cluster, will that solve this
problem of running out of file descriptors?
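
For reference, my generateFileNameForKeyValue returns name + "_" + key, so
each reducer holds one open HDFS file per distinct key. Here is a rough
sketch of a bounded alternative I could try instead, assuming the consumers
of the output can tolerate keys being hashed into a fixed number of buckets
(BucketedTextOutputFormat and NUM_BUCKETS are illustrative names, not
anything from my current job):

    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    public class BucketedTextOutputFormat<K, V>
            extends MultipleTextOutputFormat<K, V> {
        // Cap on simultaneously open output files per reducer;
        // illustrative value, not tuned.
        private static final int NUM_BUCKETS = 16;

        @Override
        protected String generateFileNameForKeyValue(K key, V value,
                String name) {
            // Hash the key into one of NUM_BUCKETS buckets, so at most
            // NUM_BUCKETS files are open at once instead of one per key.
            int bucket = (key.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
            return name + "_bucket" + bucket;
        }
    }

With something like this, the number of open descriptors per reducer stays
bounded regardless of key cardinality, though records for different keys then
share a file and the key would need to be written into each record to keep
them separable.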




On Wed, Feb 25, 2009 at 11:07 AM, jason hadoop <jason.had...@gmail.com> wrote:
> My first guess is that your application is running out of file
> descriptors, possibly because your MultipleOutputFormat instance is opening
> more output files than you expect.
> Opening lots of files in HDFS is generally a quick route to bad job
> performance, if not job failure.
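>
> To make that concrete (a minimal sketch of the caching behaviour, with
> illustrative names rather than the real internal fields):
> MultipleOutputFormat keeps one RecordWriter per generated file name, and
> writers stay open until the task closes, so with a key-based file name the
> number of open HDFS streams tracks the number of distinct keys a reducer
> sees:
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     class OpenWriterSketch {
>         // Stands in for MultipleOutputFormat's per-file-name writer cache.
>         private final Map<String, Object> openWriters =
>                 new HashMap<String, Object>();
>
>         void write(String name, String key) {
>             // With generateFileNameForKeyValue returning name + "_" + key,
>             // every distinct key opens (and keeps open) another file.
>             String fileName = name + "_" + key;
>             if (!openWriters.containsKey(fileName)) {
>                 openWriters.put(fileName, new Object() /* a RecordWriter */);
>             }
>             // openWriters.size() == distinct keys seen so far ==
>             // file descriptors held open by this task.
>         }
>     }
>
> If the key space is large, a single reducer can exhaust its own per-process
> descriptor limit on its own.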
>
> On Tue, Feb 24, 2009 at 6:58 PM, ma qiang <maqiang1...@gmail.com> wrote:
>
>> Hi all,
>>   I have a class that extends MultipleOutputFormat, shown below:
>>
>> public class MyMultipleTextOutputFormat<K, V>
>>         extends MultipleOutputFormat<K, V> {
>>     private TextOutputFormat<K, V> theTextOutputFormat = null;
>>
>>     @Override
>>     protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs,
>>             JobConf job, String name, Progressable progress)
>>             throws IOException {
>>         // Lazily create a single TextOutputFormat and delegate
>>         // record writing to it.
>>         if (theTextOutputFormat == null) {
>>             theTextOutputFormat = new TextOutputFormat<K, V>();
>>         }
>>         return theTextOutputFormat.getRecordWriter(fs, job, name, progress);
>>     }
>>
>>     @Override
>>     protected String generateFileNameForKeyValue(K key, V value,
>>             String name) {
>>         // Append the key to the leaf file name: one output file
>>         // per distinct key.
>>         return name + "_" + key.toString();
>>     }
>> }
>>
>>
>> I also call conf.setOutputFormat(MultipleTextOutputFormat2.class) in my job
>> configuration, but when the program runs, the following errors are printed:
>>
>> 09/02/25 10:22:32 INFO mapred.JobClient: Task Id : attempt_200902250959_0002_r_000001_0, Status : FAILED
>> java.io.IOException: Could not read from stream
>>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
>>        at java.io.DataInputStream.readByte(DataInputStream.java:248)
>>        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
>>        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
>>        at org.apache.hadoop.io.Text.readString(Text.java:400)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>>
>> 09/02/25 10:22:42 INFO mapred.JobClient:  map 100% reduce 69%
>> 09/02/25 10:22:55 INFO mapred.JobClient:  map 100% reduce 0%
>> 09/02/25 10:22:55 INFO mapred.JobClient: Task Id : attempt_200902250959_0002_r_000000_1, Status : FAILED
>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> /user/qiang/output/_temporary/_attempt_200902250959_0002_r_000000_1/part-00000_t0x5y3
>> could only be replicated to 0 nodes, instead of 1
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
>>        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
>>        at org.apache.hadoop.ipc.Client.call(Client.java:696)
>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>>        at $Proxy1.addBlock(Unknown Source)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>        at $Proxy1.addBlock(Unknown Source)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>>
>>
>> Of course, the program runs successfully without MyMultipleOutputFormat.
>> Can anyone help me solve this problem?
>> Thanks.
>>
>> Yours, Qiang
>>
>
