Thanks for your reply. If I increase the number of machines in the cluster, will that solve the problem of running out of file descriptors?
On Wed, Feb 25, 2009 at 11:07 AM, jason hadoop <jason.had...@gmail.com> wrote:
> My 1st guess is that your application is running out of file
> descriptors, possibly because your MultipleOutputFormat instance is
> opening more output files than you expect.
> Opening lots of files in HDFS is generally a quick route to bad job
> performance, if not job failure.
>
> On Tue, Feb 24, 2009 at 6:58 PM, ma qiang <maqiang1...@gmail.com> wrote:
>
>> Hi all,
>> I have one class that extends MultipleOutputFormat, as below:
>>
>> public class MyMultipleTextOutputFormat<K, V> extends
>>         MultipleOutputFormat<K, V> {
>>     private TextOutputFormat<K, V> theTextOutputFormat = null;
>>
>>     @Override
>>     protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs,
>>             JobConf job, String name, Progressable arg3) throws IOException {
>>         if (theTextOutputFormat == null) {
>>             theTextOutputFormat = new TextOutputFormat<K, V>();
>>         }
>>         return theTextOutputFormat.getRecordWriter(fs, job, name, arg3);
>>     }
>>
>>     @Override
>>     protected String generateFileNameForKeyValue(K key, V value, String name) {
>>         return name + "_" + key.toString();
>>     }
>> }
>>
>> I also call conf.setOutputFormat(MyMultipleTextOutputFormat.class) in my
>> job configuration.
>> But when the program runs, the following errors are printed:
>>
>> 09/02/25 10:22:32 INFO mapred.JobClient: Task Id :
>> attempt_200902250959_0002_r_000001_0, Status : FAILED
>> java.io.IOException: Could not read from stream
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
>>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
>>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
>>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>>
>> 09/02/25 10:22:42 INFO mapred.JobClient: map 100% reduce 69%
>> 09/02/25 10:22:55 INFO mapred.JobClient: map 100% reduce 0%
>> 09/02/25 10:22:55 INFO mapred.JobClient: Task Id :
>> attempt_200902250959_0002_r_000000_1, Status : FAILED
>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> /user/qiang/output/_temporary/_attempt_200902250959_0002_r_000000_1/part-00000_t0x5y3
>> could only be replicated to 0 nodes, instead of 1
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
>>         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
>>
>>         at org.apache.hadoop.ipc.Client.call(Client.java:696)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>>         at $Proxy1.addBlock(Unknown Source)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at $Proxy1.addBlock(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>>
>> Of course, the program runs successfully without MyMultipleTextOutputFormat.
>> Can anyone help me solve this problem?
>> Thanks.
>>
>> Yours,
>> Qiang
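The diagnosis above, that `generateFileNameForKeyValue` opens one HDFS stream per distinct key, suggests a possible workaround: hash keys into a bounded number of buckets so the reducer never holds more open files than a fixed cap. This is only a sketch, not from the thread; the class `BucketedFileNaming`, the method `fileNameFor`, and the bucket count `NUM_BUCKETS` are all illustrative names and assumptions, and the file-naming logic would need to be moved into an override of `generateFileNameForKeyValue`.

```java
// Sketch: bound the number of distinct output file names by bucketing keys.
// NUM_BUCKETS is an assumption; tune it against the process's file-descriptor
// limit (ulimit -n) and the datanode's transfer-thread limit.
public class BucketedFileNaming {
    static final int NUM_BUCKETS = 16; // hypothetical cap on open output files

    // Plays the role of generateFileNameForKeyValue in the post, but maps
    // each key to one of NUM_BUCKETS file names instead of a unique name.
    static String fileNameFor(String baseName, Object key) {
        int bucket = (key.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
        return baseName + "_" + bucket;
    }

    public static void main(String[] args) {
        // Many distinct keys collapse into at most NUM_BUCKETS file names.
        System.out.println(fileNameFor("part-00000", "t0x5y3"));
        System.out.println(fileNameFor("part-00000", "t1x9y7"));
    }
}
```

The trade-off is that each output file then holds several keys mixed together, so a downstream consumer that needs per-key files would have to split the buckets in a follow-up step.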