Qiang,
I can't find which one right now, but there is a JIRA issue about
MultipleTextOutputFormat (especially when the number of reducers is 0).
If you have no reducers, try running with one or two; then you can see
whether your problem is related to that issue.
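If you set the reduce count programmatically, something like this in the job
setup should be enough. This is only a sketch against the old mapred JobConf
API (the same one your code uses); MyJob just stands in for your driver class:

    JobConf conf = new JobConf(MyJob.class);
    // Use a couple of reduce tasks instead of zero, so the output files
    // are written by the reducers rather than directly by the map tasks.
    conf.setNumReduceTasks(2);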

Cheers,
Rasit

2009/2/25 ma qiang <maqiang1...@gmail.com>

> Thanks for your reply.
> If I increase the number of computers, will that solve the problem of
> running out of file descriptors?
>
>
>
>
> On Wed, Feb 25, 2009 at 11:07 AM, jason hadoop <jason.had...@gmail.com>
> wrote:
> > My first guess is that your application is running out of file
> > descriptors, possibly because your MultipleOutputFormat instance is
> > opening more output files than you expect.
> > Opening lots of files in HDFS is generally a quick route to bad job
> > performance if not job failure.
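> > If that is the cause, one workaround is to bound the number of distinct
> > output files, for example by hashing keys into a fixed number of buckets
> > in generateFileNameForKeyValue. A rough sketch (the bucket count of 16 is
> > only an example):
> >
> >     @Override
> >     protected String generateFileNameForKeyValue(K key, V value, String name) {
> >         // Mask off the sign bit before taking the modulo so the bucket is
> >         // never negative; at most 16 output files are opened per task,
> >         // instead of one per distinct key.
> >         int bucket = (key.hashCode() & Integer.MAX_VALUE) % 16;
> >         return name + "_" + bucket;
> >     }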
> >
> > On Tue, Feb 24, 2009 at 6:58 PM, ma qiang <maqiang1...@gmail.com> wrote:
> >
> >> Hi all,
> >>   I have a class that extends MultipleOutputFormat, shown below:
> >>
> >> public class MyMultipleTextOutputFormat<K, V> extends MultipleOutputFormat<K, V> {
> >>     private TextOutputFormat<K, V> theTextOutputFormat = null;
> >>
> >>     @Override
> >>     protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs,
> >>             JobConf job, String name, Progressable arg3) throws IOException {
> >>         if (theTextOutputFormat == null) {
> >>             theTextOutputFormat = new TextOutputFormat<K, V>();
> >>         }
> >>         return theTextOutputFormat.getRecordWriter(fs, job, name, arg3);
> >>     }
> >>
> >>     @Override
> >>     protected String generateFileNameForKeyValue(K key, V value, String name) {
> >>         return name + "_" + key.toString();
> >>     }
> >> }
> >>
> >>
> >> I also call conf.setOutputFormat(MultipleTextOutputFormat2.class) in my job
> >> configuration, but when the program runs, the following errors are printed:
> >>
> >> 09/02/25 10:22:32 INFO mapred.JobClient: Task Id : attempt_200902250959_0002_r_000001_0, Status : FAILED
> >> java.io.IOException: Could not read from stream
> >>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
> >>        at java.io.DataInputStream.readByte(DataInputStream.java:248)
> >>        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
> >>        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
> >>        at org.apache.hadoop.io.Text.readString(Text.java:400)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> >>
> >> 09/02/25 10:22:42 INFO mapred.JobClient:  map 100% reduce 69%
> >> 09/02/25 10:22:55 INFO mapred.JobClient:  map 100% reduce 0%
> >> 09/02/25 10:22:55 INFO mapred.JobClient: Task Id : attempt_200902250959_0002_r_000000_1, Status : FAILED
> >> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> >> /user/qiang/output/_temporary/_attempt_200902250959_0002_r_000000_1/part-00000_t0x5y3
> >> could only be replicated to 0 nodes, instead of 1
> >>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
> >>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
> >>        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>        at java.lang.reflect.Method.invoke(Method.java:597)
> >>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
> >>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
> >>        at org.apache.hadoop.ipc.Client.call(Client.java:696)
> >>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> >>        at $Proxy1.addBlock(Unknown Source)
> >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>        at java.lang.reflect.Method.invoke(Method.java:597)
> >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >>        at $Proxy1.addBlock(Unknown Source)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> >>
> >>
> >> Of course, the program runs successfully without MyMultipleOutputFormat.
> >> Who can help me solve this problem?
> >> Thanks.
> >>
> >> Yours, Qiang
> >>
> >
>



-- 
M. Raşit ÖZDAŞ
