My first guess is that your application is running out of file
descriptors, possibly because your MultipleOutputFormat instance is opening
more output files than you expect.
Opening lots of files in HDFS is generally a quick route to poor job
performance, if not outright job failure.
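
Each distinct string returned by generateFileNameForKeyValue gets its own
RecordWriter, and MultipleOutputFormat keeps all of them open until the task
finishes, so naming files after the raw key means one open HDFS stream per
distinct key. One way out is to cap the number of distinct file names. Here
is a minimal sketch of that idea (the bucket count and class name are mine,
not from your code):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputFormat;
import org.apache.hadoop.util.Progressable;

// Hypothetical variant that hashes keys into a fixed number of buckets,
// so at most NUM_BUCKETS record writers are open per reduce task.
public class BucketedTextOutputFormat<K, V> extends MultipleOutputFormat<K, V> {
        // Assumed bucket count; keep it well under the task's open-file limit.
        private static final int NUM_BUCKETS = 16;

        private TextOutputFormat<K, V> theTextOutputFormat = null;

        @Override
        protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs,
                        JobConf job, String name, Progressable progress)
                        throws IOException {
                if (theTextOutputFormat == null) {
                        theTextOutputFormat = new TextOutputFormat<K, V>();
                }
                return theTextOutputFormat.getRecordWriter(fs, job, name, progress);
        }

        @Override
        protected String generateFileNameForKeyValue(K key, V value, String name) {
                // Mask off the sign bit so the bucket index is never negative.
                int bucket = (key.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
                return name + "_bucket" + bucket;
        }
}

If you really do need one file per key, first count your distinct keys and
compare that against the open-file-descriptor limit on your task nodes.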

On Tue, Feb 24, 2009 at 6:58 PM, ma qiang <maqiang1...@gmail.com> wrote:

> Hi all,
>   I have a class that extends MultipleOutputFormat, as below:
>
>      public class MyMultipleTextOutputFormat<K, V> extends MultipleOutputFormat<K, V> {
>              private TextOutputFormat<K, V> theTextOutputFormat = null;
>
>              @Override
>              protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs,
>                              JobConf job, String name, Progressable arg3)
>                              throws IOException {
>                      if (theTextOutputFormat == null) {
>                              theTextOutputFormat = new TextOutputFormat<K, V>();
>                      }
>                      return theTextOutputFormat.getRecordWriter(fs, job, name, arg3);
>              }
>
>              @Override
>              protected String generateFileNameForKeyValue(K key, V value, String name) {
>                      return name + "_" + key.toString();
>              }
>      }
>
>
> I also call conf.setOutputFormat(MultipleTextOutputFormat2.class) in my job
> configuration, but when the program runs, the following errors are printed:
>
> 09/02/25 10:22:32 INFO mapred.JobClient: Task Id : attempt_200902250959_0002_r_000001_0, Status : FAILED
> java.io.IOException: Could not read from stream
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
>        at java.io.DataInputStream.readByte(DataInputStream.java:248)
>        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
>        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
>        at org.apache.hadoop.io.Text.readString(Text.java:400)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> 09/02/25 10:22:42 INFO mapred.JobClient:  map 100% reduce 69%
> 09/02/25 10:22:55 INFO mapred.JobClient:  map 100% reduce 0%
> 09/02/25 10:22:55 INFO mapred.JobClient: Task Id : attempt_200902250959_0002_r_000000_1, Status : FAILED
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/qiang/output/_temporary/_attempt_200902250959_0002_r_000000_1/part-00000_t0x5y3
> could only be replicated to 0 nodes, instead of 1
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
>        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
>        at org.apache.hadoop.ipc.Client.call(Client.java:696)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>        at $Proxy1.addBlock(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy1.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>
>
> Of course, the program runs successfully without MyMultipleOutputFormat.
> Can anyone help me solve this problem?
> Thanks.
>
> yours,    Qiang
>
