DataXceiver WRITE_BLOCK: Premature EOF from inputStream: Using Avro Multiple Outputs
Hello,

We are running a job that uses AvroMultipleOutputs (Avro 1.7.5). When there are lots of output files the job was failing, and I believed the failure was caused by the following error:

hc1hdfs2p.thecarlylegroup.local:50010:DataXceiverServer: java.io.IOException: Xceiver count 4097 exceeds the limit of concurrent xcievers: 4096
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:137)
        at java.lang.Thread.run(Thread.java:744)

This error starts to appear when we have lots of output directories due to our use of AvroMultipleOutputs (all maps complete without issue; it is the reducers, which do the multiple-output writes, that fail). I went ahead and increased dfs.datanode.max.xcievers to 8192 and reran the job. Using Cloudera Manager I saw that transceivers across nodes maxed out at 5376, so raising the limit to 8192 solved that first error. Unfortunately, the job still failed. When checking the HDFS logs, the xciever-limit error was gone, but I now saw lots of the following:

hc1hdfs3p.thecarlylegroup.local:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.14.5.83:53280 dest: /10.14.5.81:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:744)

These errors happen a lot and I'm not sure whether they are related to the job failure. They match exactly the error pattern seen in the thread below (which unfortunately had no responses):

http://mail-archives.apache.org/mod_mbox/hadoop-user/201408.mbox/%3CCAJOOh6E1D1bx_9NrAUPPzAb6x1=fxd52rgqwxfzwy5tpjiw...@mail.gmail.com%3E

The only other warnings I see occurring around the time of the job failure are:

WARN Failed to place enough replicas, still in need of 1 to reach 3. For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Exit code from container container_1417712817932_31879_01_002464 is : 143

Does anyone have any ideas about what could be causing the job to fail? I did not see anything obvious looking through the Cloudera Manager charts or logs. For example, open files were below the limit and memory was well within what the nodes have (6 nodes with 90 GB each). No errors in YARN either.

Thank you!

Best,
Ed
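For anyone hitting the same wall: the xciever limit is a per-datanode setting in hdfs-site.xml, and raising it requires a datanode restart. A minimal snippet with the value used above, for illustration; note that in Hadoop 2.x the preferred property name is dfs.datanode.max.transfer.threads, with dfs.datanode.max.xcievers kept as the older (deliberately misspelled) alias:

    <!-- hdfs-site.xml on each datanode; restart datanodes to apply.
         8192 matches the value used in this thread. -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>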
Re: Job fails while re-attempting the task in the multiple outputs case
I think if the task fails, the output related to that task will be cleaned up before the second attempt. I am guessing you are getting this exception because two reducers tried to write to the same file. One thing you need to be aware of is that all data that is supposed to end up in the same file must go to the same reducer. Say data1 is on reducer1 and data2 is on reducer2, and both are supposed to be stored in fileAll; then you will end up with that exception.

On Mon, Dec 30, 2013 at 11:22 AM, AnilKumar B wrote:
> Thanks Harsh.
>
> @Are you using the MultipleOutputs class shipped with Apache Hadoop or
> one of your own?
> I am using Apache Hadoop's MultipleOutputs.
>
> But as you can see in the stack trace, it's not appending the attempt id
> to the file name; it only contains the task id.
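To make the "same file, same reducer" constraint above concrete, a minimal sketch of a custom partitioner, assuming the target file name can be derived from the key (OutputFilePartitioner and deriveFileName are hypothetical names, not part of Hadoop):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    /**
     * Hypothetical sketch: route every record destined for the same output
     * file to the same reducer, so two reducers never race to create the
     * same path via MultipleOutputs.
     */
    public class OutputFilePartitioner extends Partitioner<Text, Text> {

      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        // Assumption: the target file name is derivable from the key;
        // replace this with your job's actual routing logic.
        String fileName = deriveFileName(key);
        return (fileName.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }

      private String deriveFileName(Text key) {
        // Illustrative: first field of a tab-separated key.
        return key.toString().split("\t", 2)[0];
      }
    }

It would be wired in from the driver with job.setPartitionerClass(OutputFilePartitioner.class).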
Re: Job fails while re-attempting the task in the multiple outputs case
Thanks Harsh.

@Are you using the MultipleOutputs class shipped with Apache Hadoop or one of your own?
I am using Apache Hadoop's MultipleOutputs.

But as you can see in the stack trace, it's not appending the attempt id to the file name; it only contains the task id.

Thanks & Regards,
B Anil Kumar.

On Mon, Dec 30, 2013 at 7:42 PM, Harsh J wrote:
> Are you using the MultipleOutputs class shipped with Apache Hadoop or
> one of your own?
>
> If it's the latter, please take a look at the gotchas described at
> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2Fwrite-to_hdfs_files_directly_from_map.2Freduce_tasks.3F
Re: Job fails while re-attempting the task in the multiple outputs case
Are you using the MultipleOutputs class shipped with Apache Hadoop or one of your own?

If it's the latter, please take a look at the gotchas described at
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2Fwrite-to_hdfs_files_directly_from_map.2Freduce_tasks.3F

On Mon, Dec 30, 2013 at 4:22 PM, AnilKumar B wrote:
> Hi,
>
> I am using multiple outputs in our job. So whenever any reduce task fails,
> all its subsequent task attempts fail with a file-exists exception.
>
> The output file name should also include the task attempt, right? But it's
> only appending the task id. Is this a bug, or something wrong on my side?
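The gist of the FAQ entry linked above: create task side-files under the per-attempt work directory rather than at a fixed final path, so a retried attempt never collides with a failed attempt's leftovers. A minimal sketch of the safe pattern using Hadoop's FileOutputFormat.getWorkOutputPath; the reducer class and file name are illustrative, not a definitive implementation:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SideFileReducer extends Reducer<Text, Text, Text, Text> {

      @Override
      protected void setup(Context context)
          throws IOException, InterruptedException {
        // Per-attempt work directory, something like
        // <output>/_temporary/.../attempt_xxx_r_000008_1.
        // Files created here are promoted into the final output directory
        // only when the attempt commits, so a retry starts clean.
        Path workDir = FileOutputFormat.getWorkOutputPath(context);
        FileSystem fs = workDir.getFileSystem(context.getConfiguration());
        FSDataOutputStream side =
            fs.create(new Path(workDir, "side-data.txt")); // illustrative name
        side.writeBytes("attempt-scoped side output\n");
        side.close();
      }
    }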
Job fails while re-attempting the task in the multiple outputs case
Hi,

I am using multiple outputs in our job. So whenever any reduce task fails, all its subsequent task attempts fail with a file-exists exception.

The output file name should also include the task attempt, right? But it's only appending the task id. Is this a bug, or something wrong on my side?

Where should I look in the source code? I went through FileOutputFormat$getTaskOutputPath(), but there it only considers the task id.

Exception trace:

13/12/29 09:13:00 INFO mapred.JobClient: Task Id : attempt_201312162255_60465_r_08_0, Status : FAILED
13/12/29 09:14:42 WARN mapred.JobClient: Error reading task output http://localhost:50050/tasklog?plaintext=true&attemptid=attempt_201312162255_60465_r_08_0&filter=stdout
13/12/29 09:14:42 WARN mapred.JobClient: Error reading task output http://localhost:50050/tasklog?plaintext=true&attemptid=attempt_201312162255_60465_r_08_0&filter=stderr
13/12/29 09:15:04 INFO mapred.JobClient: map 100% reduce 93%
13/12/29 09:15:23 INFO mapred.JobClient: map 100% reduce 96%
13/12/29 09:17:31 INFO mapred.JobClient: map 100% reduce 97%
13/12/29 09:19:34 INFO mapred.JobClient: Task Id : attempt_201312162255_60465_r_08_1, Status : FAILED
org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to create /x/y/z/2013/12/29/04/o_2013_12_29_03-r-8.gz on client 10.103.10.31 either because the filename is invalid or the file exists
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1672)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1599)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:732)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:711)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1448)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1442)

        at org.apache.hadoop.ipc.Client.call(Client.java:1118)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at $Proxy7.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at $Proxy7.create(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3753)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:937)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:207)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
        at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:131)
        at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:411)
        at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:370)
        at com.team.hadoop.mapreduce1$Reducer1.reduce(MapReduce1.java:254)
        at com.team.hadoop.mapreduce1$Reducer1.reduce(MapReduce1.java:144)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.sec
Real Multiple Outputs for Hadoop -- is this implementation correct?
Hey guys, I spent some time last week thinking about Hadoop before I wrote my own class, RealMultipleOutputs, that does something like what MultipleOutputs does, except that you can specify different HDFS paths for the different output streams. My pals were telling me to use Cascading or Pig if I want this functionality, but otherwise I was happy writing plain M/R jars.

I wrote up the implementation here:

https://github.com/paulhoule/infovore/wiki/Real-Multiple-Outputs-in-Hadoop

And it works hand in hand with an abstraction layer that supports unit testing with Mockito:

https://github.com/paulhoule/infovore/wiki/Unit-Testing-Hadoop-Mappers-and-Reducers

Anyway, I'd appreciate anybody looking at this code and trying to poke holes in it. It runs OK on my tiny dev cluster on 1.0.4 and 1.1.2 and on AMZN EMR, but I am wondering if I missed something.
Re: Real Multiple Outputs for Hadoop -- is this implementation correct?
I took a very brief look, and the approach of using multiple OCs (OutputCommitters), one per unique parent path from a task, seems the right thing to do. Nice work! Do consider contributing this if it's working well for you :)

On Sat, Sep 14, 2013 at 12:53 AM, Paul Houle wrote:
> Hey guys, I spent some time last week thinking about Hadoop before I wrote my
> own class, RealMultipleOutputs, that does something like what
> MultipleOutputs does, except that you can specify different HDFS paths for
> the different output streams.

--
Harsh J
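To give a flavor of the "one committer per unique parent path" idea without reading the wiki page, a minimal sketch follows. This is an assumption about the approach, not the actual RealMultipleOutputs code; PerPathCommitters and committerFor are invented names:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

    /**
     * Hypothetical sketch: keep one FileOutputCommitter per unique parent
     * path a task writes to, so each destination gets the usual
     * _temporary/attempt_xxx promotion when the attempt commits.
     */
    public class PerPathCommitters {
      private final Map<Path, FileOutputCommitter> committers =
          new HashMap<Path, FileOutputCommitter>();

      /** Lazily create a committer for the parent path of an output file. */
      public FileOutputCommitter committerFor(Path parent, TaskAttemptContext ctx)
          throws IOException {
        FileOutputCommitter c = committers.get(parent);
        if (c == null) {
          c = new FileOutputCommitter(parent, ctx);
          // Part of the committer lifecycle; FileOutputCommitter creates
          // its work directory lazily, so this is effectively a no-op.
          c.setupTask(ctx);
          committers.put(parent, c);
        }
        return c;
      }

      /** Commit every destination once the task attempt succeeds. */
      public void commitAll(TaskAttemptContext ctx) throws IOException {
        for (FileOutputCommitter c : committers.values()) {
          if (c.needsTaskCommit(ctx)) {
            c.commitTask(ctx);
          }
        }
      }
    }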
Re: Multiple outputs
MultipleOutputs is the way to go :)

On Tue, Mar 12, 2013 at 12:48 PM, Fatih Haltas wrote:
> Hi Everyone,
>
> I would like to have 2 different outputs (containing different columns of
> the same input text file). When I googled a bit, I found the
> MultipleOutputs classes; is this the common way of doing it, or is there
> a way to have two different Context objects as reducer output?

--
Harsh J
Multiple outputs
Hi Everyone,

I would like to have 2 different outputs (containing different columns of the same input text file). When I googled a bit, I found the MultipleOutputs classes. Is this the common way of doing it, or is there a way to create something context-like: is there a context array, or is it possible to have two different Context objects as reducer output by changing the "public void reduce(Text key, Iterable<Text> values, Context context)" signature to take one more Context (Context context1, Context context2)?

Any help will be appreciated. Thank you very much.

Below is my reducer function; how should I modify it?

    static class MyReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Iterator<Text> iter = values.iterator();
            while (iter.hasNext()) {
                Text externalip_starttime_endtime = iter.next();
                // Copy the value before writing, since Hadoop reuses the
                // Text instance returned by the iterator.
                Text outValue = new Text(externalip_starttime_endtime);
                context.write(key, outValue);
            }
        }
    }
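For reference, a minimal sketch of the reducer above rewritten with MultipleOutputs, assuming two named outputs, "colsA" and "colsB", and a tab-separated value whose columns are split between them (the names and the column split are illustrative, not a definitive implementation):

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class MyReducer extends Reducer<Text, Text, Text, Text> {

      private MultipleOutputs<Text, Text> mos;

      @Override
      protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
      }

      @Override
      public void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        for (Text value : values) {
          // Illustrative split: route different columns to different
          // named outputs instead of a second Context.
          String[] cols = value.toString().split("\t");
          mos.write("colsA", key, new Text(cols[0]));
          mos.write("colsB", key, new Text(cols.length > 1 ? cols[1] : ""));
        }
      }

      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        mos.close(); // flush both output streams
      }
    }

In the driver, each named output is registered up front, e.g. MultipleOutputs.addNamedOutput(job, "colsA", TextOutputFormat.class, Text.class, Text.class), and likewise for "colsB".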