Re: Multiple Output Format - Unrecognizable Characters in Output File
Hi James,

Not sure if you meant to write both key and value as text. This line writes the long key in binary (Writable) format, which is probably why you see unrecognizable characters in the output file:

    key.write(output);

Yaozhen

On Mon, Jul 18, 2011 at 2:00 PM, Teng, James xt...@ebay.com wrote:

Hi, I encountered a problem while trying to define my own MultipleOutputFormat class; here is the code below.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.UnsupportedEncodingException;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MultipleOutputFormat extends FileOutputFormat<LongWritable, Text> {

        public class LineWriter extends RecordWriter<LongWritable, Text> {
            private DataOutputStream output;
            private byte[] separatorBytes;

            public LineWriter(DataOutputStream output, String separator)
                    throws UnsupportedEncodingException {
                this.output = output;
                this.separatorBytes = separator.getBytes("UTF-8");
            }

            @Override
            public synchronized void close(TaskAttemptContext context)
                    throws IOException, InterruptedException {
                output.close();
            }

            @Override
            public void write(LongWritable key, Text value)
                    throws IOException, InterruptedException {
                System.out.println("key:" + key.get());
                System.out.println("value:" + value.toString());
                // Variants tried earlier:
                // output.writeLong(key.get());
                // output.write(separatorBytes);
                // output.write(value.toString().getBytes("UTF-8"));
                // output.write("\n".getBytes("UTF-8"));
                // key.write(output);
                key.write(output);   // writes the key as 8 raw binary bytes
                value.write(output); // writes the value in Text's binary format (vint length + bytes)
                output.write("\n".getBytes("UTF-8"));
            }
        }

        private Path path;

        protected String generateFileNameForKeyValue(LongWritable key, Text value, String name) {
            return key.toString() + Math.random();
        }

        @Override
        public RecordWriter<LongWritable, Text> getRecordWriter(TaskAttemptContext context)
                throws IOException, InterruptedException {
            path = getOutputPath(context);
            Path file = getDefaultWorkFile(context, "");
            FileSystem fs = file.getFileSystem(context.getConfiguration());
            FSDataOutputStream fileOut = fs.create(file, false);
            return new LineWriter(fileOut, "\t");
        }
    }

However, there is a problem with unrecognizable characters appearing in the output file. Has anyone encountered this problem before? Any comment is greatly appreciated; thanks in advance.

James, Teng (Teng Linxiao)
eRL, CDC, eBay, Shanghai
Extension: 86-21-28913530
MSN: tenglinx...@hotmail.com
Skype: James,Teng
Email: xt...@ebay.com
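If the goal is a plain-text output file, a minimal sketch of a text-based write() follows the commented-out lines in James's code above (the UTF-8 encoding and tab separator are assumptions carried over from the original):

    @Override
    public void write(LongWritable key, Text value)
            throws IOException, InterruptedException {
        // Serialize the key via toString() so it lands in the file as
        // readable digits, not the 8 raw bytes LongWritable.write() produces.
        output.write(key.toString().getBytes("UTF-8"));
        output.write(separatorBytes);
        output.write(value.toString().getBytes("UTF-8"));
        output.write("\n".getBytes("UTF-8"));
    }

With this variant, a record (42, "hello") comes out as the line "42<TAB>hello" instead of binary.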
Does hadoop local mode support running multiple jobs in different threads?
Hi, I am not sure if this question (as in the title) has been asked before, but I didn't find an answer by googling. I'd like to explain the scenario of my problem: my program launches several threads at the same time, and each thread submits a hadoop job and waits for the job to complete. The unit tests were run in local mode, mini-cluster and the real hadoop cluster. I found the unit tests may fail in local mode, but they always succeeded in mini-cluster and the real hadoop cluster. When a unit test failed in local mode, the causes varied (stack traces are posted at the end of this mail). It seems running multiple jobs from multiple threads is not supported in local mode, is it?

Error 1:

2011-07-01 20:24:36,460 WARN [Thread-38] mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001
java.io.FileNotFoundException: File build/test/tmp/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_00_0/output/spill0.out does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
        at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:253)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1447)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

Error 2:

2011-07-01 19:00:25,546 INFO [Thread-32] fs.FSInputChecker (FSInputChecker.java:readChecksumChunk(247)) - Found checksum error: b[3584, 4096]=696f6e69643c2f6e616d653e3c76616c75653e47302e4120636f696e636964656e63652047312e413c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6d61707265642e6a6f622e747261636b65722e706572736973742e6a6f627374617475732e6469723c2f6e616d653e3c76616c75653e2f6a6f62747261636b65722f6a6f6273496e666f3c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6d61707265642e6a61723c2f6e616d653e3c76616c75653e66696c653a2f686f6d652f70616e797a682f6861646f6f7063616c632f6275696c642f746573742f746d702f6d61707265642f73797374656d2f6a6f625f6c6f63616c5f303030332f6a6f622e6a61723c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e66732e73332e627565722e6469723c2f6e616d653e3c76616c75653e247b6861646f6f702e746d702e6469727d2f7c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6a6f622e656e642e72657472792e617474656d7074733c2f6e616d653e3c76616c75653e303c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e66732e66696c652e696d706c3c2f6e616d653e3c76616c75653e6f
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/hadoop-user/hadoop-proj/build/test/tmp/mapred/system/job_local_0003/job.xml at 3584
        at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
        at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:61)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1197)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:92)
        at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
        at hadoop.GroupingRunnable.run(GroupingRunnable.java:126)
        at java.lang.Thread.run(Thread.java:619)
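For reference, the pattern described above is roughly the following (a minimal sketch; the class name, input/output paths, and job setup are hypothetical, and mapper/reducer configuration is omitted):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ParallelJobLauncher {
        public static void main(String[] args) throws Exception {
            // Each thread builds, submits, and waits on its own job.
            Thread[] threads = new Thread[3];
            for (int i = 0; i < threads.length; i++) {
                final int id = i;
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Configuration conf = new Configuration();
                            Job job = new Job(conf, "job-" + id);
                            // setJarByClass, mapper/reducer classes, etc. omitted
                            FileInputFormat.addInputPath(job, new Path("in-" + id));
                            FileOutputFormat.setOutputPath(job, new Path("out-" + id));
                            job.waitForCompletion(true);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
        }
    }

In local mode all of these jobs run inside one JVM via LocalJobRunner, which is where the interference in the stack traces above shows up; on a mini-cluster or real cluster each job's tasks run in separate processes.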
Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
Hi, I am using Eclipse Helios Service Release 2. I encountered a similar problem (the map/reduce perspective failed to load) when upgrading the eclipse plugin from 0.20.2 to the 0.20.3-append version. I compared the source code of the eclipse plugin and found only a few differences. I tried to revert the differences one by one to see if it would work. What surprised me was that when I only reverted the jar name from hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar, it worked in eclipse. Yaozhen

On Thu, Jun 23, 2011 at 1:22 AM, praveenesh kumar praveen...@gmail.com wrote:

I am doing that.. it's not working.. If I replace the hadoop-core jar inside hadoop-plugin.jar, I am not able to see the map-reduce perspective at all. Guys.. any help.. !!! Thanks, Praveenesh

On Wed, Jun 22, 2011 at 12:34 PM, Devaraj K devara...@huawei.com wrote:

Every time hadoop builds, it also builds the hadoop eclipse plug-in using the latest hadoop core jar. In your case the eclipse plug-in contains one version of the jar and the cluster is running with another version. That's why it is giving the version mismatch error. Just replace the hadoop-core jar in your eclipse plug-in with whatever jar the hadoop cluster is using and check. Devaraj K

From: praveenesh kumar [mailto:praveen...@gmail.com]
Sent: Wednesday, June 22, 2011 12:07 PM
To: common-user@hadoop.apache.org; devara...@huawei.com
Subject: Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

I followed Michael Noll's tutorial for building the hadoop-0.20-append jars: http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/

After following the article, we get 5 jar files with which we need to replace the hadoop-0.20.2 jar files. There is no jar file for the hadoop-eclipse plugin that I can see in my repository if I follow that tutorial. Also, the hadoop-plugin I am using has no info on JIRA MAPREDUCE-1280 regarding whether it is compatible with hadoop-0.20-append. Has anyone else faced this kind of issue? Thanks, Praveenesh

On Wed, Jun 22, 2011 at 11:48 AM, Devaraj K devara...@huawei.com wrote:

The hadoop eclipse plugin also uses the hadoop-core.jar file to communicate with the hadoop cluster. For this it needs to have the same version of hadoop-core.jar on the client as on the server (hadoop cluster). Update the hadoop eclipse plugin for your eclipse with the one provided with the hadoop-0.20-append release; it will work fine. Devaraj K

-----Original Message-----
From: praveenesh kumar [mailto:praveen...@gmail.com]
Sent: Wednesday, June 22, 2011 11:25 AM
To: common-user@hadoop.apache.org
Subject: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files

Guys, I was using the hadoop eclipse plugin on a hadoop 0.20.2 cluster. It was working fine for me. I was using Eclipse SDK Helios 3.6.2 with the plugin hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar downloaded from JIRA MAPREDUCE-1280. Now for the HBase installation I had to use the hadoop-0.20-append compiled jars, and I had to replace the old jar files with the new 0.20-append compiled jar files. But after replacing them, my hadoop eclipse plugin is not working for me. Whenever I try to connect to my hadoop master node from it and view DFS locations, it gives me the following error:

Error: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch (client = 41, server = 43)

However, the hadoop cluster is working fine if I go directly to the hadoop namenode and use hadoop commands. I can add files to HDFS and run jobs from there; the HDFS web console and Map-Reduce web console are also working fine. But I am not able to use my previous hadoop eclipse plugin. Any suggestions or help for this issue? Thanks, Praveenesh
Re: Hadoop eclipse plugin stopped working after replacing hadoop-0.20.2 jar files with hadoop-0.20-append jar files
Hi, Our hadoop version was built on 0.20-append with a few patches. However, I didn't see big differences in the eclipse-plugin. Yaozhen

On Thu, Jun 23, 2011 at 11:29 AM, 叶达峰 (Jack Ye) kobe082...@qq.com wrote: Do you use hadoop 0.20.203.0? I also have a problem with this plugin.

Yaozhen Pan itzhak@gmail.com wrote: Hi, I am using Eclipse Helios Service Release 2. I encountered a similar problem (the map/reduce perspective failed to load) when upgrading the eclipse plugin from 0.20.2 to the 0.20.3-append version. I compared the source code of the eclipse plugin and found only a few differences. I tried to revert the differences one by one to see if it would work. What surprised me was that when I only reverted the jar name from hadoop-0.20.3-eclipse-plugin.jar to hadoop-0.20.2-eclipse-plugin.jar, it worked in eclipse. Yaozhen
Re: Make reducer task exit early
It can be achieved by overriding Reducer.run() in the new mapreduce API. But I don't know how to achieve it in the old API.

On Sat, Jun 4, 2011 at 8:14 AM, Aaron Baff aaron.b...@telescope.tv wrote:

Is there a way to make a Reduce task exit early, before it has finished reading all of its data? Basically I'm doing a group-by with a sum, and I only want to return, say, the top 1000 records. So I have a local class int variable to keep track of how many have currently been written to the output, and as soon as that count is exceeded, I simply return at the top of the reduce() function. Is there any way to optimize it even more, to tell the Reduce task: stop reading data, I don't need any more? --Aaron
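For the new API, the idea looks roughly like this (a minimal sketch; the class name, key/value types, sum logic, and the 1000-record limit are illustrative, mirroring Aaron's description):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TopNReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private static final int LIMIT = 1000;
        private int written = 0;

        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // Group-by with a sum, as in Aaron's job.
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
            written++;
        }

        @Override
        public void run(Context context) throws IOException, InterruptedException {
            setup(context);
            // The default run() loops until context.nextKey() is exhausted;
            // the extra condition stops pulling reduce input once LIMIT
            // groups have been emitted.
            while (written < LIMIT && context.nextKey()) {
                reduce(context.getCurrentKey(), context.getValues(), context);
            }
            cleanup(context);
        }
    }

This avoids returning early from reduce() for every remaining group: the loop simply stops asking the framework for the next key, so the rest of the reduce input is never deserialized.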