Re: Is Mapper's map method thread safe?
Each mapper instance is executed in a separate JVM, so by default the map method is not invoked concurrently within a task. On Thu, May 14, 2009 at 2:04 PM, imcaptor imcap...@gmail.com wrote: Dear all: Does anyone know whether Mapper's map method is thread safe? Thank you! imcaptor -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
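A small illustration of what that means for user code, as a hedged sketch using the old mapred API (class and field names here are made up): reusing mutable instance fields across map() calls is safe because the default MapRunner drives each task from a single thread inside the task JVM. If a multithreaded map runner were configured instead, this reuse would need synchronization.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical example: instance fields are reused across map() calls without
    // locking, because the framework calls map() sequentially within this task.
    public class TokenCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

      private final Text word = new Text();            // reused across calls
      private final IntWritable one = new IntWritable(1);

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        for (String token : value.toString().split("\\s+")) {
          word.set(token);
          output.collect(word, one);
        }
      }
    }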
Re: If I chain two MapReduce jobs, can I avoid saving the intermediate output?
Could you give a more detailed description? On Fri, Apr 17, 2009 at 2:21 PM, 王红宝 imcap...@gmail.com wrote: As the title says. Thank you! imcaptor -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
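The question seems to ask whether the output of the first job can be skipped when feeding a second job. With the classic API the first job's output has to be materialized somewhere the second job can read (typically HDFS), but it can be treated as scratch space and deleted once the second job finishes. A rough sketch under those assumptions; the driver, mapper, and reducer class names are hypothetical, and the intermediate path is a placeholder:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class TwoStageDriver {
      public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path intermediate = new Path("/tmp/stage1-out");   // scratch, deleted below
        Path output = new Path(args[1]);

        JobConf job1 = new JobConf(TwoStageDriver.class);
        job1.setJobName("stage1");
        job1.setMapperClass(FirstMapper.class);             // hypothetical classes
        job1.setReducerClass(FirstReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        JobClient.runJob(job1);                             // blocks until stage 1 is done

        JobConf job2 = new JobConf(TwoStageDriver.class);
        job2.setJobName("stage2");
        job2.setMapperClass(SecondMapper.class);
        job2.setReducerClass(SecondReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job2, intermediate);  // stage 1 output feeds stage 2
        FileOutputFormat.setOutputPath(job2, output);
        JobClient.runJob(job2);

        // The intermediate output is no longer needed once stage 2 has finished.
        FileSystem.get(job2).delete(intermediate, true);
      }
    }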
Re: custom writable class
Your custom implementation of any interface from hadoop-core should be packaged together with the application (i.e. in the same jar). The jar will be added to the CLASSPATH of the task runner, and then your custom Writable class can be found. On Thu, Sep 18, 2008 at 8:09 PM, Deepak Diwakar [EMAIL PROTECTED] wrote: Hi, I am new to Hadoop. For my map/reduce task I want to write my own custom Writable class. Could anyone please let me know where exactly to place the customwritable.java file? I found that in {hadoop-home}/hadoop-{version}/src/java/org/apache/hadoop/io/ all types of Writable class files are there. Then in the main task, we just include import org.apache.hadoop.io.{X}Writable; But this is not working for me. Basically, at compilation time the compiler doesn't find my custom Writable class, which I have placed in the mentioned folder. Please help me with this. Thanks deepak -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
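The custom class does not need to live under the Hadoop source tree at all; it can sit in your own package inside the job jar. A minimal sketch of such a class (the class name, package, and fields here are hypothetical, not from the thread):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Lives in your own jar, e.g. com.example.CustomWritable, packaged with the job.
    public class CustomWritable implements Writable {

      private long count;
      private String label;

      public CustomWritable() {}                  // no-arg constructor required for reflection

      public CustomWritable(long count, String label) {
        this.count = count;
        this.label = label;
      }

      public void write(DataOutput out) throws IOException {
        out.writeLong(count);
        out.writeUTF(label);
      }

      public void readFields(DataInput in) throws IOException {
        count = in.readLong();
        label = in.readUTF();
      }
    }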
Re: custom writable class
You can refer to the Hadoop Map-Reduce Tutorial On Thu, Sep 18, 2008 at 8:40 PM, Shengkai Zhu [EMAIL PROTECTED] wrote: Your custom implementation of any interface from hadoop-core should be archived together with the application (i.e. in the same jar). Andt he jar will be added to CLASSPATH of the task runner, then your customwritable.java could be found. On Thu, Sep 18, 2008 at 8:09 PM, Deepak Diwakar [EMAIL PROTECTED]wrote: Hi, I am new to hadoop. For my map/reduce task I want to write my on custom writable class. Could anyone please let me know where exactly to place the customwritable.java file? I found that in {hadoop-home} /hadoop-{version}/src/java/org/apache/hadoop/io/ all type of writable class files are there. Then in the main task, we just include import org.apache.hadoop.io.{X}Writable; But this is not working for me. Basically at the time of compilation compiler doesn't find my customwritable class which i have placed in the mentioned folder. plz help me in this endevor. Thanks deepak -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
Re: custom writable class
Here is the link http://hadoop.apache.org/core/docs/current/mapred_tutorial.html On Thu, Sep 18, 2008 at 9:16 PM, chanel [EMAIL PROTECTED] wrote: Where can you find the Hadoop Map-Reduce Tutorial? Shengkai Zhu wrote: You can refer to the Hadoop Map-Reduce Tutorial On Thu, Sep 18, 2008 at 8:40 PM, Shengkai Zhu [EMAIL PROTECTED] wrote: Your custom implementation of any interface from hadoop-core should be archived together with the application (i.e. in the same jar). Andt he jar will be added to CLASSPATH of the task runner, then your customwritable.java could be found. On Thu, Sep 18, 2008 at 8:09 PM, Deepak Diwakar [EMAIL PROTECTED] wrote: Hi, I am new to hadoop. For my map/reduce task I want to write my on custom writable class. Could anyone please let me know where exactly to place the customwritable.java file? I found that in {hadoop-home} /hadoop-{version}/src/java/org/apache/hadoop/io/ all type of writable class files are there. Then in the main task, we just include import org.apache.hadoop.io.{X}Writable; But this is not working for me. Basically at the time of compilation compiler doesn't find my customwritable class which i have placed in the mentioned folder. plz help me in this endevor. Thanks deepak -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University No virus found in this outgoing message. Checked by AVG - http://www.avg.com Version: 8.0.169 / Virus Database: 270.6.21/1678 - Release Date: 9/18/2008 9:01 AM -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
Re: hadoop hanging (probably misconfiguration) assistance
The logs will probably tell you what happened. On Thu, Sep 11, 2008 at 3:20 PM, [EMAIL PROTECTED] wrote: Hi All, I have been trying to move from a pseudo-distributed Hadoop cluster, which worked perfectly well, to a real Hadoop cluster. I was able to execute the wordcount example on my pseudo cluster, but my real cluster hangs at this point:

    # bin/hadoop jar hadoop*jar wordcount /myinput /myoutput
    08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
    08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
    08/09/10 17:10:31 INFO mapred.JobClient: Running job: job_200809101706_0001
    08/09/10 17:10:32 INFO mapred.JobClient: map 0% reduce 0%

The machines are doing nothing, i.e. all processes are at 0.0%. I have changed the configuration a couple of times to see where the issue lies. Currently I have 2 machines in the cluster: the namenode and the jobtracker on one machine, with the datanode on a separate machine. I have moved from named nodes to IP addresses with negligible improvement. The only errors in the logfiles are regarding flushing for log4j, so I did not consider that to be relevant. If anyone has seen this or has any ideas where I might find the source of my issues I would be grateful. Regards Damien

    # cat hadoop-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>mapred.task.timeout</name>
        <value>6000</value>
        <description>The number of milliseconds before a task will be terminated if
        it neither reads an input, writes an output, nor updates its status string.
        </description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://10.7.3.164:54130/</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>hadoop.logfile.size</name>
        <value>100</value>
      </property>
      <property>
        <name>hadoop.logfile.count</name>
        <value>2</value>
      </property>
      <property>
        <name>io.sort.mb</name>
        <value>25</value>
      </property>
      <property>
        <name>dfs.block.size</name>
        <value>8388608</value>
      </property>
      <property>
        <name>dfs.namenode.handler.count</name>
        <value>5</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>10.7.3.164:54131</value>
      </property>
      <property>
        <name>mapred.job.tracker.handler.count</name>
        <value>3</value>
      </property>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>2</value>
      </property>
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
      </property>
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx128m</value>
      </property>
      <property>
        <name>mapred.map.tasks.speculative.execution</name>
        <value>false</value>
      </property>
      <property>
        <name>mapred.reduce.tasks.speculative.execution</name>
        <value>false</value>
      </property>
      <property>
        <name>mapred.submit.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>tasktracker.http.threads</name>
        <value>4</value>
      </property>
    </configuration>

-- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
Re: Issue in reduce phase with SortedMapWritable and custom Writables as values
AFAIK, tasktracker will load your job archive automatically while running the map/reduce task. On Tue, Sep 9, 2008 at 10:28 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Based on some similar problems that I found others were having in the mailing lists, it looks like the solution was to list my Map/Reduce job JAR In the conf/hadoop-env.sh file under HADOOP_CLASSPATH. After doing that and re-submitting the job, it all worked fine! I guess the MapWritable class somehow doesn't share the same classpath as the program that actually submits the job conf. Is this expected? Thanks, Ryan On Tue, Sep 9, 2008 at 9:44 AM, Ryan LeCompte [EMAIL PROTECTED] wrote: Okay, I think I'm getting closer but now I'm running into another problem. First off, I created my own CustomMapWritable that extends MapWritable and invokes AbstractMapWritable.addToMap() to add my custom classes. Now the map/reduce phases actually complete and the job as a whole completes. However, when I try to use the SequenceFile API to later read the output data, I'm getting a strange exception. First the code: FileSystem fileSys = FileSystem.get(conf); SequenceFile.Reader reader = new SequenceFile.Reader(fileSys, inFile, conf); Text key = new Text(); CustomWritable stats = new CustomWritable(); reader.next(key, stats); reader.close(); And now the exception that's thrown: java.io.IOException: can't find class: com.test.CustomStatsWritable because com.test.CustomStatsWritable at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:210) at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:145) at com.test.CustomStatsWritable.readFields(UserStatsWritable.java:49) at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879) ... Any ideas? Thanks, Ryan On Tue, Sep 9, 2008 at 12:36 AM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello, I'm attempting to use a SortedMapWritable with a LongWritable as the key and a custom implementation of org.apache.hadoop.io.Writable as the value. I notice that my program works fine when I use another primitive wrapper (e.g. Text) as the value, but fails with the following exception when I use my custom Writable instance: 2008-09-08 23:25:02,072 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 1 segments... 2008-09-08 23:25:02,077 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments 2008-09-08 23:25:02,077 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 5492 bytes 2008-09-08 23:25:02,099 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200809082247_0005_r_00_0 Merge of the inmemory files threw a n exception: java.io.IOException: Intermedate merge failed at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2133) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2064) Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:80) at org.apache.hadoop.io.SortedMapWritable.readFields(SortedMapWritable.java:179) ... I noticed that the AbstractMapWritable class has a protected addToMap(Class clazz) method. Do I somehow need to let my SortedMapWritable instance know about my custom Writable value? I've properly implemented the custom Writable object (it just contains a few primitives, like longs and ints). Any insight is appreciated. 
Thanks, Ryan -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
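Ryan's approach of subclassing MapWritable and registering the value class via AbstractMapWritable.addToMap() might look roughly like the sketch below (the class names follow the ones mentioned in the thread, but the code itself is a reconstruction, not his actual source). Registering the class gives it a compact byte id for serialization, but the class still has to be visible on the classpath of whatever JVM deserializes it, which is why the standalone SequenceFile reader needed the job jar on HADOOP_CLASSPATH.

    import org.apache.hadoop.io.MapWritable;

    // Reconstruction of the CustomMapWritable described above.
    public class CustomMapWritable extends MapWritable {
      public CustomMapWritable() {
        super();
        // Register the custom value class so AbstractMapWritable can encode it
        // with a byte id instead of failing to resolve it at read time.
        addToMap(CustomStatsWritable.class);
      }
    }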
Re: number of tasks on a node.
You can monitor task and node status through the web pages provided by the JobTracker and TaskTrackers. On Wed, Sep 10, 2008 at 11:24 AM, Edward J. Yoon [EMAIL PROTECTED] wrote: TaskTrackers communicate with the JobTracker, so I guess you can handle it via the JobTracker. -Edward On Wed, Sep 10, 2008 at 12:03 PM, Dmitry Pushkarev [EMAIL PROTECTED] wrote: Hi. How can a node find out how many tasks are being run on it at a given time? I want tasktracker nodes (which are assigned from Amazon EC2) to shut down if nothing has been run for some period of time, but I don't yet see the right way of implementing this. -- Best regards, Edward J. Yoon [EMAIL PROTECTED] http://blog.udanax.org -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
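If polling from code rather than the web UI is preferred, the JobClient exposes a cluster-wide (not per-node) view of running tasks. A rough sketch using the old mapred API; the JobTracker address is a placeholder, and the shutdown policy itself is left to the caller:

    import org.apache.hadoop.mapred.ClusterStatus;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ClusterIdleCheck {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "jobtracker.example.com:54311"); // placeholder address
        JobClient client = new JobClient(conf);

        ClusterStatus status = client.getClusterStatus();
        int running = status.getMapTasks() + status.getReduceTasks();
        System.out.println("Task trackers: " + status.getTaskTrackers()
            + ", running tasks: " + running);
        // A shutdown script could poll this and stop idle EC2 task trackers
        // after the count has stayed at zero for some grace period.
      }
    }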
Re: Failing MR jobs!
Do you have some more detailed information? Logs are helpful. On Mon, Sep 8, 2008 at 3:26 AM, Erik Holstad [EMAIL PROTECTED] wrote: Hi! I'm trying to run a MR job, but it keeps on failing and I can't understand why. Sometimes it shows output at 66% and sometimes 98% or so. I had a couple of exception before that I didn't catch that made the job to fail. The log file from the task can be found at: http://pastebin.com/m4414d369 and the code looks like: //Java import java.io.*; import java.util.*; import java.net.*; //Hadoop import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.hbase.io.ImmutableBytesWritable; import org.apache.hadoop.io.*; import org.apache.hadoop.mapred.*; import org.apache.hadoop.util.*; //HBase import org.apache.hadoop.hbase.*; import org.apache.hadoop.hbase.mapred.*; import org.apache.hadoop.hbase.io.*; import org.apache.hadoop.hbase.client.*; // org.apache.hadoop.hbase.client.HTable //Extra import org.apache.commons.cli.ParseException; import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager; import org.apache.commons.httpclient.*; import org.apache.commons.httpclient.methods.*; import org.apache.commons.httpclient.params.HttpMethodParams; public class SerpentMR1 extends TableMap implements Mapper, Tool { //Setting DebugLevel private static final int DL = 0; //Setting up the variables for the MR job private static final String NAME = SerpentMR1; private static final String INPUTTABLE = sources; private final String[] COLS = {content:feedurl, content:ttl, content:updated}; private Configuration conf; public JobConf createSubmittableJob(String[] args) throws IOException{ JobConf c = new JobConf(getConf(), SerpentMR1.class); String jar = /home/hbase/SerpentMR/ +NAME+.jar; c.setJar(jar); c.setJobName(NAME); int mapTasks = 4; int reduceTasks = 20; c.setNumMapTasks(mapTasks); c.setNumReduceTasks(reduceTasks); String inputCols = ; for (int i=0; iCOLS.length; i++){inputCols += COLS[i] + ; } TableMap.initJob(INPUTTABLE, inputCols, this.getClass(), Text.class, BytesWritable.class, c); //Classes between: c.setOutputFormat(TextOutputFormat.class); Path path = new Path(users); //inserting into a temp table FileOutputFormat.setOutputPath(c, path); c.setReducerClass(MyReducer.class); return c; } public void map(ImmutableBytesWritable key, RowResult res, OutputCollector output, Reporter reporter) throws IOException { Cell cellLast= res.get(COLS[2].getBytes());//lastupdate long oldTime = cellLast.getTimestamp(); Cell cell_ttl= res.get(COLS[1].getBytes());//ttl long ttl = StreamyUtil.BytesToLong(cell_ttl.getValue() ); byte[] url = null; long currTime = time.GetTimeInMillis(); if(currTime - oldTime ttl){ url = res.get(COLS[0].getBytes()).getValue();//url output.collect(new Text(Base64.encode_strip(res.getRow())), new BytesWritable(url) );/ } } public static class MyReducer implements Reducer{ //org.apache.hadoop.mapred.Reducer{ private int timeout = 1000; //Sets the connection timeout time ms; public void reduce(Object key, Iterator values, OutputCollector output, Reporter rep) throws IOException { HttpClient client = new HttpClient();//new MultiThreadedHttpConnectionManager()); client.getHttpConnectionManager(). 
getParams().setConnectionTimeout(timeout); GetMethod method = null; int stat = 0; String content = ; byte[] colFam = select.getBytes(); byte[] column = lastupdate.getBytes(); byte[] currTime = null; HBaseRef hbref = new HBaseRef(); JerlType sendjerl = null; //new JerlType(); ArrayList jd = new ArrayList(); InputStream is = null; while(values.hasNext()){ BytesWritable bw = (BytesWritable)values.next(); String address = new String(bw.get()); try{ System.out.println(address); method = new GetMethod(address); method.setFollowRedirects(true); } catch (Exception e){ System.err.println(Invalid Address); e.printStackTrace(); } if (method != null){ try { // Execute the method. stat = client.executeMethod(method); if(stat == 200){ content = ; is = (InputStream)(method.getResponseBodyAsStream()); //Write to HBase new stamp
Re: Failing MR jobs!
Sorry, I didn't see the log link. On Tue, Sep 9, 2008 at 12:01 PM, Shengkai Zhu [EMAIL PROTECTED] wrote: Do you have some more detailed information? Logs are helpful. On Mon, Sep 8, 2008 at 3:26 AM, Erik Holstad [EMAIL PROTECTED]wrote: Hi! I'm trying to run a MR job, but it keeps on failing and I can't understand why. Sometimes it shows output at 66% and sometimes 98% or so. I had a couple of exception before that I didn't catch that made the job to fail. The log file from the task can be found at: http://pastebin.com/m4414d369 and the code looks like: //Java import java.io.*; import java.util.*; import java.net.*; //Hadoop import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.hbase.io.ImmutableBytesWritable; import org.apache.hadoop.io.*; import org.apache.hadoop.mapred.*; import org.apache.hadoop.util.*; //HBase import org.apache.hadoop.hbase.*; import org.apache.hadoop.hbase.mapred.*; import org.apache.hadoop.hbase.io.*; import org.apache.hadoop.hbase.client.*; // org.apache.hadoop.hbase.client.HTable //Extra import org.apache.commons.cli.ParseException; import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager; import org.apache.commons.httpclient.*; import org.apache.commons.httpclient.methods.*; import org.apache.commons.httpclient.params.HttpMethodParams; public class SerpentMR1 extends TableMap implements Mapper, Tool { //Setting DebugLevel private static final int DL = 0; //Setting up the variables for the MR job private static final String NAME = SerpentMR1; private static final String INPUTTABLE = sources; private final String[] COLS = {content:feedurl, content:ttl, content:updated}; private Configuration conf; public JobConf createSubmittableJob(String[] args) throws IOException{ JobConf c = new JobConf(getConf(), SerpentMR1.class); String jar = /home/hbase/SerpentMR/ +NAME+.jar; c.setJar(jar); c.setJobName(NAME); int mapTasks = 4; int reduceTasks = 20; c.setNumMapTasks(mapTasks); c.setNumReduceTasks(reduceTasks); String inputCols = ; for (int i=0; iCOLS.length; i++){inputCols += COLS[i] + ; } TableMap.initJob(INPUTTABLE, inputCols, this.getClass(), Text.class, BytesWritable.class, c); //Classes between: c.setOutputFormat(TextOutputFormat.class); Path path = new Path(users); //inserting into a temp table FileOutputFormat.setOutputPath(c, path); c.setReducerClass(MyReducer.class); return c; } public void map(ImmutableBytesWritable key, RowResult res, OutputCollector output, Reporter reporter) throws IOException { Cell cellLast= res.get(COLS[2].getBytes());//lastupdate long oldTime = cellLast.getTimestamp(); Cell cell_ttl= res.get(COLS[1].getBytes());//ttl long ttl = StreamyUtil.BytesToLong(cell_ttl.getValue() ); byte[] url = null; long currTime = time.GetTimeInMillis(); if(currTime - oldTime ttl){ url = res.get(COLS[0].getBytes()).getValue();//url output.collect(new Text(Base64.encode_strip(res.getRow())), new BytesWritable(url) );/ } } public static class MyReducer implements Reducer{ //org.apache.hadoop.mapred.Reducer{ private int timeout = 1000; //Sets the connection timeout time ms; public void reduce(Object key, Iterator values, OutputCollector output, Reporter rep) throws IOException { HttpClient client = new HttpClient();//new MultiThreadedHttpConnectionManager()); client.getHttpConnectionManager(). 
getParams().setConnectionTimeout(timeout); GetMethod method = null; int stat = 0; String content = ; byte[] colFam = select.getBytes(); byte[] column = lastupdate.getBytes(); byte[] currTime = null; HBaseRef hbref = new HBaseRef(); JerlType sendjerl = null; //new JerlType(); ArrayList jd = new ArrayList(); InputStream is = null; while(values.hasNext()){ BytesWritable bw = (BytesWritable)values.next(); String address = new String(bw.get()); try{ System.out.println(address); method = new GetMethod(address); method.setFollowRedirects(true); } catch (Exception e){ System.err.println(Invalid Address); e.printStackTrace(); } if (method != null){ try { // Execute the method. stat = client.executeMethod(method); if(stat == 200){ content
Re: How to control the map and reduce step sequentially
The real reduce logic actually starts only after all map tasks have finished; what you see earlier is the shuffle (copy) phase, in which reducers fetch completed map outputs. Is it still unexpected? On 7/28/08, 晋光峰 [EMAIL PROTECTED] wrote: Dear All, When using Hadoop, I noticed that the reducer step starts immediately, while the mappers are still running. According to my project requirement, the reducer step should not start until all the mappers finish their execution. Does anybody know how to use some Hadoop API to achieve this? Only when all the mappers finish their processing should the reducer be started. Thanks -- Guangfeng Jin -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
Re: Reg: Problem in Build Versions of Hadoop-0.17.0
Replace the hadoop-*-core.jar in datanodes with your jar compiled under jobs On 7/16/08, chaitanya krishna [EMAIL PROTECTED] wrote: Hi, I'm using hadoop-0.17.0 and recently, when i stopped and restarted dfs, the datanodes are being created and soon, they r not present. the logs of namenode shows the following error: / SHUTDOWN_MSG: Shutting down NameNode at 172.16.45.171/172.16.45.171 / 2008-07-16 10:13:34,885 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = 172.16.45.171/172.16.45.171 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.17.1-dev STARTUP_MSG: build = -r ; compiled by 'jobs' on Wed Jul 16 10:13:11 IST 2008 / 2008-07-16 10:13:34,973 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000 2008-07-16 10:13:35,011 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: 172.16.45.171/172.16.45.171:9000 2008-07-16 10:13:35,015 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null 2008-07-16 10:13:35,017 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext 2008-07-16 10:13:35,060 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=jobs,jobs 2008-07-16 10:13:35,060 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup 2008-07-16 10:13:35,060 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true 2008-07-16 10:14:44,869 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 69827 msecs 2008-07-16 10:14:44,870 INFO org.apache.hadoop.dfs.StateChange: STATE* Leaving safe mode after 69 secs. 2008-07-16 10:14:44,871 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes 2008-07-16 10:14:44,871 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks 2008-07-16 10:14:44,877 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean 2008-07-16 10:14:44,930 INFO org.mortbay.util.Credential: Checking Resource aliases 2008-07-16 10:14:44,993 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4 2008-07-16 10:14:44,993 INFO org.mortbay.util.Container: Started HttpContext[/static,/static] 2008-07-16 10:14:44,993 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs] 2008-07-16 10:14:45,190 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED] 2008-07-16 10:14:45,219 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/] 2008-07-16 10:14:45,221 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070 2008-07-16 10:14:45,221 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED] 2008-07-16 10:14:45,221 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070 2008-07-16 10:14:45,307 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.174:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,308 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.178:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,308 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.176:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,308 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.177:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,309 INFO org.apache.hadoop.dfs.NameNode: Error report from 
172.16.45.172:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,309 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.175:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,309 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.179:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,311 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.173:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:45,341 INFO org.apache.hadoop.dfs.NameNode: Error report from 172.16.45.180:50010: Incompatible build versions: namenode BV = ; datanode BV = 656523 2008-07-16 10:14:49,224 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 1 Total time for transactions(ms): 1 Number of syncs: 0 SyncTimes(ms): 0 2008-07-16 10:19:45,726 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit Log from 172.16.45.171 2008-07-16 10:19:45,726 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 3
Re: Why is the task run in a child JVM?
Well, I got it. On 7/14/08, Jason Venner [EMAIL PROTECTED] wrote: One benefit is that if your map or reduce behaves badly it can't take down the task tracker. In our case we have some poorly behaved external native libraries we use, and we have to forcibly ensure that the child vms are killed when the child main finishes (often by kill -9), so the fact the child (task) is a separate jvm process is very helpful. The downside is the jvm start time. Has anyone experimented with the jar freezing for more than the standard boot class path jars to speed up startup? Shengkai Zhu wrote: What's the benefits from such design compared to multi-thread? -- Jason Venner Attributor - Program the Web http://www.attributor.com/ Attributor is hiring Hadoop Wranglers and coding wizards, contact if interested -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
Re: java.io.IOException: All datanodes are bad. Aborting...
Did you do the clean work on all the datanodes? rm -Rf /path/to/my/hadoop/dfs/data On 6/20/08, novice user [EMAIL PROTECTED] wrote: Hi Mori Bellamy, I did this twice. and still the same problem is persisting. I don't know how to solve this issue. If any one know the answer, please let me know. Thanks Mori Bellamy wrote: That's bizarre. I'm not sure why your DFS would have magically gotten full. Whenever hadoop gives me trouble, i try the following sequence of commands stop-all.sh rm -Rf /path/to/my/hadoop/dfs/data hadoop namenode -format start-all.sh maybe you would get some luck if you ran that on all of the machines? (of course, don't run it if you don't want to lose all of that data) On Jun 19, 2008, at 4:32 AM, novice user wrote: Hi Every one, I am running a simple map-red application similar to k-means. But, when I ran it in on single machine, it went fine with out any issues. But, when I ran the same on a hadoop cluster of 9 machines. It fails saying java.io.IOException: All datanodes are bad. Aborting... Here is more explanation about the problem: I tried to upgrade my hadoop cluster to hadoop-17. During this process, I made a mistake of not installing hadoop on all machines. So, the upgrade failed. Nor I was able to roll back. So, I re-formatted the name node afresh. and then hadoop installation was successful. Later, when I ran my map-reduce job, it ran successfully,but the same job with zero reduce tasks is failing with the error as: java.io.IOException: All datanodes are bad. Aborting... When I looked into the data nodes, I figured out that file system is 100% full with different directories of name subdir in hadoop-username/dfs/data/current directory. I am wondering where I went wrong. Can some one please help me on this? The same job went fine on a single machine with same amount of input data. Thanks -- View this message in context: http://www.nabble.com/java.io.IOException%3A-All-datanodes-are-bad.-Aborting...-tp18006296p18006296.html Sent from the Hadoop core-user mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/java.io.IOException%3A-All-datanodes-are-bad.-Aborting...-tp18006296p18022330.html Sent from the Hadoop core-user mailing list archive at Nabble.com. -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
Re: dfs copyFromLocal/put fails
Firewall problem or you need entries into /etc/hosts On 6/18/08, Alexander Arimond [EMAIL PROTECTED] wrote: hi, i'm new in hadoop and im just testing it at the moment. i set up a cluster with 2 nodes and it seems like they are running normally, the log files of the namenode and the datanodes dont show errors. Firewall should be set right. but when i try to upload a file to the dfs i get following message: [EMAIL PROTECTED]:~/hadoop$ bin/hadoop dfs -put file.txt file.txt 08/06/12 14:44:19 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 08/06/12 14:44:19 INFO dfs.DFSClient: Abandoning block blk_5837981856060447217 08/06/12 14:44:28 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 08/06/12 14:44:28 INFO dfs.DFSClient: Abandoning block blk_2573458924311304120 08/06/12 14:44:37 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 08/06/12 14:44:37 INFO dfs.DFSClient: Abandoning block blk_1207459436305221119 08/06/12 14:44:46 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 08/06/12 14:44:46 INFO dfs.DFSClient: Abandoning block blk_-8263828216969765661 08/06/12 14:44:52 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. 08/06/12 14:44:52 WARN dfs.DFSClient: Error Recovery for block blk_-8263828216969765661 bad datanode[0] dont know what that means and didnt found something about that.. Hope somebody can help with that. Thank you! -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
Re: JobClient question
You should provide JobTracker address and port through configuration. On 7/11/08, Larry Compton [EMAIL PROTECTED] wrote: I'm coming up to speed on the Hadoop APIs. I need to be able to invoke a job from within a Java application (as opposed to running from the command-line hadoop executable). The JobConf and JobClient appear to support this and I've written a test program to configure and run a job. However, the job doesn't appear to be submitted to the JobTracker. Here's a code excerpt from my client... String rdfInputPath = args[0]; String outputPath = args[1]; String uriInputPath = args[2]; String jarPath = args[3]; JobConf conf = new JobConf(MaterializeMap.class); conf.setJobName(materialize); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(Text.class); conf.setMapperClass(MaterializeMapper.class); conf.setCombinerClass(MaterializeReducer.class); conf.setReducerClass(MaterializeReducer.class); conf.setJar(jarPath); DistributedCache.addCacheFile(new Path(uriInputPath).toUri(), conf); FileInputFormat.setInputPaths(conf, new Path(rdfInputPath)); FileOutputFormat.setOutputPath(conf, new Path(outputPath)); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); JobClient.runJob(conf); It seems like I should be providing a URL to the JobTracker somewhere, but I can't figure out where to provide the information. -- Larry Compton -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
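Concretely, that might mean adding something like the following to the code excerpt above, before the JobClient.runJob call. The host names and ports are placeholders; the real values come from the cluster's hadoop-site.xml.

    // Placeholder values; use the cluster's actual JobTracker and NameNode addresses.
    conf.set("mapred.job.tracker", "jobtracker.example.com:54311");
    conf.set("fs.default.name", "hdfs://namenode.example.com:54310");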
Re: JobClient question
Yes, you have already invoke he submitJob method through RPC. But you have no configuration to describe your hadoop system dir, which is default set /tmp/hadoop/mapred/system So in your client program, it saved job.xml under the default dir. But in JobTracker, your configuration make the system dir /home/larry/pkg/hadoop/hdfs/mapred/system/ So you can't find the job.xml under it. On 7/11/08, Larry Compton [EMAIL PROTECTED] wrote: Thanks. Is this the correct syntax? conf.set(mapred.job.tracker, localhost:54311); It does appear to be communicating with the JobTracker now, but I get the following stack trace. Is there anything else that needs to be done to configure the job? Exception in thread main org.apache.hadoop.ipc.RemoteException: java.io.IOException: /home/larry/pkg/hadoop/hdfs/mapred/system/job_20080714_0001/job.xml: No such file or directory at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149) at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155) at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136) at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:175) at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896) at org.apache.hadoop.ipc.Client.call(Client.java:557) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212) at $Proxy0.submitJob(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy0.submitJob(Unknown Source) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973) at jobclient.MaterializeMain.main(MaterializeMain.java:44) On Fri, Jul 11, 2008 at 11:41 AM, Shengkai Zhu [EMAIL PROTECTED] wrote: You should provide JobTracker address and port through configuration. On 7/11/08, Larry Compton [EMAIL PROTECTED] wrote: I'm coming up to speed on the Hadoop APIs. I need to be able to invoke a job from within a Java application (as opposed to running from the command-line hadoop executable). The JobConf and JobClient appear to support this and I've written a test program to configure and run a job. However, the job doesn't appear to be submitted to the JobTracker. Here's a code excerpt from my client... 
String rdfInputPath = args[0]; String outputPath = args[1]; String uriInputPath = args[2]; String jarPath = args[3]; JobConf conf = new JobConf(MaterializeMap.class); conf.setJobName(materialize); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(Text.class); conf.setMapperClass(MaterializeMapper.class); conf.setCombinerClass(MaterializeReducer.class); conf.setReducerClass(MaterializeReducer.class); conf.setJar(jarPath); DistributedCache.addCacheFile(new Path(uriInputPath).toUri(), conf); FileInputFormat.setInputPaths(conf, new Path(rdfInputPath)); FileOutputFormat.setOutputPath(conf, new Path(outputPath)); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); JobClient.runJob(conf); It seems like I should be providing a URL to the JobTracker somewhere, but I can't figure out where to provide the information. -- Larry Compton -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University -- Larry Compton -- 朱盛凯 Jash Zhu 复旦大学软件学院 Software School, Fudan University
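One way to keep the submitting client and the JobTracker in agreement about settings such as fs.default.name and mapred.system.dir is to load the cluster's own configuration file into the JobConf instead of setting individual keys by hand. A sketch under the assumption that the cluster's conf directory is available on the client machine at a hypothetical path:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitWithClusterConf {
      public static JobConf clusterConf(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        // Pull in the same settings the cluster daemons use, so values such as
        // fs.default.name, mapred.job.tracker and mapred.system.dir match the
        // JobTracker's view of the world.
        conf.addResource(new Path("/opt/hadoop/conf/hadoop-site.xml")); // hypothetical path
        return conf;
      }
    }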
Re: Cannot get passwordless ssh to work right
You should chmod ssh directory and authorized_keys of the * datanode/tasktracker* instead of jobtracker. On 7/11/08, Jim Lowell [EMAIL PROTECTED] wrote: I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've already gotten both nodes to run Hadoop as single-node following the excellent instructions at http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster) . Now I'm trying to convert them to a 2-node cluster and am failing because I cannot get passwordless ssh to work. Long story short, the slave can currently ssh to the master without being prompted for a password, but no matter what I do the master cannot ssh to the slave without being prompted for a password. Here's everything I've tried so far: 1. Uninstalled installed openssh 2. Diffed sshd_config on both nodes, they have identical settings 3. Regenerated the RSA keys repopulated them at least a dozen times 4. Compared ~/.ssh/authorized_keys on both systems and they match 5. Updated permissions with the following: . server$ chmod go-w ~/ . server$ chmod 700 ~/.ssh . server$ chmod 600 ~/.ssh/authorized_keys 6. Run ssh -vvv on both systems and diffed the output. The output from both matches to this line: debug2: we sent a publickey packet, wait for reply On the slave (the ssh that works without a password prompt), the next line is this: debug1: Server accepts key: pkalg ssh-rsa blen 277 On the master (the ssh that always prompts for a password), the next line is this: debug1: Authentications that can continue: publickey,password I'm not sure what this output means other than the server didn't accept the key for some reason. No matter what I do, I am always faced with a prompt for a password when trying to ssh from the master to the slave node. I'd like to compare a log file for the openssh server but I haven't figured out yet where that is located or even if it exists (I'm pretty new to Linux / Ubuntu). Any help would be appreciated. At this point, I'm probably going to nuke both boxes and reinstall Ubuntu from scratch and hope for the best. I'd like to avoid that and (more importantly) learn something from this experience. - Jim
Re: Version Mismatch when accessing hdfs through a nonhadoop java application?
I've checked the code in DataNode.java, exactly where you get the error:

    ...
    DataInputStream in = null;
    in = new DataInputStream(
        new BufferedInputStream(s.getInputStream(), BUFFER_SIZE));
    short version = in.readShort();
    if ( version != DATA_TRANFER_VERSION ) {
      throw new IOException( "Version Mismatch" );
    }
    ...

This may be useful for you. On 7/11/08, Thibaut_ [EMAIL PROTECTED] wrote: Hi, I'm trying to access the HDFS of my Hadoop cluster from a non-Hadoop application. Hadoop 0.17.1 is running on standard ports. This is the code I use:

    FileSystem fileSystem = null;
    String hdfsurl = "hdfs://localhost:50010";
    fileSystem = new DistributedFileSystem();
    try {
      fileSystem.initialize(new URI(hdfsurl), new Configuration());
    } catch (Exception e) {
      e.printStackTrace();
      System.out.println("init error:");
      System.exit(1);
    }

which fails with the exception: java.net.SocketTimeoutException: timed out waiting for rpc response at org.apache.hadoop.ipc.Client.call(Client.java:559) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212) at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313) at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102) at org.apache.hadoop.dfs.DFSClient.init(DFSClient.java:178) at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68) at com.iterend.spider.conf.Config.getRemoteFileSystem(Config.java:72) at tests.RemoteFileSystemTest.main(RemoteFileSystemTest.java:22) init error: The Hadoop logfile contains the following error: 2008-07-10 23:05:47,840 INFO org.apache.hadoop.dfs.Storage: Storage directory \hadoop\tmp\hadoop-sshd_server\dfs\data is not formatted. 2008-07-10 23:05:47,840 INFO org.apache.hadoop.dfs.Storage: Formatting ... 2008-07-10 23:05:47,928 INFO org.apache.hadoop.dfs.DataNode: Registered FSDatasetStatusMBean 2008-07-10 23:05:47,929 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010 2008-07-10 23:05:47,933 INFO org.apache.hadoop.dfs.DataNode: Balancing bandwith is 1048576 bytes/s 2008-07-10 23:05:48,128 INFO org.mortbay.util.Credential: Checking Resource aliases 2008-07-10 23:05:48,344 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4 2008-07-10 23:05:48,346 INFO org.mortbay.util.Container: Started HttpContext[/static,/static] 2008-07-10 23:05:48,346 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs] 2008-07-10 23:05:49,047 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED] 2008-07-10 23:05:49,244 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/] 2008-07-10 23:05:49,247 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50075 2008-07-10 23:05:49,247 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED] 2008-07-10 23:05:49,257 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null 2008-07-10 23:05:49,535 INFO org.apache.hadoop.dfs.DataNode: New storage id DS-2117780943-192.168.1.130-50010-1215723949510 is assigned to data-node 127.0.0.1:50010 2008-07-10 23:05:49,586 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010In DataNode.run, data = FSDataset{dirpath='c:\hadoop\tmp\hadoop-sshd_server\dfs\data\current'} 2008-07-10 23:05:49,586 INFO org.apache.hadoop.dfs.DataNode: using BLOCKREPORT_INTERVAL of 360msec Initial delay: 6msec 2008-07-10 23:06:04,636 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 0 blocks got processed in 11 msecs 2008-07-10 23:19:54,512 ERROR org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010:DataXceiver: 
java.io.IOException: Version Mismatch at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:961) at java.lang.Thread.run(Thread.java:619) Any ideas how I can fix this? The haddop cluster and my application are both using the same hadoop jar! Thanks for your help, Thibaut -- View this message in context: http://www.nabble.com/Version-Mismatch-when-accessing-hdfs-through-a-nonhadoop-java-application--tp18392343p18392343.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
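The "Version Mismatch" in DataXceiver is the datanode's data-transfer handshake rejecting bytes it does not understand. One common cause, offered here only as an assumption since the thread does not confirm it, is pointing the client URI at the datanode's data-transfer port (50010) instead of the namenode's RPC port. A sketch mirroring the code quoted above, assuming the namenode listens on port 9000:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.dfs.DistributedFileSystem;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsClientSketch {
      public static void main(String[] args) throws Exception {
        // Assumption: the namenode RPC address (fs.default.name) is localhost:9000;
        // port 50010 is the datanodes' data-transfer port and speaks a different protocol.
        FileSystem fs = new DistributedFileSystem();
        fs.initialize(new URI("hdfs://localhost:9000"), new Configuration());
        System.out.println("root exists: " + fs.exists(new Path("/")));
        fs.close();
      }
    }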
Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation
After your formatting the namenode second time, your datanodes and namenode may stay in inconsistency, namely, under imcompatible namespace. On 7/2/08, Xuan Dzung Doan [EMAIL PROTECTED] wrote: I was exactly following the Hadoop 0.16.4 quickstart guide to run a Pseudo-distributed operation on my Fedora 8 machine. The first time I did it, everything ran successfully (formated a new hdfs, started hadoop daemons, then ran the grep example). A moment later, I decided to redo everything again. Reformating the hdfs and starting the daemons seemed to have no problem; but from the homepage of the namenode's web interface ( http://localhost:50070/), when I clicked Browse the filesystem, it said the following: HTTP ERROR: 404 /browseDirectory.jsp RequestURI=/browseDirectory.jsp Then when I tried to copy files to the hdfs to re-run the grep example, I couldn't with the following long list of exceptions (looks like some replication or block allocation issue): # bin/hadoop dfs -put conf input 08/06/29 09:38:42 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127) at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901) at org.apache.hadoop.ipc.Client.call(Client.java:512) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:1487) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601) 08/06/29 09:38:42 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/root/input/hadoop-env.sh retries left 4 08/06/29 09:38:42 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127) at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901) at org.apache.hadoop.ipc.Client.call(Client.java:512) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967) at
Re: Re: modified word count example
It's an example M-R application in Phoenix, coded in C. I have no idea whether there is a popular Hadoop version of it, so I ported it into a Hadoop-style application. FYI, source attached. On 7/9/08, heyongqiang [EMAIL PROTECTED] wrote: Where can I find the Reverse-Index application? heyongqiang 2008-07-09 From: Shengkai Zhu Sent: 2008-07-09 09:06:38 To: core-user@hadoop.apache.org Cc: Subject: Re: modified word count example Another MapReduce application, Reverse-Index, behaves similarly to your description. You can refer to that. On 7/9/08, heyongqiang [EMAIL PROTECTED] wrote: InputFormat's method RecordReader<K, V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException; returns a RecordReader. You can implement your own InputFormat and RecordReader: 1) the RecordReader remembers the FileSplit (a subclass of InputSplit) in a field of its class; 2) the RecordReader's createValue() method always returns the FileSplit's file field. Hope this helps. heyongqiang 2008-07-09 From: Sandy Sent: 2008-07-09 01:45:15 To: core-user@hadoop.apache.org Cc: Subject: modified word count example Hi, Let's say I want to run a map reduce job on a series of text files (let's say x.txt, y.txt and z.txt). Given the following mapper function in Python (from WordCount.py):

    class WordCountMap(Mapper, MapReduceBase):
        one = IntWritable(1)
        # removed
        def map(self, key, value, output, reporter):
            for w in value.toString().split():
                output.collect(Text(w), self.one)  # how can I modify this line?

Instead of creating pairs for each word found and the numeral one as the example is doing, is there a function I can invoke to store the name of the file it came from instead? Thus, I'd have pairs like (water, x.txt), (hadoop, y.txt), (hadoop, z.txt), etc. I took a look at the javadoc, but I'm not sure if I've checked in the right places. Could someone point me in the right direction? Thanks! -SM

Attachment: rindex.tar.gz (GNU Zip compressed data)
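For the original question in the quoted thread (emitting word → source-file pairs), one common trick with the old mapred API is to read the input file name that the framework records in the job configuration under map.input.file for file-based input formats. A rough Java sketch of that idea, the class name being hypothetical; a Python mapper would follow the same shape:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Emits (word, source file name) instead of (word, 1).
    public class WordFileMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private final Text word = new Text();
      private final Text fileName = new Text();

      public void configure(JobConf job) {
        // For file-based splits the framework exposes the current file under this key.
        fileName.set(job.get("map.input.file", "unknown"));
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        for (String token : value.toString().split("\\s+")) {
          word.set(token);
          output.collect(word, fileName);
        }
      }
    }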