Re: Decommissioning Nodes
Hey Alyssa,

If one of those datanodes goes down, it takes a few minutes before the master notices. The master node treats any node that has not sent a heartbeat for quite a while as dead.

On Thu, Jan 22, 2009 at 8:34 AM, Hargraves, Alyssa aly...@wpi.edu wrote:

> Hello Hadoop Users,
>
> I was hoping someone would be able to answer a question about node decommissioning. I have a test Hadoop cluster set up which only consists of my computer and a master node. I am looking at the removal and addition of nodes. Adding a node is nearly instant (only about 5 seconds), but removing a node by decommissioning it takes a while, and I don't understand why. Currently, the systems are running no map/reduce tasks and storing no data. DFS Health reports:
>
> 7 files and directories, 0 blocks = 7 total. Heap Size is 6.68 MB / 992.31 MB (0%)
>
> Capacity      : 298.02 GB
> DFS Remaining : 245.79 GB
> DFS Used      : 4 KB
> DFS Used%     : 0 %
> Live Nodes    : 2
> Dead Nodes    : 0
>
> Node    Last Contact  Admin State               Size (GB)  Used (%)  Remaining (GB)  Blocks
> master  0             In Service                149.01     0         122.22          0
> slave   82            Decommission In Progress  149.01     0         123.58          0
>
> However, even with nothing stored and nothing running, the decommission process takes 3 to 5 minutes, and I'm not quite sure why. There isn't any data to move anywhere, and there aren't any jobs to worry about. I am using 0.18.2.
>
> Thank you for any help in solving this,
> Alyssa Hargraves

-- 
My research interests are distributed systems, parallel computing and bytecode-based virtual machines. http://coderplay.javaeye.com
How to coordinate nodes of different computing power in the same cluster?
Hi list, I've run into the following scenario: to finish the same task, one node in my Hadoop cluster needs only 5 seconds, while another needs more than 2 minutes. This is a common situation, and it reduces the parallelism of our system because the faster node has to wait for the slower one. How can I coordinate nodes of different computing power in the same cluster? Thanks, Jeremy
how can I decommission nodes on-the-fly?
Hi list, I added the property dfs.hosts.exclude to my conf/hadoop-site.xml, then refreshed the cluster with the command bin/hadoop dfsadmin -refreshNodes. This only shut down the DataNode process, not the TaskTracker process, on each slave listed in the excludes file. The JobTracker web page still shows these nodes as not shut down. How can I fully decommission these slave nodes on the fly? Can it be done purely by operations on the master node? Thanks, Jeremy
Re: Newbie: multiple output files
Hi Tim,

You can write a class that inherits from org.apache.hadoop.mapred.lib.MultipleOutputFormat and override the method generateFileNameForKeyValue(), like this:

    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        // Append the value to the default leaf name, so records are split across files by value.
        return name + "_" + value.toString();
    }

You can also check out http://coderplay.javaeye.com/blog/191188 for an example.

On Sun, Nov 23, 2008 at 9:12 PM, tim robertson [EMAIL PROTECTED] wrote:

> Hi,
>
> Can someone please point me at the best way to create multiple output files based on the Key outputted from the Map? So I end up with no reduction, but a file per Key outputted in the Mapping phase, ideally with the Key as the file name.
>
> Many thanks,
> Tim
Re: Newbie: error=24, Too many open files
Unix/Linux limits the number of files each process can open. The default on Linux is typically 1024. You can use ulimit -n <number> to change this limit and ulimit -n to show it. Regards, Jeremy
Re: A question about the combiner, reducer and the Output value class: can they be different?
Hey Saptarshi,

In fact there is an interesting wrapper that can help you output many different types of values: org.apache.hadoop.io.GenericWritable. You can write your own Writable class that inherits from it. Its documentation reads as follows:

A wrapper for Writable instances. When two sequence files, which have same Key type but different Value types, are mapped out to reduce, multiple Value types is not allowed. In this case, this class can help you wrap instances with different types.

Compared with ObjectWritable, this class is much more effective, because ObjectWritable will append the class declaration as a String into the output file in every Key-Value pair.

Generic Writable implements Configurable interface, so that it will be configured by the framework. The configuration is passed to the wrapped objects implementing Configurable interface *before deserialization*.

how to use it:
1. Write your own class, such as GenericObject, which extends GenericWritable.
2. Implements the abstract method getTypes(), defines the classes which will be wrapped in GenericObject in application. Attention: this classes defined in getTypes() method, must implement Writable interface.

The code looks like this:

    public class GenericObject extends GenericWritable {

        private static Class[] CLASSES = {
            ClassType1.class,
            ClassType2.class,
            ClassType3.class,
        };

        protected Class[] getTypes() {
            return CLASSES;
        }
    }

For example, in your case:

    public class YourWritable extends GenericWritable {

        private static Class<? extends Writable>[] CLASSES = null;

        static {
            CLASSES = (Class<? extends Writable>[]) new Class[] {
                org.apache.hadoop.io.IntWritable.class,
                org.apache.hadoop.io.BytesWritable.class
            };
        }

        public YourWritable() {
        }

        public YourWritable(Writable instance) {
            set(instance);
        }

        @Override
        protected Class<? extends Writable>[] getTypes() {
            return CLASSES;
        }
    }

Then modify your JobConf like this:

    theJob.setOutputKeyClass(IntWritable.class);
    theJob.setOutputValueClass(YourWritable.class);
    ...

After that, your combiner and reducer classes can be written as:

    public static class ClosestCenterCB extends MapReduceBase
            implements Reducer<IntWritable, Text, IntWritable, YourWritable> {

        public void reduce(IntWritable key, Iterator<Text> values,
                OutputCollector<IntWritable, YourWritable> output, Reporter reporter)
                throws IOException {
            BytesWritable outValue = ...;
            output.collect(outKey, new YourWritable(outValue)); // wrap it
        }
    }

    public static class YourReducer extends MapReduceBase
            implements Reducer<IntWritable, YourWritable, IntWritable, YourWritable> {

        public void reduce(IntWritable key, Iterator<YourWritable> values,
                OutputCollector<IntWritable, YourWritable> output, Reporter reporter)
                throws IOException {
            // retrieve the wrapped value like this
            BytesWritable realValue = (BytesWritable) values.next().get();
            // generate the output value, then wrap it
            // (Text would also have to be listed in CLASSES above for the wrapper to accept it)
            Text outValue = ...;
            output.collect(outKey, new YourWritable(outValue));
        }
    }

You can also check out http://coderplay.javaeye.com/blog/259880, which walks through a real example in full detail.
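To make the efficiency point in the quoted documentation concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the wrapper tags each record with a small index into the registered type table, whereas the ObjectWritable-style approach writes the class name as a string. The class name TypeTagSketch, the TYPES table and both encode methods are made up for illustration and are not Hadoop APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TypeTagSketch {
    // Hypothetical registry of allowed types, playing the role of getTypes().
    static final Class<?>[] TYPES = { Integer.class, String.class };

    // Index-tagged encoding: one byte for the position in TYPES, then the payload.
    public static byte[] encodeWithIndex(Object value) {
        int index = -1;
        for (int i = 0; i < TYPES.length; i++) {
            if (TYPES[i] == value.getClass()) {
                index = i;
                break;
            }
        }
        if (index < 0) {
            throw new IllegalArgumentException("type not registered: " + value.getClass());
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(index);
            writePayload(out, value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Name-tagged encoding: the full class name as a string, then the payload.
    public static byte[] encodeWithClassName(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(value.getClass().getName());
            writePayload(out, value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the value itself; only the two registered types are handled here.
    private static void writePayload(DataOutputStream out, Object value) throws IOException {
        if (value instanceof Integer) {
            out.writeInt((Integer) value);
        } else {
            out.writeUTF((String) value);
        }
    }
}
```

For a single Integer, the index-tagged record is a handful of bytes while the name-tagged one also carries the string java.lang.Integer, and that overhead repeats in every key-value pair.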
Re: Hadoop Beijing Meeting
Hi Mr. He Yongqiang, I would like to apply as a speaker, though this is rather last-minute. I have always been a fan of Hadoop. This is my technical blog: http://coderplay.javaeye.com/. Regards, Jeremy
Re: writable class to be used to read floating point values from input?
Hi pols,

> I have set the input format to be TextInputFormat.class (is this right?) Please tell me what signature am I supposed to use for the Map/reduce methods? As of now I am trying to write the Map function as
>
>     public static class Map extends MapReduceBase
>             implements Mapper<LongWritable, FloatWritable, Text, FloatWritable> {
>         public void map(LongWritable key, FloatWritable value,
>                 OutputCollector<Text, FloatWritable> output, Reporter reporter)
>                 throws IOException {
>             ...
>         }
>     }
>
> But I am getting an error saying text cannot be cast to longwritable.

Unfortunately no, that is not right. The default key-value types for TextInputFormat are LongWritable and Text. Its default reader is a class named LineRecordReader, which implements the method LineRecordReader.next(LongWritable key, Text value). If you want to change the key-value types received in the map function, you can define your own InputFormat and RecordReader.
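A lighter alternative to a custom InputFormat, sketched here as an assumption rather than as what the original poster ended up doing, is to keep TextInputFormat, declare the mapper with LongWritable and Text as its input types, and convert each line to a float inside map() (for instance from value.toString()). The parsing step on its own looks like this in plain Java. The class name FloatLineParser and its method are hypothetical.

```java
public class FloatLineParser {
    // Parses one input line into a Float.
    // Returns null for blank or malformed lines so the caller can skip them.
    public static Float parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        try {
            return Float.parseFloat(trimmed);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Inside a mapper, the returned value would then be wrapped in a FloatWritable before being passed to the output collector.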
Re: Failed to start datanodes
Hey, I've fixed it. :) The server had a firewall turned on. Regards, Jeremy
How can I get the record number of a SequenceFile?
Hi list, I've generated a sequence file with a reducer and will use it as the input to a second map step, which needs the number of records in that sequence file. How can I get that number quickly? Thanks a lot.
Can hadoop sort by values rather than keys?
Hi list, By default Hadoop sorts by keys. Can it sort by values instead? Regards, Jeremy
Can each mapper's key be a file name, and its value the content of the corresponding file, not just a line?
Hi list, I want to read a directory of text files using mappers. Can each mapper's key be a text file name and its value the content of the corresponding file, not just a line? It seems that MultiFileInputFormat may do this job. How can I use it? Thanks, Jeremy
Can reducer output multiple files?
Hi list, I want to write my reduced results into several files according to the type each result belongs to. How can I implement this? Thx, Jeremy
If I wanna read a config file before map task, which class I should choose?
Hi list, If I define a method named configure in a mapper class that tries to read a config file before all map tasks start, which class should I choose: a normal FileReader from the JDK, or another reader provided by Hadoop? Can anyone give me an example? Thx, Jeremy
Re: If I wanna read a config file before map task, which class I should choose?
The config file is a normal text file.
Re: If I wanna read a config file before map task, which class I should choose?
Thanks. The config file format looks like this:

    @tag_name0 name0 {value00, value01, value02}
    @tag_name1 name1 {value10, value11, value12}

and I am reading it from HDFS. How can I parse it?
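One possible way to parse lines in this format is a small regular-expression parser, sketched below in plain Java. It assumes each entry sits on a single line with a tag, a name and a brace-enclosed, comma-separated value list. The class name TagConfigParser and its Entry type are made up for illustration. Reading the lines from HDFS is left out. They could come from any reader, for example a BufferedReader wrapped around the stream returned by FileSystem.open(path).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagConfigParser {
    // One parsed entry of the form "@tag name {v1, v2, ...}".
    public static final class Entry {
        public final String tag;
        public final String name;
        public final List<String> values;

        Entry(String tag, String name, List<String> values) {
            this.tag = tag;
            this.name = name;
            this.values = values;
        }
    }

    // '@', the tag, whitespace, the name, then the value list inside braces.
    private static final Pattern LINE =
            Pattern.compile("@(\\S+)\\s+(\\S+)\\s*\\{([^}]*)\\}");

    // Parses a single line; returns null if the line does not match the format.
    public static Entry parseLine(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (String raw : m.group(3).split(",")) {
            String value = raw.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return new Entry(m.group(1), m.group(2), values);
    }
}
```

Calling parseLine on each line inside configure() and collecting the non-null entries into a map keyed by name would give the mapper quick lookups later on.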