Re: Decommissioning Nodes

2009-01-21 Thread Jeremy Chow
Hey Alyssa,
If one of those datanodes goes down, a few minutes will pass before the master
discovers it. The master node treats nodes that have not sent a heartbeat for
quite a while as dead, which is why the decommission takes a few minutes even
on an idle cluster.

On Thu, Jan 22, 2009 at 8:34 AM, Hargraves, Alyssa aly...@wpi.edu wrote:

 Hello Hadoop Users,

 I was hoping someone would be able to answer a question about node
 decommissioning.  I have a test Hadoop cluster set up which only consists of
 my computer and a master node.  I am looking at the removal and addition of
 nodes.  Adding a node is nearly instant (only about 5 seconds), but removing
 a node by decommissioning it takes a while, and I don't understand why.
 Currently, the systems are running no map/reduce tasks and storing no data.
 DFS Health reports:

 7 files and directories, 0 blocks = 7 total. Heap Size is 6.68 MB / 992.31
 MB (0%)
 Capacity:   298.02 GB
 DFS Remaining   :   245.79 GB
 DFS Used:   4 KB
 DFS Used%   :   0 %
 Live Nodes  :   2
 Dead Nodes  :   0

 Node    Last Contact  Admin State               Size (GB)  Used (%)  Remaining (GB)  Blocks
 master  0             In Service                149.01     0         122.22          0
 slave   82            Decommission In Progress  149.01     0         123.58          0

 However, even with nothing stored and nothing running, the decommission
 process takes 3 to 5 minutes, and I'm not quite sure why. There isn't any
 data to move anywhere, and there aren't any jobs to worry about.  I am using
 0.18.2.

 Thank you for any help in solving this,
 Alyssa Hargraves




-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


How to coordinate nodes of different computing power in the same cluster?

2008-12-23 Thread Jeremy Chow
Hi list,
I've come up against a scenario like this: to finish the same task, one node
of my hadoop cluster needs only 5 seconds, while another needs more than 2
minutes.
It's a common phenomenon that decreases the parallelism of the system, because
the faster node has to wait for the slower one. How can I coordinate nodes of
different computing power in the same cluster?
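
One mechanism worth trying here (a sketch, not from this thread) is Hadoop's
speculative execution, which launches backup attempts of slow tasks on other
nodes so a single straggler does not stall the whole job. Assuming the old
JobConf API, with MyJob a hypothetical job class:

    JobConf conf = new JobConf(MyJob.class);
    // run backup copies of slow tasks on idle (faster) nodes;
    // whichever attempt finishes first wins, the other is killed
    conf.setSpeculativeExecution(true);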

Thanks,
Jeremy
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


how can I decommission nodes on-the-fly?

2008-11-25 Thread Jeremy Chow
Hi list,

 I added the property dfs.hosts.exclude to my conf/hadoop-site.xml, then
refreshed my cluster with the command
 bin/hadoop dfsadmin -refreshNodes
This shut down the DataNode process, but not the TaskTracker process, on each
slave specified in the excludes file. The jobtracker web UI still shows that I
have not shut down these nodes.
How can I totally decommission these slave nodes on-the-fly? Can it be
achieved by operating on the master node only?
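
For reference, the exclude setup described above looks like this in
conf/hadoop-site.xml (the file path is just an example; the excludes file
lists one hostname per line):

    <property>
      <name>dfs.hosts.exclude</name>
      <value>/path/to/conf/excludes</value>
    </property>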

Thanks,
Jeremy
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: Newbie: multiple output files

2008-11-23 Thread Jeremy Chow
Hi Tim,

You can write a class that inherits from
org.apache.hadoop.mapred.lib.MultipleOutputFormat and override the method
generateFileNameForKeyValue(), like this:

   @Override
   protected String generateFileNameForKeyValue(K key, V value, String name) {
       return name + "_" + value.toString();
   }


You can also check out http://coderplay.javaeye.com/blog/191188 for an example.
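
To get exactly what Tim asks for below, a file per key with the key as the
file name, a minimal sketch could look like this (KeyAsFileNameOutputFormat is
a made-up class name; it assumes Text keys and values):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    public class KeyAsFileNameOutputFormat
            extends MultipleTextOutputFormat<Text, Text> {
        @Override
        protected String generateFileNameForKeyValue(Text key, Text value,
                String name) {
            // name each output file after the record's key
            return key.toString();
        }
    }

Then register it on the job with
conf.setOutputFormat(KeyAsFileNameOutputFormat.class).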

On Sun, Nov 23, 2008 at 9:12 PM, tim robertson [EMAIL PROTECTED] wrote:

 Hi,

 Can someone please point me at the best way to create multiple output
 files based on the Key outputted from the Map?  So I end up with no
 reduction, but a file per Key outputted in the Mapping phase, ideally
 with the Key as the file name.

 Many thanks,

 Tim



-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: Newbie: error=24, Too many open files

2008-11-23 Thread Jeremy Chow
There is a limit on the number of files each process can open on Unix/Linux.
The default on Linux is 1024. You can use

ulimit -n number

to change this limit for the current shell session, and

ulimit -n

to show it.

Regards,
Jeremy
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: A question about the combiner, reducer and the Output value class: can they be different?

2008-11-20 Thread Jeremy Chow
Hey Saptarshi,

In fact there is an interesting wrapper that can help you output many
different types of values: org.apache.hadoop.io.GenericWritable.
You can write your own Writable class that inherits from it. The following is
from its documentation:

A wrapper for Writable instances.

When two sequence files, which have the same Key type but different Value
types, are mapped out to reduce, multiple Value types are not allowed. In this
case, this class can help you wrap instances of different types.

Compared with ObjectWritable, this class is much more efficient, because
ObjectWritable appends the class declaration as a String to the output file in
every Key-Value pair.

GenericWritable implements the Configurable interface, so it will be
configured by the framework. The configuration is passed to the wrapped
objects implementing the Configurable interface *before deserialization*.

How to use it:
1. Write your own class, such as GenericObject, which extends
GenericWritable.
2. Implement the abstract method getTypes(), which defines the classes that
will be wrapped in GenericObject in the application. Attention: the classes
defined in getTypes() must implement the Writable interface.

The code looks like this:

 public class GenericObject extends GenericWritable {

   private static Class[] CLASSES = {
     ClassType1.class,
     ClassType2.class,
     ClassType3.class,
   };

   protected Class[] getTypes() {
     return CLASSES;
   }
 }

For example,  in your case,

public class YourWritable extends GenericWritable {
  private static Class<? extends Writable>[] CLASSES = null;

  static {
    CLASSES = (Class<? extends Writable>[]) new Class[] {
        org.apache.hadoop.io.IntWritable.class,
        org.apache.hadoop.io.BytesWritable.class };
  }

  public YourWritable() {
  }

  public YourWritable(Writable instance) {
    set(instance);
  }

  @Override
  protected Class<? extends Writable>[] getTypes() {
    return CLASSES;
  }
}

Then modify your JobConf like this,

  theJob.setOutputKeyClass(IntWritable.class);
  theJob.setOutputValueClass(YourWritable.class);
  ...

After that, your combiner and reducer classes can be written as

   public static class ClosestCenterCB extends MapReduceBase implements
       Reducer<IntWritable, Text, IntWritable, YourWritable> {
     public void reduce(IntWritable key, Iterator<Text> values,
         OutputCollector<IntWritable, YourWritable> output, Reporter reporter)
         throws IOException {
       BytesWritable outValue = ...; // build the combined value here
       output.collect(key, new YourWritable(outValue)); // wrap it
     }
   }

   public static class YourReducer extends MapReduceBase implements
       Reducer<IntWritable, YourWritable, IntWritable, YourWritable> {
     public void reduce(IntWritable key, Iterator<YourWritable> values,
         OutputCollector<IntWritable, YourWritable> output, Reporter reporter)
         throws IOException {
       // retrieve the wrapped value like this
       BytesWritable realValue = (BytesWritable) values.next().get();
       // generate the output value, then wrap it
       Text outValue = ...;
       output.collect(key, new YourWritable(outValue));
     }
   }


You can also check out http://coderplay.javaeye.com/blog/259880; that link
walks through a complete example.


-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: Hadoop Beijing Meeting

2008-11-12 Thread Jeremy Chow
Hi Mr. He Yongqiang,
  I'd like to apply as a speaker, though this is on very short notice. I have
always been a fan of hadoop. This is my technical blog:
http://coderplay.javaeye.com/.

Regards,
Jeremy
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: writable class to be used to read floating point values from input?

2008-10-25 Thread Jeremy Chow
Hi pols,



 I have set the input format to be TextInputFormat.class (is this right?)


 Please tell me what signature I am supposed to use for the Map/Reduce
 methods? As of now I am trying to write the Map function as

 public static class Map extends MapReduceBase implements
     Mapper<LongWritable, FloatWritable, Text, FloatWritable> {
   public void map(LongWritable key, FloatWritable value,
       OutputCollector<Text, FloatWritable> output, Reporter reporter)
       throws IOException {
     ...
   }
 }

 But I am getting an error saying Text cannot be cast to LongWritable.


Unfortunately no, that signature is wrong. The default key-value type for
TextInputFormat is <LongWritable, Text>. A type named LineRecordReader, which
implements the method LineRecordReader.next(LongWritable key, Text value), is
the default reader of TextInputFormat. If you want to change the key-value
types received by the map function, you can define your own InputFormat and
RecordReader.
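
With TextInputFormat kept as-is, the map signature has to take <LongWritable,
Text> and parse the float out of the line itself. A minimal sketch, assuming
each input line holds a single float (the output key "value" is just a
placeholder):

    public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, FloatWritable> {
      public void map(LongWritable key, Text value,
          OutputCollector<Text, FloatWritable> output, Reporter reporter)
          throws IOException {
        // key is the byte offset of the line; value is the line text
        float f = Float.parseFloat(value.toString().trim());
        output.collect(new Text("value"), new FloatWritable(f));
      }
    }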

-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: Failed to start datanodes

2008-09-26 Thread Jeremy Chow
Hey,

I've fixed it. :)  The server had a firewall turned on.


Regards,
Jeremy


How can I get the record number of a SequenceFile?

2008-09-24 Thread Jeremy Chow
Hi list,

I've generated a sequence file with a reducer, and I will use it to start the
second map step, which needs the record count of that sequence file. How can I
get it quickly?
Thanks a lot.
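
For reference, a SequenceFile does not store a record count in its header, so
the straightforward approach is a single scan. A sketch, assuming fs, path,
and conf are already set up:

    // counts records by reading the file once; O(n), not constant time
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    Writable key =
        (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value =
        (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
    long count = 0;
    while (reader.next(key, value)) {
      count++; // one record per successful next()
    }
    reader.close();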

-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Can hadoop sort by values rather than keys?

2008-09-24 Thread Jeremy Chow
Hi list,
  The default way hadoop does its sorting is by keys; can it sort by values
rather than keys?
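
A common workaround (a sketch, not a built-in switch): fold the value into a
composite key, sort on that, and group reducer input by the original key. Both
comparator classes below are hypothetical and would need to be written:

    JobConf conf = new JobConf(MyJob.class); // hypothetical job class
    // order map output by the value part of the composite key
    conf.setOutputKeyComparatorClass(ValueOrderComparator.class);
    // group records at the reducer by the original key only
    conf.setOutputValueGroupingComparator(OriginalKeyComparator.class);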

Regards,
Jeremy
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


can each mapper's key be a file name, and its value the content of the corresponding file, not just a line?

2008-05-18 Thread Jeremy Chow
Hi list,
I want to read a directory of text files using mappers. Can each mapper's key
be a text file name, and its value the content of the corresponding file, not
just a line?
It seems that MultiFileInputFormat may do this job; how can I use it? A sketch
of an alternative approach follows below.
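
For reference, one way to get whole-file records is an unsplittable
FileInputFormat with a custom RecordReader, rather than MultiFileInputFormat.
A sketch against the old mapred API; both class names are made up:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // one record per file: key = file name, value = file contents
    public class WholeFileInputFormat
        extends FileInputFormat<Text, BytesWritable> {
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // never split, so one mapper sees the entire file
      }
      public RecordReader<Text, BytesWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
      }
    }

    class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
      private final FileSplit split;
      private final JobConf conf;
      private boolean processed = false;

      WholeFileRecordReader(FileSplit split, JobConf conf) {
        this.split = split;
        this.conf = conf;
      }

      public boolean next(Text key, BytesWritable value) throws IOException {
        if (processed) return false;
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        byte[] contents = new byte[(int) split.getLength()];
        FSDataInputStream in = fs.open(file);
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        key.set(file.toString());                // key: the file name
        value.set(contents, 0, contents.length); // value: the whole file
        processed = true;
        return true;
      }

      public Text createKey() { return new Text(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return processed ? split.getLength() : 0; }
      public float getProgress() { return processed ? 1.0f : 0.0f; }
      public void close() { }
    }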

Thanks,
Jeremy
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Can reducer output multiple files?

2008-05-08 Thread Jeremy Chow
Hi list,
I want to output my reduced results into several files according to the types
the results belong to. How can I implement this?

Thx,
Jeremy

-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


If I wanna read a config file before map tasks, which class should I choose?

2008-04-03 Thread Jeremy Chow
Hi list,

If I define a method named configure in a mapper class that tries to read a
config file before any map tasks start, which class should I choose?
A normal FileReader from the JDK, or another Reader provided by hadoop? Can
anyone give me an example? See the sketch below.
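
A sketch of the configure() hook reading the file from HDFS (the property name
my.config.path is made up; for a file on the local disk a plain JDK FileReader
would work too):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public static class MyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // called once per task, before any calls to map()
      public void configure(JobConf job) {
        try {
          Path configPath = new Path(job.get("my.config.path")); // hypothetical property
          FileSystem fs = configPath.getFileSystem(job);
          BufferedReader reader = new BufferedReader(
              new InputStreamReader(fs.open(configPath)));
          String line;
          while ((line = reader.readLine()) != null) {
            // parse and cache each config line here
          }
          reader.close();
        } catch (IOException e) {
          throw new RuntimeException("could not read config file", e);
        }
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // use the cached config here
      }
    }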

Thx,
Jeremy
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: If I wanna read a config file before map tasks, which class should I choose?

2008-04-03 Thread Jeremy Chow
The config file is a normal text file.
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: If I wanna read a config file before map tasks, which class should I choose?

2008-04-03 Thread Jeremy Chow
Thanks. The config file format looks like this:

@tag_name0 name0 {value00, value01, value02}
@tag_name1 name1 {value10, value11, value12}

I am reading it from HDFS. How can I parse it?
and reading it from HDFS. Then how can I parse them ?