Re: Implementing VectorWritable

2009-12-29 Thread bharath v
Can you please tell me what the functionality of those two methods is?
(How should I implement them in this VectorWritable?)

Thanks

On Tue, Dec 29, 2009 at 11:25 AM, Jeff Zhang zjf...@gmail.com wrote:

 The readFields and write methods are empty?

 When data is transferred from the map phase to the reduce phase, it is serialized
 and deserialized, so the write and readFields methods will be called. You should
 not leave them empty.


 Jeff Zhang
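
 A minimal sketch of one way those two methods could be filled in for a
 Vector<String>-backed Writable, assuming the vector is serialized as its
 size followed by each element (this scheme is illustrative, not code from
 the thread):

    public void write(DataOutput out) throws IOException {
      out.writeInt(value.size());                  // number of elements
      for (String s : value) {
        Text.writeString(out, s);                  // each element as a string
      }
    }

    public void readFields(DataInput in) throws IOException {
      int size = in.readInt();                     // number of elements written above
      value = new Vector<String>(size);
      for (int i = 0; i < size; i++) {
        value.add(Text.readString(in));            // read elements back in the same order
      }
    }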


 On Tue, Dec 29, 2009 at 1:29 PM, bharath v 
 bharathvissapragada1...@gmail.com wrote:

  Hi ,
 
  I've implemented a simple VectorWritable class as follows
 
 
  package com;
 
  import org.apache.hadoop.*;
  import org.apache.hadoop.io.*;
  import java.io.*;
  import java.util.Vector;
 
 
   public class VectorWritable implements WritableComparable {
    private Vector<String> value = new Vector<String>();
 
    public VectorWritable() {}
 
    public VectorWritable(Vector<String> value) { set(value); }
 
    public void set(Vector<String> val) { this.value = val; }
 
    public Vector<String> get() { return this.value; }
 
   public void readFields(DataInput in) throws IOException {
 //value = in.readInt();
   }
 
   public void write(DataOutput out) throws IOException {
   //  out.writeInt(value);
   }
 
   public boolean equals(Object o) {
 if (!(o instanceof VectorWritable))
   return false;
 VectorWritable other = (VectorWritable)o;
 return this.value.equals(other.value);
   }
 
   public int hashCode() {
 return value.hashCode();
   }
 
    public int compareTo(Object o) {
      Vector thisValue = this.value;
      Vector thatValue = ((VectorWritable)o).value;
      return (thisValue.size() < thatValue.size() ? -1 :
              (thisValue.size() == thatValue.size() ? 0 : 1));
    }
 
   public String toString() {
 return value.toString();
   }
 
   public static class Comparator extends WritableComparator {
 public Comparator() {
   super(VectorWritable.class);
 }
 
 public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
 
   int thisValue = readInt(b1, s1);
   int thatValue = readInt(b2, s2);
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
 }
   }
 
    static {                                        // register this comparator
      WritableComparator.define(VectorWritable.class, new Comparator());
    }
  }
 
   The map phase is outputting correct (Text, VectorWritable) pairs, but in the
   reduce phase, when I iterate over the values Iterable, I am getting the size
   of the vector as 0. I think there is a minor mistake in my VectorWritable
   implementation. Can anyone point it out?
 
  Thanks
 



multiple jobs on the cluster?

2009-12-29 Thread Mark Kerzner
Hi,

what happens when I submit a few jobs on the cluster? To me, it seems like
they all are running - which I know can't be, because I only have 2 slaves.
Where do I read about this?

I am using Cloudera with EC2.

Thank you,
Mark


Re: Implementing VectorWritable

2009-12-29 Thread Tom White
Have a look at org.apache.hadoop.io.ArrayWritable. You may be able to
use this class in your application, or at least use it as a basis for
writing VectorWritable.

Cheers,
Tom
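
For instance, a Text-backed ArrayWritable could look roughly like this (a
sketch; the element type is assumed to be Text, and the no-arg constructor
is needed so Hadoop can instantiate the class by reflection):

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;

    public class TextArrayWritable extends ArrayWritable {
      public TextArrayWritable() {
        super(Text.class);                 // required no-arg constructor for deserialization
      }
      public TextArrayWritable(Text[] values) {
        super(Text.class, values);         // wrap an existing array of Text values
      }
    }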

On Tue, Dec 29, 2009 at 1:37 AM, bharath v
bharathvissapragada1...@gmail.com wrote:
 Can you please tell me what the functionality of those two methods is?
 (How should I implement them in this VectorWritable?)

 Thanks

 On Tue, Dec 29, 2009 at 11:25 AM, Jeff Zhang zjf...@gmail.com wrote:

 The readFields and write methods are empty?

 When data is transferred from the map phase to the reduce phase, it is serialized
 and deserialized, so the write and readFields methods will be called. You should
 not leave them empty.


 Jeff Zhang


 On Tue, Dec 29, 2009 at 1:29 PM, bharath v 
 bharathvissapragada1...@gmail.com wrote:

  Hi ,
 
  I've implemented a simple VectorWritable class as follows
 
 
  package com;
 
  import org.apache.hadoop.*;
  import org.apache.hadoop.io.*;
  import java.io.*;
  import java.util.Vector;
 
 
   public class VectorWritable implements WritableComparable {
    private Vector<String> value = new Vector<String>();
  
    public VectorWritable() {}
  
    public VectorWritable(Vector<String> value) { set(value); }
  
    public void set(Vector<String> val) { this.value = val; }
  
    public Vector<String> get() { return this.value; }
 
   public void readFields(DataInput in) throws IOException {
     //value = in.readInt();
   }
 
   public void write(DataOutput out) throws IOException {
   //  out.writeInt(value);
   }
 
   public boolean equals(Object o) {
     if (!(o instanceof VectorWritable))
       return false;
     VectorWritable other = (VectorWritable)o;
     return this.value.equals(other.value);
   }
 
   public int hashCode() {
     return value.hashCode();
   }
 
    public int compareTo(Object o) {
      Vector thisValue = this.value;
      Vector thatValue = ((VectorWritable)o).value;
      return (thisValue.size() < thatValue.size() ? -1 :
              (thisValue.size() == thatValue.size() ? 0 : 1));
    }
 
   public String toString() {
     return value.toString();
   }
 
   public static class Comparator extends WritableComparator {
     public Comparator() {
       super(VectorWritable.class);
     }
 
     public int compare(byte[] b1, int s1, int l1,
                        byte[] b2, int s2, int l2) {
 
       int thisValue = readInt(b1, s1);
       int thatValue = readInt(b2, s2);
      return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
     }
   }
 
    static {                                        // register this comparator
      WritableComparator.define(VectorWritable.class, new Comparator());
    }
  }
 
   The map phase is outputting correct (Text, VectorWritable) pairs, but in the
   reduce phase, when I iterate over the values Iterable, I am getting the size
   of the vector as 0. I think there is a minor mistake in my VectorWritable
   implementation. Can anyone point it out?
 
  Thanks
 




Re: multiple jobs on the cluster?

2009-12-29 Thread abhishek sharma
Hi Mark,

When you submit multiple jobs to the same cluster, these jobs are
queued up at the jobtracker, and executed in FIFO order.

Based on my understanding of the Hadoop FIFO scheduler, the order in
which jobs get executed is determined by two things: (1) the priority of
the job (all jobs have NORMAL priority by default), and (2) the start
time of the job. So in a scenario where all jobs have the same
priority, they will be executed in the order in which they arrive at
the cluster.
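
For what it's worth, a job's priority can be changed from the default; a
minimal sketch with the old mapred API (the JobConf here is assumed to be
the configuration used to submit the job):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobPriority;

    public class PriorityExample {
      public static void raisePriority(JobConf conf) {
        // HIGH-priority jobs are picked by the FIFO scheduler ahead of NORMAL ones.
        conf.setJobPriority(JobPriority.HIGH);
      }
    }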

If you submit multiple jobs, there is some initial processing that is
done before the job gets executed, at the end of which a message like
"Running job: <JOBID>" is printed. At this point, the job has been queued
up at the jobtracker awaiting execution.

Hadoop also comes with other types of scheduler, for example, the Fair
Scheduler (http://hadoop.apache.org/common/docs/current/fair_scheduler.html).

Hope this helps,
Abhishek

On Tue, Dec 29, 2009 at 12:16 PM, Mark Kerzner markkerz...@gmail.com wrote:
 Hi,

 what happens when I submit a few jobs on the cluster? To me, it seems like
 they all are running - which I know can't be, because I only have 2 slaves.
 Where do I read about this?

 I am using Cloudera with EC2.

 Thank you,
 Mark




Re: multiple jobs on the cluster?

2009-12-29 Thread Mark Kerzner
Thank you. This explains why they appear to be running - they are queued.

Mark

On Tue, Dec 29, 2009 at 11:30 AM, abhishek sharma absha...@gmail.com wrote:

 Hi Mark,

 When you submit multiple jobs to the same cluster, these jobs are
 queued up at the jobtracker, and executed in FIFO order.

 Based on my understanding of the Hadoop FIFO scheduler, the order in
 which jobs get executed is determined by two things: (1) the priority of
 the job (all jobs have NORMAL priority by default), and (2) the start
 time of the job. So in a scenario where all jobs have the same
 priority, they will be executed in the order in which they arrive at
 the cluster.

 If you submit multiple jobs, there is some initial processing that is
 done before the job gets executed, at the end of which a message like
 "Running job: <JOBID>" is printed. At this point, the job has been queued
 up at the jobtracker awaiting execution.

 Hadoop also comes with other types of scheduler, for example, the Fair
 Scheduler (
 http://hadoop.apache.org/common/docs/current/fair_scheduler.html).

 Hope this helps,
 Abhishek

 On Tue, Dec 29, 2009 at 12:16 PM, Mark Kerzner markkerz...@gmail.com
 wrote:
  Hi,
 
  what happens when I submit a few jobs on the cluster? To me, it seems
 like
  they all are running - which I know can't be, because I only have 2
 slaves.
  Where do I read about this?
 
  I am using Cloudera with EC2.
 
  Thank you,
  Mark
 



wiki spam

2009-12-29 Thread John Sichi
Hi,

Not sure if this is the right mailing list for reporting it, but while browsing 
RecentChanges in the hadoop wiki, I noticed some recent attachment uploads with 
names having to do with free ringtones and celebrities in various states of 
undress.  Maybe a captcha is needed on the account creation?

http://wiki.apache.org/hadoop/RecentChanges

JVS



Re: Defining the number of map tasks

2009-12-29 Thread He Chen
In the hadoop-site.xml or hadoop-default.xml file you can find a parameter:
mapred.map.tasks. Change its value to 3. At the same time, set
mapred.tasktracker.map.tasks.maximum to 3 if you use only one tasktracker.
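
A minimal sketch of setting these from the job side with the old mapred API
(the tasktracker maximum normally lives in hadoop-site.xml, as noted above):

    import org.apache.hadoop.mapred.JobConf;

    public class MapTaskCountExample {
      public static void configure(JobConf conf) {
        conf.setNumMapTasks(3);                                  // hint for mapred.map.tasks; the InputFormat decides the final split count
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 3);  // per-tasktracker slot cap, usually set cluster-wide
      }
    }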

On Wed, Dec 16, 2009 at 3:26 PM, psdc1978 psdc1...@gmail.com wrote:

 Hi,

 I would like to have several map tasks that execute the same work.
 For example, I have 3 map tasks (M1, M2 and M3) and 1 GB of input data
 to be read by each map. Each map should read the same input data and
 send the result to the same reduce. At the end, the reduce should
 produce the same 3 results.

 If I put 3 instances of the same machine in the conf/slaves file:

 <file>
 localhost
 localhost
 localhost
 </file>

 does that solve the problem?


 How do I define the number of map tasks to run?



 Best regards,
 --
 xeon


Chen


December Seattle Hadoop/HBase/Etc. Meetup

2009-12-29 Thread Bradford Stephens
Greetings,

Due to the holiday season, the Hadoop/HBase/Etc. Meetup is not going to
happen. If anyone wants to get together for casual coffee or drinks, though,
let me know! We'll be back on schedule in January.

Cheers,
Bradford

-- 
http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media,
and Computer Science


Killing a Hadoop job

2009-12-29 Thread Mark Kerzner
Hi,

I was running a job (Cloudera distro on EC2) and I killed it with a Ctrl-C
on the master. Does it really kill it? If not, is there a way to really
cancel the job?

Thank you,
Mark


Re: Killing a Hadoop job

2009-12-29 Thread Jeff Zhang
Invoke the command: hadoop job -kill <jobID>


Jeff Zhang


On Tue, Dec 29, 2009 at 10:02 PM, Mark Kerzner markkerz...@gmail.com wrote:

 Hi,

 I was running a job (Cloudera distro on EC2) and I killed it with a Ctrl-C
 on the master. Does it really kill it? If not, is there a way to really
 cancel the job?

 Thank you,
 Mark



Re: Killing a Hadoop job

2009-12-29 Thread Mark Kerzner
Thanks!

On Wed, Dec 30, 2009 at 12:07 AM, Jeff Zhang zjf...@gmail.com wrote:

 Invoke the command: hadoop job -kill <jobID>


 Jeff Zhang


 On Tue, Dec 29, 2009 at 10:02 PM, Mark Kerzner markkerz...@gmail.com
 wrote:

  Hi,
 
  I was running a job (Cloudera distro on EC2) and I killed it with a
 Ctrl-C
  on the master. Does it really kill it? If not, is there a way to really
  cancel the job?
 
  Thank you,
  Mark