Re: Implementing VectorWritable
Can you please tell me what the functionality of those 2 methods is, and how I should implement them in this VectorWritable? Thanks

On Tue, Dec 29, 2009 at 11:25 AM, Jeff Zhang zjf...@gmail.com wrote: The readFields and write methods are empty? When data is transferred from the map phase to the reduce phase, it is serialized and deserialized, so write and readFields will be called. You should not leave them empty. Jeff Zhang

On Tue, Dec 29, 2009 at 1:29 PM, bharath v bharathvissapragada1...@gmail.com wrote: Hi, I've implemented a simple VectorWritable class as follows:

package com;

import org.apache.hadoop.*;
import org.apache.hadoop.io.*;
import java.io.*;
import java.util.Vector;

public class VectorWritable implements WritableComparable {

  private Vector<String> value = new Vector<String>();

  public VectorWritable() {}

  public VectorWritable(Vector<String> value) {
    set(value);
  }

  public void set(Vector<String> val) {
    this.value = val;
  }

  public Vector<String> get() {
    return this.value;
  }

  public void readFields(DataInput in) throws IOException {
    // value = in.readInt();
  }

  public void write(DataOutput out) throws IOException {
    // out.writeInt(value);
  }

  public boolean equals(Object o) {
    if (!(o instanceof VectorWritable))
      return false;
    VectorWritable other = (VectorWritable) o;
    return this.value.equals(other.value);
  }

  public int hashCode() {
    return value.hashCode();
  }

  public int compareTo(Object o) {
    Vector thisValue = this.value;
    Vector thatValue = ((VectorWritable) o).value;
    return (thisValue.size() < thatValue.size() ? -1
        : (thisValue.size() == thatValue.size() ? 0 : 1));
  }

  public String toString() {
    return value.toString();
  }

  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(VectorWritable.class);
    }

    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
      int thisValue = readInt(b1, s1);
      int thatValue = readInt(b2, s2);
      return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
  }

  static { // register this comparator
    WritableComparator.define(VectorWritable.class, new Comparator());
  }
}

The map phase is outputting correct Text, VectorWritable pairs, but in the reduce phase, when I iterate over the values Iterable, I am getting the size of the vector to be 0. I think there is a minor mistake in my VectorWritable implementation. Can anyone point it out? Thanks
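For what it's worth, here is a minimal sketch of how the two methods could look for a Vector<String>, writing the element count followed by each element; Text.writeString/Text.readString is just one possible encoding, and these bodies are meant to drop into the class above:

public void write(DataOutput out) throws IOException {
  // Write the element count first so readFields knows how many to expect.
  out.writeInt(value.size());
  for (String s : value) {
    Text.writeString(out, s);
  }
}

public void readFields(DataInput in) throws IOException {
  // Read back exactly what write() emitted, in the same order.
  int size = in.readInt();
  value = new Vector<String>(size);
  for (int i = 0; i < size; i++) {
    value.add(Text.readString(in));
  }
}

With empty implementations, nothing is written during the shuffle, so the reducer deserializes an empty vector, which matches the size-0 symptom described above.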
multiple jobs on the cluster?
Hi, what happens when I submit a few jobs on the cluster? To me, it seems like they all are running - which I know can't be, because I only have 2 slaves. Where do I read about this? I am using Cloudera with EC2. Thank you, Mark
Re: Implementing VectorWritable
Have a look at org.apache.hadoop.io.ArrayWritable. You may be able to use this class in your application, or at least use it as a basis for writing VectorWritable. Cheers, Tom
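As a sketch of that suggestion (the class name and usage below are illustrative, not from the thread): ArrayWritable needs to know its element class during deserialization, so the usual pattern is a small subclass with a no-argument constructor:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TextArrayWritable extends ArrayWritable {
  public TextArrayWritable() {
    super(Text.class); // lets Hadoop instantiate elements when deserializing
  }
}

// Usage:
TextArrayWritable w = new TextArrayWritable();
w.set(new Text[] { new Text("a"), new Text("b") });
Writable[] contents = w.get();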
Re: multiple jobs on the cluster?
Hi Mark, When you submit multiple jobs to the same cluster, they are queued up at the jobtracker and executed in FIFO order. Based on my understanding of the Hadoop FIFO scheduler, the order in which jobs get executed is determined by two things: (1) the priority of the job (all jobs have NORMAL priority by default), and (2) the start time of the job. So when all jobs have the same priority, they are executed in the order in which they arrive at the cluster. When you submit a job, some initial processing is done before it gets executed, at the end of which a message "Running job: <JOBID>" is printed. At this point, the job has been queued up at the jobtracker awaiting execution. Hadoop also comes with other schedulers, for example the Fair Scheduler (http://hadoop.apache.org/common/docs/current/fair_scheduler.html). Hope this helps, Abhishek
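To make the two-key ordering concrete, here is an illustrative comparator; the Job interface and accessor names below are assumptions for the sketch, not the scheduler's actual code:

import java.util.Comparator;
import org.apache.hadoop.mapred.JobPriority;

// "Job" stands in for the scheduler's internal job handle; the accessors
// are hypothetical names for this illustration.
interface Job {
  JobPriority getPriority();
  long getStartTime();
}

class FifoOrder implements Comparator<Job> {
  public int compare(Job a, Job b) {
    // Higher priority first: JobPriority declares VERY_HIGH before VERY_LOW,
    // so ascending enum order is descending priority.
    int byPriority = a.getPriority().compareTo(b.getPriority());
    if (byPriority != 0) return byPriority;
    // Same priority: the earlier-submitted job runs first.
    return a.getStartTime() < b.getStartTime() ? -1
        : (a.getStartTime() == b.getStartTime() ? 0 : 1);
  }
}

A job's priority can also be changed after submission with hadoop job -set-priority <jobID> <priority>.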
Re: multiple jobs on the cluster?
Thank you. This explains why they appear to be running - they are queued. Mark
wiki spam
Hi, Not sure if this is the right mailing list for reporting it, but while browsing RecentChanges in the hadoop wiki, I noticed some recent attachment uploads with names having to do with free ringtones and celebrities in various states of undress. Maybe a captcha is needed on the account creation? http://wiki.apache.org/hadoop/RecentChanges JVS
Re: Defining the number of map tasks
In the hadoop-site.xml or hadoop-default.xml file you can find a parameter: mapred.map.tasks. Change its value to 3. At the same time, set mapred.tasktracker.map.tasks.maximum to 3 if you use only one tasktracker. On Wed, Dec 16, 2009 at 3:26 PM, psdc1978 psdc1...@gmail.com wrote: Hi, I would like to have several map tasks that execute the same work. For example, I have 3 map tasks (M1, M2 and M3) and 1 GB of input data to be read by each map. Each map should read the same input data and send the result to the same reduce. At the end, the reduce should produce the same 3 results. If I put 3 instances of the same machine (localhost localhost localhost) in the conf/slaves file, does that solve the problem? How do I define the number of map tasks to run? Best regards, -- xeon Chen
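As a concrete example, the two parameters above would be set like this in hadoop-site.xml (a sketch for the 3-map scenario; note that mapred.map.tasks is only a hint to the framework, since the actual number of maps is driven by the input splits):

<property>
  <name>mapred.map.tasks</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>

The same hint can also be set from code with JobConf.setNumMapTasks(3).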
December Seattle Hadoop/HBase/Etc. Meetup
Greetings, Due to the holiday season, the Hadoop/HBase/Etc. Meetup is not going to happen. If anyone wants to get together for casual coffee or drinks, though, let me know! We'll be back on schedule in January. Cheers, Bradford -- http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform. http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Killing a Hadoop job
Hi, I was running a job (Cloudera distro on EC2) and I killed it with a Ctrl-C on the master. Does it really kill it? If not, is there a way to really cancel the job? Thank you, Mark
Re: Killing a Hadoop job
Invoke the command: hadoop job -kill <jobID> Jeff Zhang
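If you don't have the job ID handy, hadoop job -list prints the IDs of the jobs currently queued or running, and they are also shown in the jobtracker web UI.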
Re: Killing a Hadoop job
Thanks!