Re: Which release to use?

2011-07-19 Thread Rita
Arun,

I second Joe's comment.
Thanks for giving us a heads up.
I will wait patiently until 0.23 is considered stable.


On Mon, Jul 18, 2011 at 11:19 PM, Joe Stein
charmal...@allthingshadoop.com wrote:

 Arun,

 Thanks for the update.

 Again, I hate to have to play the part of captain obvious.

 Glad to hear the same continuous mantra for this next release.  I think
 sometimes the plebeians (of which I am one) need that affirmation.

 One love, Apache Hadoop!

 /*
 Joe Stein
 http://www.medialets.com
 Twitter: @allthingshadoop
 */

 On Jul 18, 2011, at 11:06 PM, Arun Murthy a...@hortonworks.com wrote:

  Joe,
 
  The dev community is currently gearing up for hadoop-0.23 off trunk.
 
  0.23 is a massive step forward, with HDFS Federation, NextGen
  MapReduce and possibly others such as wire-compat and HA NameNode.
 
  In a couple of weeks I plan to create the 0.23 branch off trunk and we
  then spend all our energies stabilizing & pushing the release out.
  Please see my note to general@ for more details.
 
  Arun
 
  On Jul 18, 2011, at 7:01 PM, Joe Stein charmal...@allthingshadoop.com
 wrote:
 
  So, last I checked this list was about Apache Hadoop not about
 derivative works.
 
  The Cloudera team has always been diligent (you rock) about redirecting
 questions on non-Apache CDH releases to their own list for answers.
 
  I commend those supporting apache releases of Hadoop too, very cool!!!
 
  But yeah, even I have to ask what the latest release will be.  Is there
 going to be a single Hadoop release or a continued branch that Horton
 maintains and will only support?
 
  There is something to be said for a release from trunk that gets everyone
 on the same page towards our common goals.  You can pin the 'state the
 obvious' paper on my back, but I kinda feel it had to be said.
 
  One love, Apache Hadoop!
 
  /*
  Joe Stein
  http://www.medialets.com
  Twitter: @allthingshadoop
  */
 
  On Jul 18, 2011, at 9:51 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 
 
 
  Date: Mon, 18 Jul 2011 18:19:38 -0700
  Subject: Re: Which release to use?
  From: mcsri...@gmail.com
  To: common-user@hadoop.apache.org
 
  Mike,
 
  Just a minor inaccuracy in your email. Here's setting the record
 straight:
 
  1. MapR directly sells their distribution of Hadoop. Support is from
  MapR.
  2. EMC also sells the MapR distribution, for use on any hardware.
 Support is
  from EMC worldwide.
  3. EMC also sells a Hadoop appliance, which has the MapR distribution
  specially built for it. Support is from EMC.
 
  4. MapR also has a free, unlimited, unrestricted version called M3,
 which
  has the same 2-5x performance, management and stability improvements,
 and
  includes NFS. It is not crippleware, and the unlimited, unrestricted,
 free
  use does not expire on any date.
 
  Hope that clarifies what MapR is doing.
 
  thanks & regards,
  Srivas.
 
  Srivas,
 
  I'm sorry, I thought I was being clear in that I was only addressing
 EMC and not MapR directly.
  I was responding to a post about EMC selling a Greenplum appliance. I
 wanted to point out that EMC will resell MapR's release along with their own
 (EMC) support.
 
  The point I was trying to make was that with respect to derivatives of
 Hadoop, I believe that MapR has a more compelling story than either EMC or
 DataStax. IMHO, replacing Java HDFS with either Greenplum or Cassandra has a
 limited market.  When a company is going to look at an M/R solution, cost and
 performance are going to be at the top of the list. MapR isn't cheap, but if
 you look at the features in M5, and if they work, then you have a very
 compelling reason to look at their release. Some of the people I spoke to
 when I was in Santa Clara were in the beta program. They indicated that MapR
 did what they claimed.
 
  Things are definitely starting to look interesting.
 
  -Mike
 
  On Mon, Jul 18, 2011 at 11:33 AM, Michael Segel
  michael_se...@hotmail.com wrote:
 
 
  EMC has inked a deal with MapRTech to resell their release and
 support
  services for MapRTech.
  Does this mean that they are going to stop selling their own release
 on
  Greenplum? Maybe not in the near future, however,
  a Greenplum appliance may not get the customer traction that their
  reselling of MapR will generate.
 
  It sounds like they are hedging their bets and are taking an 'IBM'
  approach.
 
 
  Subject: RE: Which release to use?
  Date: Mon, 18 Jul 2011 08:30:59 -0500
  From: jeff.schm...@shell.com
  To: common-user@hadoop.apache.org
 
  Steve,
 
  I read your blog, nice post - I believe EMC is selling the Greenplum
  solution as an appliance -
 
  Cheers -
 
  Jeffery
 
  -Original Message-
  From: Steve Loughran [mailto:ste...@apache.org]
  Sent: Friday, July 15, 2011 4:07 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Which release to use?
 
  On 15/07/2011 18:06, Arun C Murthy wrote:
  Apache Hadoop is a volunteer driven, open-source project. The
  contributors to Apache Hadoop, both individuals and folks across a
  

Re: Which release to use?

2011-07-19 Thread Steve Loughran

On 19/07/11 12:44, Rita wrote:

Arun,

I second Joe's comment.
Thanks for giving us a heads up.
I will wait patiently until 0.23 is considered stable.



API-wise, 0.21 is better. I know that as I'm working with 0.20.203 right 
now, and it is a step backwards.


Regarding future releases, the best way to get it stable is participate 
in release testing in your own infrastructure. Nothing else will find 
the problems unique to your setup of hardware, network and software




RE: Hadoop upgrade Java version

2011-07-19 Thread Michael Segel

Yeah... you can do that...

I haven't tried to mix/match different releases within a cluster, although I 
suspect I could without any problems, but I don't want to risk it.

Until we have a problem, or until we expand our clouds with a batch of new 
nodes, I like to follow the mantra... if it ain't broke, don't fix it. 
(I would suggest, if/when you upgrade your Java, that you bounce the cloud. 
Even with a rolling restart, you have to plan for it...)



 Date: Mon, 18 Jul 2011 22:54:54 -0500
 Subject: RE: Hadoop upgrade Java version
 From: jshrini...@gmail.com
 To: common-user@hadoop.apache.org
 
 We are using Oracle JDK 6 update 26 and have not observed any problems so
 far. EA of JDK 6 update 27 is available now. We are planning to move to
 update 27 when the GA release is made available.
 
 -Shrinivas
 On Jul 18, 2011 7:52 PM, Michael Segel michael_se...@hotmail.com wrote:
 
  Any release after _21 seems to work fine.
 
 
  CC: highpoint...@gmail.com; common-user@hadoop.apache.org
  From: john.c.st...@gmail.com
  Subject: Re: Hadoop upgrade Java version
  Date: Mon, 18 Jul 2011 19:37:02 -0600
  To: common-user@hadoop.apache.org
 
  We're using u26 without any problems.
 
  On Jul 18, 2011, at 4:45 PM, highpointe highpoint...@gmail.com wrote:
 
   So uhm yeah. Thanks for the Informica commercial.
  
   Now back to my original question.
  
    Anyone have a suggestion on what version of Java I should be using with
  the latest Hadoop release?
  
   Sent from my iPhone
  
   On Jul 18, 2011, at 11:26 AM, high pointe highpoint...@gmail.com
 wrote:
  
   We are in the process of upgrading to the most current version of
 Hadoop.
  
   At the same time we are in need of upgrading Java. We are currently
 running u17.
  
   I have read elsewhere that u21 or up is the best route to go.
 Currently the version is u26.
  
   Has anyone gone all the way to u26 with or without issues?
  
   Thanks for the help.
 
  

Re: Which release to use?

2011-07-19 Thread Vitalii Tymchyshyn

19.07.11 14:50, Steve Loughran wrote:

On 19/07/11 12:44, Rita wrote:

Arun,

I second Joe's comment.
Thanks for giving us a heads up.
I will wait patiently until 0.23 is considered stable.



API-wise, 0.21 is better. I know that as I'm working with 0.20.203 
right now, and it is a step backwards.


Regarding future releases, the best way to get it stable is 
participate in release testing in your own infrastructure. Nothing 
else will find the problems unique to your setup of hardware, network 
and software




My little hadoop adoption story (or why I won't test 0.23):
I am among those who think that the latest release is the one that is 
supported, and so we went the 0.21 way.
BTW: I've tried to find some release roadmap, but could not find 
anything up to date.

We are using HDFS without Map/Reduce.
As far as I can see now, 0.21 is nowhere near beta quality, with non-working 
new features like the backup node or append. Also there is no option for 
such unlucky people to back off to 0.20 (at least a search for hadoop 
downgrade does not give any good results).
I have already filed 5 tickets in Jira, 3 of them with patches. On two 
there is no activity at all; on the other three my answer is the latest 
non-autogenerated message (and it is over 3 weeks old).

I sent a few messages to this list, and one to hdfs-user. No answers.
With this level of project activity, I can't afford to test something that 
has not even reached 0.21's quality level yet. If I have any problems, I 
can't afford to wait months to be heard.
I am more or less stable on my own patched 0.21 for now, and will either 
move forward if I see more project activity, or move somewhere else 
if it becomes less stable.


Best regards, Vitalii Tymchyshyn


RE: Hadoop Discrete Event Simulator

2011-07-19 Thread Jeff.Schmitz
Maneesh, 

You may want to check this out

https://issues.apache.org/jira/browse/HADOOP-5005

-Original Message-
From: maneesh varshney [mailto:mvarsh...@gmail.com] 
Sent: Monday, July 18, 2011 8:09 PM
To: common-user@hadoop.apache.org
Subject: Hadoop Discrete Event Simulator

Hello,

Perhaps somebody can point out if there have been efforts to simulate
Hadoop clusters.

What I mean is a discrete event simulator that models the hosts and the
networks and runs Hadoop algorithms on some synthetic workload, something
similar to network simulators (for example, ns2).

If such a tool is available, I was hoping to use it for:
a. Getting a general sense of how the HDFS and MapReduce algorithms work.
For example, if I were to store 1TB of data over 100 nodes, how would the
blocks get distributed?
b. Using the simulation to optimize my configuration parameters. For example,
the relationship between performance and the number of cluster nodes, or the
number of replicas, and so on.

The need for point b. above is to be able to study/analyze the performance
without (or before) actually running the algorithms on an actual cluster.
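
(As a rough back-of-the-envelope version of the example in a., assuming the
default 64 MB block size and a replication factor of 3: 1 TB is 1,048,576 MB,
so about 16,384 blocks, or roughly 49,152 block replicas in total, which is
on the order of 490 replicas per node if placement were perfectly even. A
simulator would show how far the real placement deviates from that.)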

Thanks in advance,
Maneesh

PS: I apologize if this question has been asked earlier. I could not
seem to
locate the search feature in the mailing list archive.



RE: Hadoop upgrade Java version

2011-07-19 Thread Jeff.Schmitz
I am using this

java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02, mixed mode)

with the latest release, and it works fine - 

Cheers - 

JGS

-Original Message-
From: highpointe [mailto:highpoint...@gmail.com] 
Sent: Monday, July 18, 2011 5:45 PM
To: high pointe
Cc: common-user@hadoop.apache.org
Subject: Re: Hadoop upgrade Java version

So uhm yeah. Thanks for the Informica  commercial. 

Now back to my original question. 

Anyone have a suggestion on what version of Java I should be using with
the latest Hadoop release?

Sent from my iPhone

On Jul 18, 2011, at 11:26 AM, high pointe highpoint...@gmail.com
wrote:

 We are in the process of upgrading to the most current version of
Hadoop.
 
 At the same time we are in need of upgrading Java.  We are currently
running u17.
 
 I have read elsewhere that u21 or up is the best route to go.
Currently the version is u26.
 
 Has anyone gone all the way to u26 with or without issues?
 
 Thanks for the help.




RE: Hadoop upgrade Java version

2011-07-19 Thread Isaac Dooley
_24 seems to work fine on my cluster. 


How would you translate this into MapReduce?

2011-07-19 Thread Em
Hello list,

sorry, I sent this email to the wrong list, I think (the MapReduce list had
*no* activity for the whole last day?).

As a newbie I have a tricky use-case in mind which I want to implement
with Hadoop to train my skillz. There is no real scenario behind it,
so I can extend or shrink the problem to whatever extent I like. The problems
I have come from a conceptual point of view and from my lack of
experience with Hadoop itself.

Here is what I want to do:

I create random lists of person-IDs and places plus a time-value.

The result of my map-reduce operations should be something like this:
the key is a place and the value is a list of places that were visited
by persons after they visited the key-place.
Additionally, the value should be sorted in a way where I use some
time/count-biased metric. This way the value-list should reflect the
place which was, for example, the most popular second station on a tour.

I think this is a complex, almost real-world scenario.

In pseudo-code it will be something like this:
for every place p
  for every person m that visited p
    select the list l of all the places that m visited after p
    write a key-value pair p => l to disk, where l is in the order of the visits

for every key k in the list of key-value pairs
  get the value list of places v for k
  create another key-value pair pv where the key is the place and
  the value is its index in v (for each place p in v)

for every k
  get all pv
  for every pv, aggregate the key-value pairs by key and sum up
  the index i for every place p, so that it becomes the kv-pair opv
  sort opv in ascending order by its value

The result would be what I wanted, no?

It looks like I need multiple MR phases; however, I do not even know how
to start.

My first guess is: create an MR job where I invert my list so that I get
a place as the key and, as the value, all persons that visited it.
The next phase needs to iterate over the value's persons and join with
the original data to get an idea of when this person visited this place
and what places came next.
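
To make this a bit more concrete, here is a rough sketch of one possible first
job: it groups each person's visits, orders them by time, and emits
(place, place visited next) pairs for a later aggregation job. This is purely
illustrative: the "person,place,time" line format and all the class names are
just my assumptions, and it ignores the problems I describe below.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NextPlacePairs {

    // Key each visit by person so one reduce call sees that person's whole history.
    public static class VisitMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text person = new Text();
        private final Text timeAndPlace = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");   // person,place,time
            if (fields.length != 3) return;                  // skip malformed lines
            person.set(fields[0]);
            // time first so a plain string sort orders the visits
            // (assumes fixed-width / zero-padded timestamps)
            timeAndPlace.set(fields[2] + "\t" + fields[1]);
            context.write(person, timeAndPlace);
        }
    }

    // Sort one person's visits by time and emit (place, place visited next) pairs.
    public static class PairReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text person, Iterable<Text> visits, Context context)
                throws IOException, InterruptedException {
            List<String> ordered = new ArrayList<String>();
            for (Text v : visits) ordered.add(v.toString()); // copy, values get reused
            Collections.sort(ordered);
            for (int i = 0; i + 1 < ordered.size(); i++) {
                String here = ordered.get(i).split("\t")[1];
                String next = ordered.get(i + 1).split("\t")[1];
                context.write(new Text(here), new Text(next));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "next-place-pairs");
        job.setJarByClass(NextPlacePairs.class);
        job.setMapperClass(VisitMapper.class);
        job.setReducerClass(PairReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A second job would then group these pairs by the first place and apply the
time/count metric to build the sorted value-list.
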
And now the problems arise:
- First: what happens to places that are so popular that the number of
persons that visited them is so large that I cannot pass the whole
KV-pair to a single node to iterate over it?
- Second: I need to re-join the original data. Without a database this
would be extremely slow, wouldn't it?

I hope that you guys can give me some ideas and input to make my first
serious steps in Hadoop-land.

Regards,
Em


localhost permission denied

2011-07-19 Thread Kobina Kwarko
Hello,

Please any assistance?? I am using Hadoop for a school project and managed
to install it on two computers testing with the wordcount example. However,
after stopping Hadoop and restarting the computers (Ubuntu Server 10.10) I
am getting the following error:

root@localhost's password: localhost: Permission denied, please try again.

If I enter the administrative password the same message comes again
preventing me from starting Hadoop.

What am I getting wrong? Has anyone encountered such error before?

I'm using Hadoop 0.20.203.

thanks in advance.

Kobina.


Re: localhost permission denied

2011-07-19 Thread John Armstrong
On Tue, 19 Jul 2011 20:47:31 +0100, Kobina Kwarko
kobina.kwa...@gmail.com
wrote:
 Hello,
 
 Please any assistance?? I am using Hadoop for a school project and
managed
 to install it on two computers testing with the wordcount example.
However,
 after stopping Hadoop and restarting the computers (Ubuntu Server 10.10)
I
 am getting the following error:
 
 root@localhost's password: localhost: Permission denied, please try
again.

Did you set up passwordless ssh for the accounts using hadoop?  And why
are you running hadoop as root?


RE: localhost permission denied

2011-07-19 Thread Jeff.Schmitz
Your SSH isn't set up properly

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the
following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start The hadoop daemons:
$ bin/start-all.sh

http://hadoop.apache.org/common/docs/r0.17.0/quickstart.html#Setup+passphraseless

cheers - 

JGS




-Original Message-
From: Kobina Kwarko [mailto:kobina.kwa...@gmail.com] 
Sent: Tuesday, July 19, 2011 2:48 PM
To: common-user@hadoop.apache.org
Subject: localhost permission denied

Hello,

Please any assistance?? I am using Hadoop for a school project and
managed
to install it on two computers testing with the wordcount example.
However,
after stopping Hadoop and restarting the computers (Ubuntu Server 10.10)
I
am getting the following error:

root@localhost's password: localhost: Permission denied, please try
again.

If I enter the administrative password the same message comes again
preventing me from starting Hadoop.

What am I getting wrong? Has anyone encountered such error before?

I'm using Hadoop 0.20.203.

thanks in advance.

Kobina.



Re: localhost permission denied

2011-07-19 Thread Kobina Kwarko
That is the strangest part, I never set it up as root; this root just
came with the error. I have a dedicated hadoop user that I'm using; it
worked fine the first time, and I tested it with the word count example, which
produced the expected result, but when I restarted the computers this error
came.

And yes, I used an empty passphrase for the ssh key, as shown on Michael Noll's
single-node Hadoop setup page.


On 19 July 2011 20:52, John Armstrong john.armstr...@ccri.com wrote:

 On Tue, 19 Jul 2011 20:47:31 +0100, Kobina Kwarko
 kobina.kwa...@gmail.com
 wrote:
  Hello,
 
  Please any assistance?? I am using Hadoop for a school project and
 managed
  to install it on two computers testing with the wordcount example.
 However,
  after stopping Hadoop and restarting the computers (Ubuntu Server 10.10)
 I
  am getting the following error:
 
  root@localhost's password: localhost: Permission denied, please try
 again.

 Did you set up passwordless ssh for the accounts using hadoop?  And why
 are you running hadoop as root?



Re: localhost permission denied

2011-07-19 Thread Kobina Kwarko
I can ssh into localhost; even the hadoop user can ssh into localhost without
any error. But when I try starting Hadoop, that error comes.

On 19 July 2011 20:54, jeff.schm...@shell.com wrote:

 Your SSH isn't setup properly

 Setup passphraseless ssh

 Now check that you can ssh to the localhost without a passphrase:
 $ ssh localhost

 If you cannot ssh to localhost without a passphrase, execute the
 following commands:
 $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
 $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
 Execution

 Format a new distributed-filesystem:
 $ bin/hadoop namenode -format

 Start The hadoop daemons:
 $ bin/start-all.sh

 http://hadoop.apache.org/common/docs/r0.17.0/quickstart.html#Setup+passphraseless

 cheers -

 JGS




 -Original Message-
 From: Kobina Kwarko [mailto:kobina.kwa...@gmail.com]
 Sent: Tuesday, July 19, 2011 2:48 PM
 To: common-user@hadoop.apache.org
 Subject: localhost permission denied

 Hello,

 Please any assistance?? I am using Hadoop for a school project and
 managed
 to install it on two computers testing with the wordcount example.
 However,
 after stopping Hadoop and restarting the computers (Ubuntu Server 10.10)
 I
 am getting the following error:

 root@localhost's password: localhost: Permission denied, please try
 again.

 If I enter the administrative password the same message comes again
 preventing me from starting Hadoop.

 What am I getting wrong? Has anyone encountered such error before?

 I'm using Hadoop 0.20.203.

 thanks in advance.

 Kobina.




IO pipeline optimizations

2011-07-19 Thread Shrinivas Joshi
This blog post on the YDN website,
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
has a detailed discussion on the different steps involved in Hadoop IO
operations and on opportunities for optimizations. Could someone please comment
on the current state of these potential optimizations? Are some of these
expected to be addressed in the next-gen MR release?

Thanks,
-Shrinivas


Re: IO pipeline optimizations

2011-07-19 Thread Todd Lipcon
Hi Shrinivas,

There has been some work going on recently around optimizing checksums. See
HDFS-2080 for example. This will help both the write and read code, though
we've focused more on read.

There have also been a lot of improvements around random read access - for
example HDFS-941 which improves random read by more than 2x.

I'm planning on writing a blog post in the next couple of weeks about some
of this work.

-Todd

On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi jshrini...@gmail.com wrote:

 This blog post on the YDN website,

 http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
 has a detailed discussion on the different steps involved in Hadoop IO
 operations and on opportunities for optimizations. Could someone please comment
 on the current state of these potential optimizations? Are some of these
 expected to be addressed in the next-gen MR release?

 Thanks,
 -Shrinivas




-- 
Todd Lipcon
Software Engineer, Cloudera


Job progress not showing in Hadoop Tasktracker web interface

2011-07-19 Thread foo_foo_foo

I am a Hadoop novice, so kindly pardon my ignorance.

I am running the following Hadoop program in Fully Distributed Mode to count
the number of lines in a file. I am running this job from Eclipse and I see
it running (based on the output to the Eclipse console), but I do not see the
tasks in the TaskTracker web interface. Also, even though the data is
distributed across multiple hosts, it doesn't seem to be distributing work
across hosts.

Could someone please help me with this.


package LineCount;

import java.io.*;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class LineCount extends Configured implements Tool {

    // Emits ("Number Of Lines", 1) for every input line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static int counter = 1;
        private static Text mapOpKey = new Text();
        private final static IntWritable mapOpValue = new IntWritable(1);

        @Override
        public void map(LongWritable mapInpKey, Text mapInpValue, Context context)
                throws IOException, InterruptedException {
            System.out.println("Calling Map " + counter);
            counter++;
            mapOpKey.set("Number Of Lines");
            context.write(mapOpKey, mapOpValue);
        }
    }

    // Sums the 1s emitted by the mapper into a single line count.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private static int counter = 1;

        @Override
        public void reduce(Text redIpKey, Iterable<IntWritable> redIpValue, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            System.out.println("Calling Reduce " + counter);
            counter++;
            for (IntWritable value : redIpValue) {
                sum = sum + value.get();
            }
            context.write(redIpKey, new IntWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop-0.20.2/conf/core-site.xml"));
        Job job = new Job(conf);
        job.setJobName("LineCount");
        job.setJarByClass(LineCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        //job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(job, new Path("/usr/foo/hadoopIP"));
        FileOutputFormat.setOutputPath(job, new Path("/usr/foo/hadoopOP"));
        job.waitForCompletion(true);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new LineCount(), args);
    }
}


-- 
View this message in context: 
http://old.nabble.com/Job-progress-not-showing-in-Hadoop-Tasktracker--web-interface-tp32096156p32096156.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



RE: Job progress not showing in Hadoop Tasktracker web interface

2011-07-19 Thread Teng, James
You can't run a Hadoop job in Eclipse; you have to set up an environment on a 
Linux system. Maybe you can try to install it on a VMware Linux system and run 
the job in pseudo-distributed mode.


James, Teng (Teng Linxiao)
eRL,   CDC,eBay,Shanghai
Extension:86-21-28913530
MSN: tenglinx...@hotmail.com
Skype:James,Teng
Email:xt...@ebay.com
 
-Original Message-
From: foo_foo_foo [mailto:finallya...@gmail.com] 
Sent: Wednesday, July 20, 2011 10:05 AM
To: core-u...@hadoop.apache.org
Subject: Job progress not showing in Hadoop Tasktracker web interface


I am a Hadoop novice, so kindly pardon my ignorance.

I am running the following Hadoop program in Fully Distributed Mode to count
the number of lines in a file. I am running this job from Eclipse and I see
it running (based on the output to the Eclipse console), but I do not see the
tasks in the TaskTracker web interface. Also, even though the data is
distributed across multiple hosts, it doesn't seem to be distributing work
across hosts.

Could someone please help me with this.





Re: Job progress not showing in Hadoop Tasktracker web interface

2011-07-19 Thread Harsh J
Looks like it may be running in local mode. Have you set up your
Eclipse configuration properly?

What version of Hadoop are you using?
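
In case it helps: a job submitted from an IDE only reaches the cluster if the
client Configuration actually points at it; with only the default settings on
the classpath it quietly falls back to the LocalJobRunner, and nothing shows up
in the JobTracker/TaskTracker web UI. A minimal sketch for an 0.20-style setup
(the host names and ports below are placeholders, not your actual values):

    Configuration conf = new Configuration();
    // Point the client at the cluster instead of the local runner. Alternatively,
    // add the cluster's mapred-site.xml with conf.addResource(), the same way
    // core-site.xml is added in your run() method.
    conf.set("fs.default.name", "hdfs://namenode-host:9000");
    conf.set("mapred.job.tracker", "jobtracker-host:9001");
    Job job = new Job(conf, "LineCount");

Otherwise, package the job as a jar and submit it with bin/hadoop jar from a
node that already has the cluster configuration.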

On Wed, Jul 20, 2011 at 7:35 AM, foo_foo_foo finallya...@gmail.com wrote:

 I am a Hadoop novice, so kindly pardon my ignorance.

 I am running the following Hadoop program in Fully Distributed Mode to count
 the number of lines in a file. I am running this job from Eclipse and I see
 it running (based on the output to the Eclipse console), but I do not see the
 tasks in the TaskTracker web interface. Also, even though the data is
 distributed across multiple hosts, it doesn't seem to be distributing work
 across hosts.

 Could someone please help me with this.







-- 
Harsh J