No locks available

2011-01-11 Thread Adarsh Sharma

Dear all,

Yesterday I was working on a cluster of 6 Hadoop nodes (load data, perform 
some jobs). But today when I started my cluster I came across a problem on 
one of my datanodes.


The datanode fails to start due to the following error:


2011-01-11 12:54:10,367 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:

/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = hadoop3/172.16.1.4
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/
2011-01-11 12:55:57,031 INFO 
org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No 
locks available

   at sun.nio.ch.FileChannelImpl.lock0(Native Method)
   at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881)
   at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
   at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
   at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
   at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
   at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.&lt;init&gt;(DataNode.java:216)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)


2011-01-11 12:55:57,043 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: No 
locks available

   at sun.nio.ch.FileChannelImpl.lock0(Native Method)
   at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881)
   at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
   at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
   at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
   at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
   at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.&lt;init&gt;(DataNode.java:216)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
   at 
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)



Is anyone familiar with this issue? Please help.


Thanks & Regards

Adarsh Sharma
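
The exception above comes straight out of FileChannel.tryLock(), which the
DataNode uses to take an exclusive lock on each storage directory at startup.
Below is a minimal, illustrative sketch of that failing call (not the actual
Storage code; the in_use.lock file name matches what 0.20.x uses). Run against
a dfs.data.dir that lives on a filesystem without working POSIX locks, for
example an NFS mount whose lock daemon is down, it fails with the same
"No locks available" IOException:

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class StorageLockCheck {
    public static void main(String[] args) throws Exception {
        // Point this at a storage directory (a dfs.data.dir entry); the
        // DataNode locks the same file name on startup.
        File lockFile = new File(args[0], "in_use.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        FileChannel channel = raf.getChannel();
        // On a filesystem with no POSIX lock support this call throws
        // java.io.IOException: No locks available.
        FileLock lock = channel.tryLock();
        System.out.println(lock != null ? "lock acquired" : "already locked");
        if (lock != null) {
            lock.release();
        }
        raf.close();
    }
}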


Re: When applying a patch, which attachment should I use?

2011-01-11 Thread edward choi
Thanks for the info.
I am currently using Hadoop 0.20.2, so I guess I only need to apply
hdfs-630-0.20-append.patch
<https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch>.
I wasn't familiar with the term trunk. I guess it means the latest
development.
Thanks again.

Best Regards,
Ed

2011/1/11 Konstantin Boudnik c...@apache.org

 Yeah, that's pretty crazy all right. In your case it looks like the 3
 patches at the top are the latest for the 0.20-append branch, the 0.21
 branch, and trunk (which is perhaps the 0.22 branch at the moment). It
 doesn't look like you need to apply all of them - just try the latest
 for your particular branch.

 The mess is caused by the fact that people use different names for
 consecutive patches (as in file.1.patch, file.2.patch, etc.). This is
 _very_ confusing indeed, especially when different contributors work
 on the same fix/feature.
 --
   Take care,
 Konstantin (Cos) Boudnik


 On Mon, Jan 10, 2011 at 01:10, edward choi mp2...@gmail.com wrote:
  Hi,
  For the first time I am about to apply a patch to HDFS.
 
  https://issues.apache.org/jira/browse/HDFS-630
 
  Above is the one that I am trying to do.
  But there are like 15 patches and I don't know which one to use.
 
  Could anyone tell me if I need to apply them all or just the one at the
 top?
 
  The whole patching process is just so confusing :-(
 
  Ed
 



Re: TeraSort question.

2011-01-11 Thread Raj V
I used 9500 maps.

The number of maps defaults to 2 for teragen. For terasort, it depends on 
the number of input files, the dfs.block.size, and the number of nodes.

 

Raj

From: Phil Whelan phil...@gmail.com
To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
Cc: 
Sent: Monday, January 10, 2011 10:39:29 PM
Subject: Re: TeraSort question.

Hi Raj,

 Two of the 5 systems were seriously busy, with big IO and lots of disk and 
 network activity. On the other three systems, the CPU was more or less 100% 
 idle, with slight network and I/O.

This process defaults to just 2 map jobs, so only 2 nodes are
utilized. Did you try the mapred.map.tasks option? I found a very
similar question and answer here...

http://www.mail-archive.com/common-user@hadoop.apache.org/msg5.html

 1.      The data is generated in a fashion where it is not balanced
 across my cluster.  This is because the data is generated with 2 maps.

 These are due to the default #maps/#reduces in Map-Reduce.
 Use:
 $ bin/hadoop jar hadoop-*-dev-examples.jar teragen -Dmapred.map.tasks=8000 100 /tera/in
 $ bin/hadoop jar hadoop-*-dev-examples.jar terasort -Dmapred.reduce.tasks=5300 /tera/in /tera/out
 Arun

Hope that helps.

Thanks,
Phil
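
For reference, the same hint can also be set in code when building a job. This
is a minimal sketch against the old-style JobConf API (the class name here is
made up for illustration, and the value is only a hint that the InputFormat is
free to override):

import org.apache.hadoop.mapred.JobConf;

public class MapTaskHint {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Equivalent to passing -Dmapred.map.tasks=8000 on the command line.
        conf.setNumMapTasks(8000);
        System.out.println("mapred.map.tasks = " + conf.get("mapred.map.tasks"));
    }
}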

On Mon, Jan 10, 2011 at 9:06 PM, Raj V rajv...@yahoo.com wrote:
 All,

 I have been running terasort on a 480 node hadoop cluster. I have also 
 collected CPU, memory, disk, and network statistics during this run. The system 
 stats are quite interesting. I can post them when I have put them together in 
 some presentable format (if there is interest). However, while looking at 
 the data, I noticed something interesting.

 I thought, intuitively, that all the systems in the cluster would have 
 more or less similar behaviour (time translation was possible) and that the 
 overall graph would look the same.

 Just to confirm it I took 5 random nodes and looked at the CPU, disk, network 
 etc. activity when the sort was running. Strangely enough, it was not so. 
 Two of the 5 systems were seriously busy, with big IO and lots of disk and 
 network activity. On the other three systems, the CPU was more or less 100% idle, 
 with slight network and I/O.

 Is that normal and/or expected? Shouldn't all the nodes be utilized more 
 or less equally over the length of the run?

 I generated the data for the sort using teragen (128MB block size, 
 replication = 3).

 I would also be interested in other people's timings of the sort. Is there some 
 place where people can post sort numbers (not just the record)?

 I will post the actual graphs of the 5 nodes, if there is interest, tomorrow. 
 (Some logistical issues about posting them tonight.)

 I am using CDH3B3, even though I think this is not specific to CDH3B3.

 Sorry for the cross post.

 Raj

Re: When applying a patch, which attachment should I use?

2011-01-11 Thread Ted Dunning
You may also be interested in the append branch:

http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/

On Tue, Jan 11, 2011 at 3:12 AM, edward choi mp2...@gmail.com wrote:

 Thanks for the info.
 I am currently using Hadoop 0.20.2, so I guess I only need to apply
 hdfs-630-0.20-append.patch
 <https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch>.
 I wasn't familiar with the term trunk. I guess it means the latest
 development.
 Thanks again.

 Best Regards,
 Ed

 2011/1/11 Konstantin Boudnik c...@apache.org

  Yeah, that's pretty crazy all right. In your case it looks like the 3
  patches at the top are the latest for the 0.20-append branch, the 0.21
  branch, and trunk (which is perhaps the 0.22 branch at the moment). It
  doesn't look like you need to apply all of them - just try the latest
  for your particular branch.

  The mess is caused by the fact that people use different names for
  consecutive patches (as in file.1.patch, file.2.patch, etc.). This is
  _very_ confusing indeed, especially when different contributors work
  on the same fix/feature.
  --
Take care,
  Konstantin (Cos) Boudnik
 
 
  On Mon, Jan 10, 2011 at 01:10, edward choi mp2...@gmail.com wrote:
   Hi,
   For the first time I am about to apply a patch to HDFS.
  
   https://issues.apache.org/jira/browse/HDFS-630
  
   Above is the one that I am trying to do.
   But there are like 15 patches and I don't know which one to use.
  
   Could anyone tell me if I need to apply them all or just the one at the
  top?
  
   The whole patching process is just so confusing :-(
  
   Ed
  
 



Re: TeraSort question.

2011-01-11 Thread Raj V
Ted


Thanks. I have all the graphs I need, including the map/reduce timeline and system 
activity for all the nodes while the sort was running. I will publish them once 
I have them in some presentable format.

For legal reasons, I really don't want to send the complete job history files.

My question is still this: when running terasort, would the CPU, disk, and 
network utilization of all the nodes be more or less similar, or completely 
different?

Sometime during the day, I will post the system data from 5 nodes and that 
would probably explain my question better.

Raj
From: Ted Dunning tdunn...@maprtech.com
To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
Cc: 
Sent: Tuesday, January 11, 2011 8:22:17 AM
Subject: Re: TeraSort question.

Raj,

Do you have the job history files?  That would be very useful.  I would be
happy to create some swimlane and related graphs for you if you can send me
the history files.

On Mon, Jan 10, 2011 at 9:06 PM, Raj V rajv...@yahoo.com wrote:

 All,

 I have been running terasort on a 480 node hadoop cluster. I have also
 collected CPU, memory, disk, and network statistics during this run. The system
 stats are quite interesting. I can post them when I have put them together in
 some presentable format (if there is interest). However, while looking at
 the data, I noticed something interesting.

 I thought, intuitively, that all the systems in the cluster would have
 more or less similar behaviour (time translation was possible) and that the
 overall graph would look the same.

 Just to confirm it I took 5 random nodes and looked at the CPU, disk,
 network etc. activity when the sort was running. Strangely enough, it was
 not so. Two of the 5 systems were seriously busy, with big IO and lots of disk
 and network activity. On the other three systems, the CPU was more or less 100%
 idle, with slight network and I/O.

 Is that normal and/or expected? Shouldn't all the nodes be utilized more
 or less equally over the length of the run?

 I generated the data for the sort using teragen (128MB block size,
 replication = 3).

 I would also be interested in other people's timings of the sort. Is there some
 place where people can post sort numbers (not just the record)?

 I will post the actual graphs of the 5 nodes, if there is interest,
 tomorrow. (Some logistical issues about posting them tonight.)

 I am using CDH3B3, even though I think this is not specific to CDH3B3.

 Sorry for the cross post.

 Raj

libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
Hi,

Could anyone please guide me as to how to use the -libjars option in HDFS? 

I have added the necessary jar file (the hbase jar - to be precise)  to the 
classpath of the node where I am starting the job. 

The following is the format that I am invoking:
bin/hadoop jar <Our Jar> <MainClass> -libjars <Dependent jars (separated by 
commas)> <Arguments to our main class>

bin/hadoop jar /Users/hdp/cvk/myjob.jar  mr2.mr2a.MR2ADriver -libjars 
/Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a  outputmr2a

Despite this,  I find that I get the java.lang.ClassNotFoundException error! :(
java.lang.RuntimeException: java.lang.RuntimeException: 
java.lang.ClassNotFoundException: 
org.apache.hadoop.hbase.io.ImmutableBytesWritable
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
at 
org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.&lt;init&gt;(MapTask.java:793)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:524)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
org.apache.hadoop.hbase.io.ImmutableBytesWritable
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)

 The strange thing is that there is another MR job I have that runs perfectly 
with the libjars option! Could anybody tell me what I am doing wrong? One more 
thing, not sure if it is relevant: I am using the new Hadoop MapReduce API.

Thanks in advance!

Regards,
Krishnakumar.

Re: libjars options

2011-01-11 Thread Ted Yu
Refer to Alex Kozlov's answer on 12/11/10

On Tue, Jan 11, 2011 at 10:10 AM, C.V.Krishnakumar Iyer
f2004...@gmail.com wrote:

 Hi,

 Could anyone please guide me as to how to use the -libjars option in HDFS?

 I have added the necessary jar file (the hbase jar - to be precise)  to the
 classpath of the node where I am starting the job.

 The following is the format that I am invoking:
 bin/hadoop jar <Our Jar> <MainClass> -libjars <Dependent jars (separated by
 commas)> <Arguments to our main class>

 bin/hadoop jar /Users/hdp/cvk/myjob.jar  mr2.mr2a.MR2ADriver -libjars
 /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a  outputmr2a

 Despite this,  I find that I get the java.lang.ClassNotFoundException
 error! :(
 java.lang.RuntimeException: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
at
 org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.&lt;init&gt;(MapTask.java:793)
at
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:524)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)

  The strange thing is that there is another MR job I have  that runs
 perfectly with the libjars option! Could anybody tell me what I am doing
 wrong? One more thing - not sure if it is relevant : I am using the new
 Hadoop MapReduce API.

 Thanks in advance!

 Regards,
 Krishnakumar.


Re: TeraSort question.

2011-01-11 Thread Niels Basjes
Raj,

Have a look at the graph shown here:
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1.1_--_Generating_Task_Timelines

It should make clear that the number of tasks varies greatly over the
lifetime of a job.
Depending on the nodes available, this may leave nodes idle.

Niels

2011/1/11 Raj V rajv...@yahoo.com:
 Ted


 Thanks. I have all the graphs I need, including the map/reduce timeline and
 system activity for all the nodes while the sort was running. I will publish
 them once I have them in some presentable format.

 For legal reasons, I really don't want to send the complete job history
 files.

 My question is still this: when running terasort, would the CPU, disk, and
 network utilization of all the nodes be more or less similar, or completely
 different?

 Sometime during the day, I will post the system data from 5 nodes and that 
 would probably explain my question better.

 Raj
 From: Ted Dunning tdunn...@maprtech.com
 To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
 Cc:
 Sent: Tuesday, January 11, 2011 8:22:17 AM
 Subject: Re: TeraSort question.

 Raj,

 Do you have the job history files?  That would be very useful.  I would be
 happy to create some swimlane and related graphs for you if you can send me
 the history files.

 On Mon, Jan 10, 2011 at 9:06 PM, Raj V rajv...@yahoo.com wrote:

 All,

 I have been running terasort on a 480 node hadoop cluster. I have also
 collected CPU, memory, disk, and network statistics during this run. The system
 stats are quite interesting. I can post them when I have put them together in
 some presentable format (if there is interest). However, while looking at
 the data, I noticed something interesting.

 I thought, intuitively, that all the systems in the cluster would have
 more or less similar behaviour (time translation was possible) and that the
 overall graph would look the same.

 Just to confirm it I took 5 random nodes and looked at the CPU, disk,
 network etc. activity when the sort was running. Strangely enough, it was
 not so. Two of the 5 systems were seriously busy, with big IO and lots of disk
 and network activity. On the other three systems, the CPU was more or less 100%
 idle, with slight network and I/O.

 Is that normal and/or expected? Shouldn't all the nodes be utilized more
 or less equally over the length of the run?

 I generated the data for the sort using teragen (128MB block size,
 replication = 3).

 I would also be interested in other people's timings of the sort. Is there some
 place where people can post sort numbers (not just the record)?

 I will post the actual graphs of the 5 nodes, if there is interest,
 tomorrow. (Some logistical issues about posting them tonight.)

 I am using CDH3B3, even though I think this is not specific to CDH3B3.

 Sorry for the cross post.

 Raj



-- 
Met vriendelijke groeten,

Niels Basjes


Re: libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
Hi,

I have tried that as well, using -files <jar file>. But it still gives the exact 
same error. Is there anything else I could try? 

Thanks,
Krishna.

On Jan 11, 2011, at 10:23 AM, Ted Yu wrote:

 Refer to Alex Kozlov's answer on 12/11/10
 
 On Tue, Jan 11, 2011 at 10:10 AM, C.V.Krishnakumar Iyer
 f2004...@gmail.com wrote:
 
 Hi,
 
 Could anyone please guide me as to how to use the -libjars option in HDFS?
 
 I have added the necessary jar file (the hbase jar - to be precise)  to the
 classpath of the node where I am starting the job.
 
 The following is the format that I am invoking:
 bin/hadoop jar <Our Jar> <MainClass> -libjars <Dependent jars (separated by
 commas)> <Arguments to our main class>
 
 bin/hadoop jar /Users/hdp/cvk/myjob.jar  mr2.mr2a.MR2ADriver -libjars
 /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a  outputmr2a
 
 Despite this,  I find that I get the java.lang.ClassNotFoundException
 error! :(
 java.lang.RuntimeException: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
   at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
   at
 org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
   at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.&lt;init&gt;(MapTask.java:793)
   at
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:524)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
   at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
   at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
 
 The strange thing is that there is another MR job I have  that runs
 perfectly with the libjars option! Could anybody tell me what I am doing
 wrong? One more thing - not sure if it is relevant : I am using the new
 Hadoop MapReduce API.
 
 Thanks in advance!
 
 Regards,
 Krishnakumar.



Re: TeraSort question.

2011-01-11 Thread Raj V
Can't attach the PDF file that shows the different maps.

The file is too big.

From: Niels Basjes ni...@basjes.nl
To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
Cc: 
Sent: Tuesday, January 11, 2011 11:07:08 AM
Subject: Re: TeraSort question.

Raj,

Have a look at the graph shown here:
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1.1_--_Generating_Task_Timelines

It should make clear that the number of tasks varies greatly over the
lifetime of a job.
Depending on the nodes available, this may leave nodes idle.

Niels

2011/1/11 Raj V rajv...@yahoo.com:
 Ted


 Thanks. I have all the graphs I need, including the map/reduce timeline and
 system activity for all the nodes while the sort was running. I will publish
 them once I have them in some presentable format.

 For legal reasons, I really don't want to send the complete job history
 files.

 My question is still this: when running terasort, would the CPU, disk, and
 network utilization of all the nodes be more or less similar, or completely
 different?

 Sometime during the day, I will post the system data from 5 nodes and that 
 would probably explain my question better.

 Raj
 From: Ted Dunning tdunn...@maprtech.com
 To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
 Cc:
 Sent: Tuesday, January 11, 2011 8:22:17 AM
 Subject: Re: TeraSort question.

 Raj,

 Do you have the job history files?  That would be very useful.  I would be
 happy to create some swimlane and related graphs for you if you can send me
 the history files.

 On Mon, Jan 10, 2011 at 9:06 PM, Raj V rajv...@yahoo.com wrote:

 All,

 I have been running terasort on a 480 node hadoop cluster. I have also
 collected CPU, memory, disk, and network statistics during this run. The system
 stats are quite interesting. I can post them when I have put them together in
 some presentable format (if there is interest). However, while looking at
 the data, I noticed something interesting.

 I thought, intuitively, that all the systems in the cluster would have
 more or less similar behaviour (time translation was possible) and that the
 overall graph would look the same.

 Just to confirm it I took 5 random nodes and looked at the CPU, disk,
 network etc. activity when the sort was running. Strangely enough, it was
 not so. Two of the 5 systems were seriously busy, with big IO and lots of disk
 and network activity. On the other three systems, the CPU was more or less 100%
 idle, with slight network and I/O.

 Is that normal and/or expected? Shouldn't all the nodes be utilized more
 or less equally over the length of the run?

 I generated the data for the sort using teragen (128MB block size,
 replication = 3).

 I would also be interested in other people's timings of the sort. Is there some
 place where people can post sort numbers (not just the record)?

 I will post the actual graphs of the 5 nodes, if there is interest,
 tomorrow. (Some logistical issues about posting them tonight.)

 I am using CDH3B3, even though I think this is not specific to CDH3B3.

 Sorry for the cross post.

 Raj



-- 
Met vriendelijke groeten,

Niels Basjes

Re: libjars options

2011-01-11 Thread Alex Kozlov
Have you implemented GenericOptionsParser?  Do you see your jar in the *
mapred.cache.files* or *tmpjars* parameter in your job.xml file (can view
via a JT Web UI)?

-- 
Alex Kozlov
Solutions Architect
Cloudera, Inc
twitter: alexvk2009
http://www.cloudera.com/company/press-center/hadoop-world-nyc/


On Tue, Jan 11, 2011 at 11:49 AM, C.V.Krishnakumar Iyer
f2004...@gmail.com wrote:

 Hi,

 I have tried that as well, using -files <jar file>. But it still gives the
 exact same error. Is there anything else I could try?

 Thanks,
 Krishna.

 On Jan 11, 2011, at 10:23 AM, Ted Yu wrote:

  Refer to Alex Kozlov's answer on 12/11/10
 
  On Tue, Jan 11, 2011 at 10:10 AM, C.V.Krishnakumar Iyer
  f2004...@gmail.com wrote:
 
  Hi,
 
  Could anyone please guide me as to how to use the -libjars option in
 HDFS?
 
  I have added the necessary jar file (the hbase jar - to be precise)  to
 the
  classpath of the node where I am starting the job.
 
  The following is the format that I am invoking:
  bin/hadoop jar <Our Jar> <MainClass> -libjars <Dependent jars (separated by
  commas)> <Arguments to our main class>
 
  bin/hadoop jar /Users/hdp/cvk/myjob.jar  mr2.mr2a.MR2ADriver -libjars
  /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a  outputmr2a
 
  Despite this,  I find that I get the java.lang.ClassNotFoundException
  error! :(
  java.lang.RuntimeException: java.lang.RuntimeException:
  java.lang.ClassNotFoundException:
  org.apache.hadoop.hbase.io.ImmutableBytesWritable
at
  org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
at
 
 org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
at
  org.apache.hadoop.mapred.MapTask$MapOutputBuffer.&lt;init&gt;(MapTask.java:793)
at
  org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:524)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
  Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
  org.apache.hadoop.hbase.io.ImmutableBytesWritable
at
  org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at
  org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
 
  The strange thing is that there is another MR job I have  that runs
  perfectly with the libjars option! Could anybody tell me what I am doing
  wrong? One more thing - not sure if it is relevant : I am using the new
  Hadoop MapReduce API.
 
  Thanks in advance!
 
  Regards,
  Krishnakumar.




Re: libjars options

2011-01-11 Thread Alex Kozlov
There is also a blog that I recently wrote, if it helps
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job

On Tue, Jan 11, 2011 at 12:33 PM, Alex Kozlov ale...@cloudera.com wrote:

 Have you implemented GenericOptionsParser?  Do you see your jar in the *
 mapred.cache.files* or *tmpjars* parameter in your job.xml file (can view
 via a JT Web UI)?

 --
 Alex Kozlov
 Solutions Architect
 Cloudera, Inc
 twitter: alexvk2009
 http://www.cloudera.com/company/press-center/hadoop-world-nyc/


 On Tue, Jan 11, 2011 at 11:49 AM, C.V.Krishnakumar Iyer 
 f2004...@gmail.com wrote:

 Hi,

 I have tried that as well, using -files <jar file>. But it still gives the
 exact same error. Is there anything else I could try?

 Thanks,
 Krishna.

 On Jan 11, 2011, at 10:23 AM, Ted Yu wrote:

  Refer to Alex Kozlov's answer on 12/11/10
 
  On Tue, Jan 11, 2011 at 10:10 AM, C.V.Krishnakumar Iyer
  f2004...@gmail.com wrote:
 
  Hi,
 
  Could anyone please guide me as to how to use the -libjars option in
 HDFS?
 
  I have added the necessary jar file (the hbase jar - to be precise)  to
 the
  classpath of the node where I am starting the job.
 
  The following is the format that I am invoking:
  bin/hadoop jar <Our Jar> <MainClass> -libjars <Dependent jars (separated by
  commas)> <Arguments to our main class>
 
  bin/hadoop jar /Users/hdp/cvk/myjob.jar  mr2.mr2a.MR2ADriver -libjars
  /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a  outputmr2a
 
  Despite this,  I find that I get the java.lang.ClassNotFoundException
  error! :(
  java.lang.RuntimeException: java.lang.RuntimeException:
  java.lang.ClassNotFoundException:
  org.apache.hadoop.hbase.io.ImmutableBytesWritable
at
  org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
at
 
 org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
at
  org.apache.hadoop.mapred.MapTask$MapOutputBuffer.&lt;init&gt;(MapTask.java:793)
at
  org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:524)
at
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
  Caused by: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
  org.apache.hadoop.hbase.io.ImmutableBytesWritable
at
  org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at
  org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
 
  The strange thing is that there is another MR job I have  that runs
  perfectly with the libjars option! Could anybody tell me what I am
 doing
  wrong? One more thing - not sure if it is relevant : I am using the new
  Hadoop MapReduce API.
 
  Thanks in advance!
 
  Regards,
  Krishnakumar.





Re: libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
Hi,

Thanks a lot! I shall try this once and let you know! 

Regards,
Krishna.
On Jan 11, 2011, at 12:48 PM, Alex Kozlov wrote:

 There is also a blog that I recently wrote, if it helps
 http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job
 
 On Tue, Jan 11, 2011 at 12:33 PM, Alex Kozlov ale...@cloudera.com wrote:
 
 Have you implemented GenericOptionsParser?  Do you see your jar in the *
 mapred.cache.files* or *tmpjars* parameter in your job.xml file (can view
 via a JT Web UI)?
 
 --
 Alex Kozlov
 Solutions Architect
 Cloudera, Inc
 twitter: alexvk2009
 http://www.cloudera.com/company/press-center/hadoop-world-nyc/
 
 
 On Tue, Jan 11, 2011 at 11:49 AM, C.V.Krishnakumar Iyer 
 f2004...@gmail.com wrote:
 
 Hi,
 
 I have tried that as well, using -files <jar file>. But it still gives the
 exact same error. Is there anything else I could try?
 
 Thanks,
 Krishna.
 
 On Jan 11, 2011, at 10:23 AM, Ted Yu wrote:
 
 Refer to Alex Kozlov's answer on 12/11/10
 
 On Tue, Jan 11, 2011 at 10:10 AM, C.V.Krishnakumar Iyer
 f2004...@gmail.com wrote:
 
 Hi,
 
 Could anyone please guide me as to how to use the -libjars option in
 HDFS?
 
 I have added the necessary jar file (the hbase jar - to be precise)  to
 the
 classpath of the node where I am starting the job.
 
 The following is the format that I am invoking:
 bin/hadoop jar <Our Jar> <MainClass> -libjars <Dependent jars (separated by
 commas)> <Arguments to our main class>
 
 bin/hadoop jar /Users/hdp/cvk/myjob.jar  mr2.mr2a.MR2ADriver -libjars
 /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a  outputmr2a
 
 Despite this,  I find that I get the java.lang.ClassNotFoundException
 error! :(
 java.lang.RuntimeException: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
  at
 
 org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
  at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.&lt;init&gt;(MapTask.java:793)
  at
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:524)
  at
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
 
 The strange thing is that there is another MR job I have  that runs
 perfectly with the libjars option! Could anybody tell me what I am
 doing
 wrong? One more thing - not sure if it is relevant : I am using the new
 Hadoop MapReduce API.
 
 Thanks in advance!
 
 Regards,
 Krishnakumar.
 
 
 



Re: libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
Hi,

Thanks a lot, Alex! Using GenericOptionsParser solved the issue. Previously I 
had used Tool and had assumed that it would take care of this.

Regards,
Krishna.
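
For reference, a minimal sketch of a driver wired up so that -libjars works
(the class name and job wiring are illustrative, not the actual MR2ADriver).
Implementing Tool alone is not enough: the driver has to be launched through
ToolRunner.run(), which applies GenericOptionsParser to consume -libjars and
ship the listed jars with the job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LibJarsDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // By the time run() is called, the generic options (-libjars, -files,
        // -D...) have been parsed out of args and folded into getConf(), so
        // args[0] and args[1] are just the input and output paths.
        Job job = new Job(getConf(), "libjars-example");
        job.setJarByClass(LibJarsDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // ... set mapper, reducer, and key/value classes as usual ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser before delegating to run().
        System.exit(ToolRunner.run(new Configuration(), new LibJarsDriver(), args));
    }
}

With this wiring, a launch like "bin/hadoop jar myjob.jar LibJarsDriver
-libjars /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a outputmr2a" should
show the jar under tmpjars in job.xml, which answers Alex's check earlier in
the thread.
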
On Jan 11, 2011, at 12:48 PM, Alex Kozlov wrote:

 There is also a blog that I recently wrote, if it helps
 http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job
 
 On Tue, Jan 11, 2011 at 12:33 PM, Alex Kozlov ale...@cloudera.com wrote:
 
 Have you implemented GenericOptionsParser?  Do you see your jar in the *
 mapred.cache.files* or *tmpjars* parameter in your job.xml file (can view
 via a JT Web UI)?
 
 --
 Alex Kozlov
 Solutions Architect
 Cloudera, Inc
 twitter: alexvk2009
 http://www.cloudera.com/company/press-center/hadoop-world-nyc/
 
 
 On Tue, Jan 11, 2011 at 11:49 AM, C.V.Krishnakumar Iyer 
 f2004...@gmail.com wrote:
 
 Hi,
 
 I have tried that as well, using -files <jar file>. But it still gives the
 exact same error. Is there anything else I could try?
 
 Thanks,
 Krishna.
 
 On Jan 11, 2011, at 10:23 AM, Ted Yu wrote:
 
 Refer to Alex Kozlov's answer on 12/11/10
 
 On Tue, Jan 11, 2011 at 10:10 AM, C.V.Krishnakumar Iyer
 f2004...@gmail.com wrote:
 
 Hi,
 
 Could anyone please guide me as to how to use the -libjars option in
 HDFS?
 
 I have added the necessary jar file (the hbase jar - to be precise)  to
 the
 classpath of the node where I am starting the job.
 
 The following is the format that I am invoking:
 bin/hadoop jar <Our Jar> <MainClass> -libjars <Dependent jars (separated by
 commas)> <Arguments to our main class>
 
 bin/hadoop jar /Users/hdp/cvk/myjob.jar  mr2.mr2a.MR2ADriver -libjars
 /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a  outputmr2a
 
 Despite this,  I find that I get the java.lang.ClassNotFoundException
 error! :(
 java.lang.RuntimeException: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
  at
 
 org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
  at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.&lt;init&gt;(MapTask.java:793)
  at
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.&lt;init&gt;(MapTask.java:524)
  at
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
  at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
 
 The strange thing is that there is another MR job I have  that runs
 perfectly with the libjars option! Could anybody tell me what I am
 doing
 wrong? One more thing - not sure if it is relevant : I am using the new
 Hadoop MapReduce API.
 
 Thanks in advance!
 
 Regards,
 Krishnakumar.
 
 
 



Re: No locks available

2011-01-11 Thread Allen Wittenauer

On Jan 11, 2011, at 2:39 AM, Adarsh Sharma wrote:

 Dear all,
 
 Yesterday I was working on a cluster of 6 Hadoop nodes (load data, perform 
 some jobs). But today when I started my cluster I came across a problem on 
 one of my datanodes.

Are you running this on NFS?

 
 2011-01-11 12:55:57,031 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 java.io.IOException: No locks available


Re: When applying a patch, which attachment should I use?

2011-01-11 Thread edward choi
I am not familiar with this whole svn and patch stuff, so please bear with
my questions.

I was going to apply
hdfs-630-0.20-append.patch
<https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch>
only
because I wanted to install HBase and the installation guide told me to.
The append branch you mentioned, does that include
hdfs-630-0.20-append.patch
<https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch>
as
well?
Is it like the latest patch with all the good stuff packed in one?

Regards,
Ed

2011/1/12 Ted Dunning tdunn...@maprtech.com

 You may also be interested in the append branch:

 http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/

 On Tue, Jan 11, 2011 at 3:12 AM, edward choi mp2...@gmail.com wrote:

  Thanks for the info.
  I am currently using Hadoop 0.20.2, so I guess I only need to apply
  hdfs-630-0.20-append.patch
  <https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch>.
  I wasn't familiar with the term trunk. I guess it means the latest
  development.
  Thanks again.
 
  Best Regards,
  Ed
 
  2011/1/11 Konstantin Boudnik c...@apache.org
 
   Yeah, that's pretty crazy all right. In your case it looks like the 3
   patches at the top are the latest for the 0.20-append branch, the 0.21
   branch, and trunk (which is perhaps the 0.22 branch at the moment). It
   doesn't look like you need to apply all of them - just try the latest
   for your particular branch.

   The mess is caused by the fact that people use different names for
   consecutive patches (as in file.1.patch, file.2.patch, etc.). This is
   _very_ confusing indeed, especially when different contributors work
   on the same fix/feature.
   --
 Take care,
   Konstantin (Cos) Boudnik
  
  
   On Mon, Jan 10, 2011 at 01:10, edward choi mp2...@gmail.com wrote:
Hi,
For the first time I am about to apply a patch to HDFS.
   
https://issues.apache.org/jira/browse/HDFS-630
   
Above is the one that I am trying to do.
But there are like 15 patches and I don't know which one to use.
   
Could anyone tell me if I need to apply them all or just the one at
 the
   top?
   
The whole patching process is just so confusing :-(
   
Ed
   
  
 



Re: No locks available

2011-01-11 Thread Adarsh Sharma

Allen Wittenauer wrote:

On Jan 11, 2011, at 2:39 AM, Adarsh Sharma wrote:

  

Dear all,

Yesterday I was working on a cluster of 6 Hadoop nodes (load data, perform 
some jobs). But today when I started my cluster I came across a problem on 
one of my datanodes.



Are you running this on NFS?
  

No Sir,

I am running this on 3 servers with local filesystems. Each server 
contains 2 hard disks (/hdd2-1, /hdd1-1), and on each server there run 
2 VMs: one occupies /hdd2-1 and the other /hdd1-1.


My Namenode contains all the predefined IPs of the VMs.


Thanks
  

2011-01-11 12:55:57,031 INFO org.apache.hadoop.hdfs.server.common.Storage: 
java.io.IOException: No locks available





Re: Application for testing

2011-01-11 Thread Konstantin Boudnik
(Moving general@ to Bcc: list)

Bo, you can try to run TeraSort from Hadoop examples: you'll see if the
cluster is up and running and can compare its performance between upgrades, if
needed.

Also, please don't use general@ for user questions: there's the common-user@ list
exactly for this purpose.

With regards,
  Cos

On Tue, Jan 11, 2011 at 07:50AM, Bo Sang wrote:
 Hi, guys:
 
 I have deployed a hadoop on our group's nodes. Could you recommend some
 typical applications for me? I want to test whether it can really work and
 observe its performance.
 
 -- 
 Best Regards!
 
 Sincerely
 Bo Sang




Re: Too-many fetch failure Reduce Error

2011-01-11 Thread Adarsh Sharma


Any update on this error?


Thanks



Adarsh Sharma wrote:

Esteban Gutierrez Moguel wrote:

Adarsh,

Do you have the hostnames for masters and slaves in /etc/hosts?
  


Yes, I know this issue. But do you think the error occurs while 
reading the output of the map?

I want to know the proper reason for the lines below:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201101071129_0001/attempt_201101071129_0001_m_12_0/output/file.out.index 





esteban.

On Fri, Jan 7, 2011 at 06:47, Adarsh Sharma 
adarsh.sha...@orkash.com wrote:


 

Dear all,

I am researching the error below and have not been able to find the
reason:

Data Size : 3.4 GB
Hadoop-0.20.0

had...@ws32-test-lin:~/project/hadoop-0.20.2$ bin/hadoop jar
hadoop-0.20.2-examples.jar wordcount /user/hadoop/page_content.txt
page_content_output.txt
11/01/07 16:11:14 INFO input.FileInputFormat: Total input paths to process : 1
11/01/07 16:11:15 INFO mapred.JobClient: Running job: 
job_201101071129_0001

11/01/07 16:11:16 INFO mapred.JobClient:  map 0% reduce 0%
11/01/07 16:11:41 INFO mapred.JobClient:  map 1% reduce 0%
11/01/07 16:11:45 INFO mapred.JobClient:  map 2% reduce 0%
11/01/07 16:11:48 INFO mapred.JobClient:  map 3% reduce 0%
11/01/07 16:11:52 INFO mapred.JobClient:  map 4% reduce 0%
11/01/07 16:11:56 INFO mapred.JobClient:  map 5% reduce 0%
11/01/07 16:12:00 INFO mapred.JobClient:  map 6% reduce 0%
11/01/07 16:12:05 INFO mapred.JobClient:  map 7% reduce 0%
11/01/07 16:12:08 INFO mapred.JobClient:  map 8% reduce 0%
11/01/07 16:12:11 INFO mapred.JobClient:  map 9% reduce 0%
11/01/07 16:12:14 INFO mapred.JobClient:  map 10% reduce 0%
11/01/07 16:12:17 INFO mapred.JobClient:  map 11% reduce 0%
11/01/07 16:12:21 INFO mapred.JobClient:  map 12% reduce 0%
11/01/07 16:12:24 INFO mapred.JobClient:  map 13% reduce 0%
11/01/07 16:12:27 INFO mapred.JobClient:  map 14% reduce 0%
11/01/07 16:12:30 INFO mapred.JobClient:  map 15% reduce 0%
11/01/07 16:12:33 INFO mapred.JobClient:  map 16% reduce 0%
11/01/07 16:12:36 INFO mapred.JobClient:  map 17% reduce 0%
11/01/07 16:12:40 INFO mapred.JobClient:  map 18% reduce 0%
11/01/07 16:12:45 INFO mapred.JobClient:  map 19% reduce 0%
11/01/07 16:12:48 INFO mapred.JobClient:  map 20% reduce 0%
11/01/07 16:12:54 INFO mapred.JobClient:  map 21% reduce 0%
11/01/07 16:13:00 INFO mapred.JobClient:  map 22% reduce 0%
11/01/07 16:13:04 INFO mapred.JobClient:  map 22% reduce 1%
11/01/07 16:13:13 INFO mapred.JobClient:  map 23% reduce 1%
11/01/07 16:13:19 INFO mapred.JobClient:  map 24% reduce 1%
11/01/07 16:13:25 INFO mapred.JobClient:  map 25% reduce 1%
11/01/07 16:13:30 INFO mapred.JobClient:  map 26% reduce 1%
11/01/07 16:13:34 INFO mapred.JobClient:  map 26% reduce 3%
11/01/07 16:13:36 INFO mapred.JobClient:  map 27% reduce 3%
11/01/07 16:13:37 INFO mapred.JobClient:  map 27% reduce 4%
11/01/07 16:13:39 INFO mapred.JobClient:  map 28% reduce 4%
11/01/07 16:13:43 INFO mapred.JobClient:  map 29% reduce 4%
11/01/07 16:13:46 INFO mapred.JobClient:  map 30% reduce 4%
11/01/07 16:13:49 INFO mapred.JobClient:  map 31% reduce 4%
11/01/07 16:13:52 INFO mapred.JobClient:  map 32% reduce 4%
11/01/07 16:13:55 INFO mapred.JobClient:  map 33% reduce 4%
11/01/07 16:13:58 INFO mapred.JobClient:  map 34% reduce 4%
11/01/07 16:14:02 INFO mapred.JobClient:  map 35% reduce 4%
11/01/07 16:14:05 INFO mapred.JobClient:  map 36% reduce 4%
11/01/07 16:14:08 INFO mapred.JobClient:  map 37% reduce 4%
11/01/07 16:14:11 INFO mapred.JobClient:  map 38% reduce 4%
11/01/07 16:14:15 INFO mapred.JobClient:  map 39% reduce 4%
11/01/07 16:14:19 INFO mapred.JobClient:  map 40% reduce 4%
11/01/07 16:14:20 INFO mapred.JobClient:  map 40% reduce 5%
11/01/07 16:14:25 INFO mapred.JobClient:  map 41% reduce 5%
11/01/07 16:14:32 INFO mapred.JobClient:  map 42% reduce 5%
11/01/07 16:14:38 INFO mapred.JobClient:  map 43% reduce 5%
11/01/07 16:14:41 INFO mapred.JobClient:  map 43% reduce 6%
11/01/07 16:14:43 INFO mapred.JobClient:  map 44% reduce 6%
11/01/07 16:14:47 INFO mapred.JobClient:  map 45% reduce 6%
11/01/07 16:14:50 INFO mapred.JobClient:  map 46% reduce 6%
11/01/07 16:14:54 INFO mapred.JobClient:  map 47% reduce 7%
11/01/07 16:14:59 INFO mapred.JobClient:  map 48% reduce 7%
11/01/07 16:15:02 INFO mapred.JobClient:  map 49% reduce 7%
11/01/07 16:15:05 INFO mapred.JobClient:  map 50% reduce 7%
11/01/07 16:15:11 INFO mapred.JobClient:  map 51% reduce 7%
11/01/07 16:15:14 INFO mapred.JobClient:  map 52% reduce 7%
11/01/07 16:15:16 INFO mapred.JobClient:  map 52% reduce 8%
11/01/07 16:15:20 INFO mapred.JobClient:  map 53% reduce 8%
11/01/07 16:15:25 INFO mapred.JobClient:  map 54% reduce 8%
11/01/07 16:15:29 INFO mapred.JobClient:  map 55% reduce 8%
11/01/07 16:15:31 INFO mapred.JobClient:  map 55% reduce 9%
11/01/07 16:15:33 INFO mapred.JobClient:  map 56% reduce 9%
11/01/07 16:15:38 INFO mapred.JobClient:  map 57% reduce 9%
11/01/07 16:15:42 INFO mapred.JobClient:  map 58% reduce 9%
11/01/07 16:15:43 INFO