Re: How to make HOD apply more than one core on each machine?

2010-04-16 Thread Hemanth Yamijala
Song,

   I know that is the way to set the capacity of each node; however, I want
 to know how we can tell the Torque manager that we will run more than one
 mapred task on each machine. If we don't do this, Torque will assign the
 other cores on this machine to other tasks, which may cause competition
 for cores.

   Do you know how to solve this?


If I understand correctly, you want that when a physical node is
allocated via HOD through the Torque resource manager, that node should
not be shared by other jobs. Is that correct?

Looking on the web, I found that schedulers like Maui / Moab that are
typically used with Torque allow for this. In particular, I thought
this link: 
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2009-May/039949.html
may be particularly useful. It talks about a NODEACCESSPOLICY
configuration in Maui that is described here:
http://www.clusterresources.com/products/maui/docs/5.3nodeaccess.shtml.
Setting this policy to SINGLEJOB seems to solve your problem.
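
For reference, the Maui documentation linked above describes this as a single
parameter in maui.cfg, something like the following line (a sketch only -- the
exact syntax may differ between Maui/Moab versions, so please check the docs
for your installation):

    NODEACCESSPOLICY  SINGLEJOB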

Can you check if this meets your requirement?


Re: Distributed Cache with New API

2010-04-16 Thread Larry Compton
Thanks. That clears it up.

Larry

On Fri, Apr 16, 2010 at 1:05 AM, Amareshwari Sri Ramadasu 
amar...@yahoo-inc.com wrote:

 Hi,
 @Ted, the code below is internal code. Users are not expected to call
 DistributedCache.getLocalCache(), and they cannot really use it anyway,
 since they do not know all the parameters.
 @Larry, DistributedCache has not been changed to use the new API in branch
 0.20. That change is only present from branch 0.21 onwards. See MAPREDUCE-898 (
 https://issues.apache.org/jira/browse/MAPREDUCE-898).
 If you are using branch 0.20, you are encouraged to use the deprecated
 JobConf itself.
 You can try the following change in your code. Change the line

   DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);

 to

   DistributedCache.addCacheFile(new Path(args[0]).toUri(), job.getConfiguration());
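
 For context, a minimal sketch of a driver wired that way (the class name and
 job name below are illustrative, not Larry's actual code):

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.conf.Configured;
   import org.apache.hadoop.filecache.DistributedCache;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.mapreduce.Job;
   import org.apache.hadoop.util.Tool;
   import org.apache.hadoop.util.ToolRunner;

   public class CacheDemo extends Configured implements Tool {
     public int run(String[] args) throws Exception {
       Job job = new Job(getConf(), "cache-demo");
       job.setJarByClass(CacheDemo.class);
       // Register the cache file on the Job's own configuration. The Job
       // constructor copies the Configuration it is given, so changes made
       // afterwards to the original conf are not seen by the submitted job.
       DistributedCache.addCacheFile(new Path(args[0]).toUri(),
           job.getConfiguration());
       // ... set mapper class, input/output formats and paths here ...
       return job.waitForCompletion(true) ? 0 : 1;
     }

     public static void main(String[] args) throws Exception {
       System.exit(ToolRunner.run(new Configuration(), new CacheDemo(), args));
     }
   }

 With that change, the setup() method shown later in this thread should see a
 non-null result from DistributedCache.getLocalCacheFiles(context.getConfiguration()).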

 Thanks
 Amareshwari

 On 4/16/10 2:27 AM, Ted Yu yuzhih...@gmail.com wrote:

 Please take a look at the loop starting at line 158 in TaskRunner.java:
       p[i] = DistributedCache.getLocalCache(files[i], conf,
                                             new Path(baseDir),
                                             fileStatus,
                                             false,
                                             Long.parseLong(fileTimestamps[i]),
                                             new Path(workDir.getAbsolutePath()),
                                             false);
     }
     DistributedCache.setLocalFiles(conf, stringifyPathArray(p));

 I think the confusing part is that DistributedCache.getLocalCacheFiles() is
 paired with DistributedCache.setLocalFiles()

 Cheers

 On Thu, Apr 15, 2010 at 1:16 PM, Larry Compton
  lawrence.comp...@gmail.com wrote:

  Ted,
 
  Thanks. I have looked at that example. The javadocs for DistributedCache
  still refer to deprecated classes, like JobConf. I'm trying to use the
  revised API.
 
  Larry
 
  On Thu, Apr 15, 2010 at 4:07 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   Please see the sample within
   src\core\org\apache\hadoop\filecache\DistributedCache.java:
  
     JobConf job = new JobConf();
     DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
                                   job);
  
   On Thu, Apr 15, 2010 at 12:56 PM, Larry Compton
    lawrence.comp...@gmail.com wrote:
  
 I'm trying to use the distributed cache in a MapReduce job written to the
 new API (org.apache.hadoop.mapreduce.*). In my Tool class, a file path is
 added to the distributed cache as follows:

   public int run(String[] args) throws Exception {
       Configuration conf = getConf();
       Job job = new Job(conf, "Job");
       ...
       DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);
       ...
       return job.waitForCompletion(true) ? 0 : 1;
   }
   
 The setup() method in my mapper tries to read the path as follows:

   protected void setup(Context context) throws IOException {
       Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
   }

 But paths is null.
   
 I'm assuming I'm setting up the distributed cache incorrectly. I've seen a
 few hints in previous mailing list postings that indicate that the
 distributed cache is accessed via the Job and JobContext objects in the
 revised API, but the javadocs don't seem to support that.
   
Thanks.
Larry
   
  
 




o.a.h.mapreduce API and SequenceFile encoding format

2010-04-16 Thread Bo Shi
Hey Folks,

No luck on IRC; trying here:

I was playing around with 0.20.x and SequenceFileOutputFormat.  The
documentation doesn't specify any particular file encoding, but I had
just assumed that it was some sort of raw binary format.  I see, after
inspecting the output, that it was a false assumption... the file
encoding appears to be ASCII hex pairs with space delimiters.  Is this
accurate?  After trolling the javadocs some more, I found
SequenceFileAsBinaryOutputFormat, but that class doesn't appear to have
an analog in the new o.a.h.mapreduce API.  Is it just not ported, or am
I missing some other method of specifying the file encoding in the new
mapreduce APIs?


Thanks,
Bo
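
For reference, a minimal sketch of pointing a job at SequenceFileOutputFormat
under the new o.a.h.mapreduce API (the class name and the key/value types
below are illustrative, not Bo's actual job):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

  public class SeqFileOutputDemo {
    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "seqfile-output-demo");
      job.setJarByClass(SeqFileOutputDemo.class);
      // With the default (identity) Mapper/Reducer and TextInputFormat, this
      // copies <byte offset, line> pairs into SequenceFiles. The records are
      // stored as serialized Writables inside the SequenceFile container,
      // not as delimited text.
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      job.setOutputFormatClass(SequenceFileOutputFormat.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }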


Jetty returning 404s for everything

2010-04-16 Thread Robert Crocombe
I have a cluster running Cloudera's 0.20.1+152-1 version of Hadoop.  All was 
well, but there was an unfortunate power outage that affected just the 
namenode.  Everything seemed largely normal upon resumption (I did have to 
recreate the local version of hadoop.tmp.dir to get the namenode to start), 
but now I find that none of the status webpages is working: Jetty is 
returning 404s for everything.  The actual JobTracker appears fine: I am 
able to submit jobs and get results.  Here's what I see:


$ telnet localhost 50030
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /jobtracker.jsp HTTP/1.1
Host: localhost

HTTP/1.1 404 /jobtracker.jsp
Content-Type: text/html; charset=iso-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 1412
Server: Jetty(6.1.14)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 /jobtracker.jsp</title>
</head>
<body><h2>HTTP ERROR: 404</h2><pre>/jobtracker.jsp</pre>
<p>RequestURI=/jobtracker.jsp</p><p><i><small><a
href="http://jetty.mortbay.org/">Powered by
Jetty://</a></small></i></p><br/>
[snip -- padding <br/> lines]
</body>
</html>
^]
telnet> quit

In contrast, another cluster running the slightly more up-to-date 
0.20.1+169.68.1 returns what you'd expect, e.g.


$ telnet localhost 50030
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /jobtracker.jsp HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: JSESSIONID=12c1udmu09jok;Path=/
Content-Length: 2851
Server: Jetty(6.1.14)

<html>
<head>
<title>hdp-nn-pri Hadoop Map/Reduce Administration</title>
<link rel="stylesheet" type="text/css" href="/static/hadoop.css">
<link rel="icon" type="image/vnd.microsoft.icon"
href="/static/images/favicon.ico" />
<script type="text/javascript" src="/static/jobtracker.js"></script>
</head>

<body>

<h1>hdp-nn-pri Hadoop Map/Reduce Administration</h1>
.
.
.
etc.

I assume this stuff is under the control of the webapp directory, and that
appears identical between the two clusters: I did a recursive diff.  Anyway,
I've looked at a bunch of things and don't see any problems, so I'm kind of
at wits' end currently.


Any suggestions would be most appreciated.

--
Robert Crocombe



Splitting input for mapper and contiguous data

2010-04-16 Thread Andrew Nguyen
As I may have mentioned, my main goal currently is the processing of 
physiologic data using hadoop and MR.  The steps are:

Convert ADC units to physical units (input is <sample num, raw value>, output
is <sample num, physical value>)
Perform a peak detection to detect the systolic blood pressure (input is
<sample num, physical value>, output is <sample num, physical value>, but the
output is only a subset of the input)
Calculate the central tendency measure using a sliding window (mapper input is
<sample num, physical value>, mapper output is <window ID, (sample num,
physical value)>, reducer output is <window ID, central tendency measurements
at different radii>)

Each of the above steps builds upon the result of the previous.  So, for the
first two steps, I have been doing everything in the mapper and specified 0
reduce tasks.  For the last step, I am performing calculations on a sliding
window of N points, skipping forward M points for the next window, where N > M.
To implement this, I have a mapper that outputs all of the <x, y> points (the
value) for a particular key (the window ID).  The reducer then performs the
calculations on each window's data.  Everything works pretty well except that I
noticed the splitting of the input across different mappers affects the final
output.  Due to the nature of the calculations, this doesn't affect the end
result very much.

However, I'm trying to make sure I understand everything properly, and I want 
to see if there is a better/proper way of implementing something like this.  
I'm guessing the problem comes from the fact that I'm trying to use contiguous 
data points to create a window of N points.  The window ID is just the first 
sample num encountered for the window.  As a result, the first sample num 
encountered will change for everything but the first map task, when compared to 
a serial execution.
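
One way to make the window assignment independent of how the input is split is
to derive the window ID from the sample number itself (e.g., windows starting
at multiples of M) rather than from the first sample a given map task happens
to see. A rough sketch under that assumption -- the values of N and M and the
tab-separated input layout are illustrative, not Andrew's actual code:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Emits each <sample num, value> line once per window that contains it,
  // keyed by the window's starting sample number, so the reducer sees a whole
  // window under one key regardless of split boundaries.
  public class WindowMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    private static final long N = 512;  // window length in samples (illustrative)
    private static final long M = 128;  // step between window starts, M < N

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      long sampleNum = Long.parseLong(line.toString().split("\t")[0].trim());
      // Windows start at multiples of M; a window starting at w contains this
      // sample when w <= sampleNum < w + N.
      long firstStart = (sampleNum < N) ? 0 : ((sampleNum - N) / M + 1) * M;
      for (long start = firstStart; start <= sampleNum; start += M) {
        context.write(new LongWritable(start), line);
      }
    }
  }

The reducer would then compute the central tendency measures for each window's
points, exactly as described above, without depending on which map task saw a
sample first.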

Thanks!

--Andrew

Extremely slow HDFS after upgrade

2010-04-16 Thread Scott Carey
I have two clusters upgraded to CDH2.   One is performing fine, and the other 
is EXTREMELY slow.

Some jobs that formerly took 90 seconds, take 20 to 50 minutes.

It is an HDFS issue from what I can tell.

The simple DFS benchmark with one map task shows the problem clearly.  I have 
looked at every difference I can find and am wondering where else to look to 
track this down.
The disks on all nodes in the cluster check out -- capable of 75MB/sec minimum 
with a 'dd' write test.
top / iostat do not show any significant CPU usage or iowait times on any 
machines in the cluster during the test.
ifconfig does not report any dropped packets or other errors on any machine in 
the cluster.  dmesg has nothing interesting.
The poorly performing cluster is on a slightly newer CentOS version:
Poor: 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 
x86_64 GNU/Linux  (CentOS 5.4, recent patches)
Good: 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 
GNU/Linux  (CentOS 5.3, I think)
The performance is always poor, not sporadically poor.  It is poor with M/R 
tasks as well as non-M/R HDFS clients (i.e. sqoop).

Poor performance cluster (no other jobs active during the test):
---
$ hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO -write 
-nrFiles 1 -fileSize 2000
10/04/16 12:53:13 INFO mapred.FileInputFormat: nrFiles = 1
10/04/16 12:53:13 INFO mapred.FileInputFormat: fileSize (MB) = 2000
10/04/16 12:53:13 INFO mapred.FileInputFormat: bufferSize = 100
10/04/16 12:53:14 INFO mapred.FileInputFormat: creating control file: 2000 mega 
bytes, 1 files
10/04/16 12:53:14 INFO mapred.FileInputFormat: created control files for: 1 
files
10/04/16 12:53:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
10/04/16 12:53:15 INFO mapred.FileInputFormat: Total input paths to process : 1
10/04/16 12:53:15 INFO mapred.JobClient: Running job: job_201004091928_0391
10/04/16 12:53:16 INFO mapred.JobClient:  map 0% reduce 0%
10/04/16 13:42:30 INFO mapred.JobClient:  map 100% reduce 0%
10/04/16 13:43:06 INFO mapred.JobClient:  map 100% reduce 100%
10/04/16 13:43:07 INFO mapred.JobClient: Job complete: job_201004091928_0391
[snip]
10/04/16 13:43:07 INFO mapred.FileInputFormat: - TestDFSIO - : write
10/04/16 13:43:07 INFO mapred.FileInputFormat:Date  time: Fri Apr 
16 13:43:07 PDT 2010
10/04/16 13:43:07 INFO mapred.FileInputFormat:Number of files: 1
10/04/16 13:43:07 INFO mapred.FileInputFormat: Total MBytes processed: 2000
10/04/16 13:43:07 INFO mapred.FileInputFormat:  Throughput mb/sec: 
0.678296742615553
10/04/16 13:43:07 INFO mapred.FileInputFormat: Average IO rate mb/sec: 
0.6782967448234558
10/04/16 13:43:07 INFO mapred.FileInputFormat:  IO rate std deviation: 
9.568803140552889E-5
10/04/16 13:43:07 INFO mapred.FileInputFormat: Test exec time sec: 2992.913


Good performance cluster (other jobs active during the test):
-
hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO -write 
-nrFiles 1 -fileSize 2000
10/04/16 12:50:52 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in 
the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
core-default.xml, mapred-default.xml and hdfs-default.xml respectively
TestFDSIO.0.0.4
10/04/16 12:50:52 INFO mapred.FileInputFormat: nrFiles = 1
10/04/16 12:50:52 INFO mapred.FileInputFormat: fileSize (MB) = 2000
10/04/16 12:50:52 INFO mapred.FileInputFormat: bufferSize = 100
10/04/16 12:50:52 INFO mapred.FileInputFormat: creating control file: 2000 mega 
bytes, 1 files
10/04/16 12:50:52 INFO mapred.FileInputFormat: created control files for: 1 
files
10/04/16 12:50:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
10/04/16 12:50:53 INFO mapred.FileInputFormat: Total input paths to process : 1
10/04/16 12:50:54 INFO mapred.JobClient: Running job: job_201003311607_4098
10/04/16 12:50:55 INFO mapred.JobClient:  map 0% reduce 0%
10/04/16 12:51:22 INFO mapred.JobClient:  map 100% reduce 0%
10/04/16 12:51:32 INFO mapred.JobClient:  map 100% reduce 100%
10/04/16 12:51:32 INFO mapred.JobClient: Job complete: job_201003311607_4098
[snip]
10/04/16 12:51:32 INFO mapred.FileInputFormat: - TestDFSIO - : write
10/04/16 12:51:32 INFO mapred.FileInputFormat:Date  time: Fri Apr 
16 12:51:32 PDT 2010
10/04/16 12:51:32 INFO mapred.FileInputFormat:Number of files: 1
10/04/16 12:51:32 INFO mapred.FileInputFormat: Total MBytes processed: 2000
10/04/16 12:51:32 INFO mapred.FileInputFormat:  Throughput mb/sec: 
92.47699634715865
10/04/16 12:51:32 INFO mapred.FileInputFormat: Average IO rate mb/sec: 
92.47699737548828

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Todd Lipcon
Hey Scott,

This is indeed really strange... if you do a straight hadoop fs -put with
dfs.replication set to 1 from one of the DNs, does it upload slow? That
would cut out the network from the equation.

-Todd

On Fri, Apr 16, 2010 at 5:29 PM, Scott Carey sc...@richrelevance.com wrote:

 I have two clusters upgraded to CDH2.   One is performing fine, and the
 other is EXTREMELY slow.

 Some jobs that formerly took 90 seconds, take 20 to 50 minutes.

 It is an HDFS issue from what I can tell.

 The simple DFS benchmark with one map task shows the problem clearly.  I
 have looked at every difference I can find and am wondering where else to
 look to track this down.
 The disks on all nodes in the cluster check out -- capable of 75MB/sec
 minimum with a 'dd' write test.
 top / iostat do not show any significant CPU usage or iowait times on any
 machines in the cluster during the test.
 ifconfig does not report any dropped packets or other errors on any machine
 in the cluster.  dmesg has nothing interesting.
 The poorly performing cluster is on a slightly newer CentOS version:
 Poor: 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64
 x86_64 GNU/Linux  (CentOS 5.4, recent patches)
 Good: 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64
 x86_64 GNU/Linux  (CentOS 5.3, I think)
 The performance is always poor, not sporadically poor.  It is poor with M/R
 tasks as well as non-M/R HDFS clients (i.e. sqoop).

 Poor performance cluster (no other jobs active during the test):
 ---
 $ hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO -write
 -nrFiles 1 -fileSize 2000
 10/04/16 12:53:13 INFO mapred.FileInputFormat: nrFiles = 1
 10/04/16 12:53:13 INFO mapred.FileInputFormat: fileSize (MB) = 2000
 10/04/16 12:53:13 INFO mapred.FileInputFormat: bufferSize = 100
 10/04/16 12:53:14 INFO mapred.FileInputFormat: creating control file: 2000
 mega bytes, 1 files
 10/04/16 12:53:14 INFO mapred.FileInputFormat: created control files for: 1
 files
 10/04/16 12:53:14 WARN mapred.JobClient: Use GenericOptionsParser for
 parsing the arguments. Applications should implement Tool for the same.
 10/04/16 12:53:15 INFO mapred.FileInputFormat: Total input paths to process
 : 1
 10/04/16 12:53:15 INFO mapred.JobClient: Running job: job_201004091928_0391
 10/04/16 12:53:16 INFO mapred.JobClient:  map 0% reduce 0%
 10/04/16 13:42:30 INFO mapred.JobClient:  map 100% reduce 0%
 10/04/16 13:43:06 INFO mapred.JobClient:  map 100% reduce 100%
 10/04/16 13:43:07 INFO mapred.JobClient: Job complete:
 job_201004091928_0391
 [snip]
 10/04/16 13:43:07 INFO mapred.FileInputFormat: - TestDFSIO - :
 write
 10/04/16 13:43:07 INFO mapred.FileInputFormat:Date  time: Fri
 Apr 16 13:43:07 PDT 2010
 10/04/16 13:43:07 INFO mapred.FileInputFormat:Number of files: 1
 10/04/16 13:43:07 INFO mapred.FileInputFormat: Total MBytes processed: 2000
 10/04/16 13:43:07 INFO mapred.FileInputFormat:  Throughput mb/sec:
 0.678296742615553
 10/04/16 13:43:07 INFO mapred.FileInputFormat: Average IO rate mb/sec:
 0.6782967448234558
 10/04/16 13:43:07 INFO mapred.FileInputFormat:  IO rate std deviation:
 9.568803140552889E-5
 10/04/16 13:43:07 INFO mapred.FileInputFormat: Test exec time sec:
 2992.913
 

 Good performance cluster (other jobs active during the test):
 -
 hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO -write
 -nrFiles 1 -fileSize 2000
 10/04/16 12:50:52 WARN conf.Configuration: DEPRECATED: hadoop-site.xml
 found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use
 core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
 core-default.xml, mapred-default.xml and hdfs-default.xml respectively
 TestFDSIO.0.0.4
 10/04/16 12:50:52 INFO mapred.FileInputFormat: nrFiles = 1
 10/04/16 12:50:52 INFO mapred.FileInputFormat: fileSize (MB) = 2000
 10/04/16 12:50:52 INFO mapred.FileInputFormat: bufferSize = 100
 10/04/16 12:50:52 INFO mapred.FileInputFormat: creating control file: 2000
 mega bytes, 1 files
 10/04/16 12:50:52 INFO mapred.FileInputFormat: created control files for: 1
 files
 10/04/16 12:50:52 WARN mapred.JobClient: Use GenericOptionsParser for
 parsing the arguments. Applications should implement Tool for the same.
 10/04/16 12:50:53 INFO mapred.FileInputFormat: Total input paths to process
 : 1
 10/04/16 12:50:54 INFO mapred.JobClient: Running job: job_201003311607_4098
 10/04/16 12:50:55 INFO mapred.JobClient:  map 0% reduce 0%
 10/04/16 12:51:22 INFO mapred.JobClient:  map 100% reduce 0%
 10/04/16 12:51:32 INFO mapred.JobClient:  map 100% reduce 100%
 10/04/16 12:51:32 INFO mapred.JobClient: Job complete:
 job_201003311607_4098
 [snip]
 10/04/16 12:51:32 INFO mapred.FileInputFormat: - TestDFSIO - :
 write
 10/04/16 12:51:32 INFO mapred.FileInputFormat:Date  time: Fri
 Apr 16 

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Scott Carey
Ok, so here is a ... fun result.

I have dfs.replication.min set to 2, so I can't just do
hadoop fs -Ddfs.replication=1 -put someFile someFile
since that will fail.

So here are two results that are fascinating:

$ time hadoop fs -Ddfs.replication=3 -put test.tar test.tar
real1m53.237s
user0m1.952s
sys 0m0.308s

$ time hadoop fs -Ddfs.replication=2 -put test.tar test.tar
real0m1.689s
user0m1.763s
sys 0m0.315s



The file is 77MB and so is two blocks.
The test with replication level 3 is slow about 9 out of 10 times.  When it is 
slow it sometimes is 28 seconds, sometimes 2 minutes.  It was fast one time...
The test with replication level 2 is fast in 40 out of 40 tests.

This is a development cluster with 8 nodes.

It looks like the replication level of 3 or more causes trouble.  Looking more
closely at the logs, it seems that certain datanodes (but not all) cause large
delays if they are in the middle of an HDFS write chain.  So, a write that goes
from A > B > C is fast if B is a good node and C a bad node.  If it's A > C > B
then it's slow.

So, I can say that some nodes, but not all, are doing something wrong when in
the middle of a write chain.  If I do a replication = 2 write on one of these
bad nodes, it's always slow.

So the good news is I can identify the bad nodes, and decommission them.  The
bad news is this still doesn't make a lot of sense, and 40% of the nodes have
the issue.  Worse, on a couple nodes the behavior in the replication = 2 case
is not consistent -- sometimes the first block is fast.  So it may be dependent
on not just the source, but the source-target combination in the chain.


At this point, I suspect something completely broken at the network level, 
perhaps even routing.  Why it would show up after an upgrade is yet to be 
determined, but the upgrade did include some config changes and OS updates.

Thanks Todd!

-Scott


On Apr 16, 2010, at 5:34 PM, Todd Lipcon wrote:

 Hey Scott,

 This is indeed really strange... if you do a straight hadoop fs -put with
 dfs.replication set to 1 from one of the DNs, does it upload slow? That
 would cut out the network from the equation.

 -Todd

 On Fri, Apr 16, 2010 at 5:29 PM, Scott Carey sc...@richrelevance.com wrote:

 I have two clusters upgraded to CDH2.   One is performing fine, and the
 other is EXTREMELY slow.

 Some jobs that formerly took 90 seconds, take 20 to 50 minutes.

 It is an HDFS issue from what I can tell.

 The simple DFS benchmark with one map task shows the problem clearly.  I
 have looked at every difference I can find and am wondering where else to
 look to track this down.
 The disks on all nodes in the cluster check out -- capable of 75MB/sec
 minimum with a 'dd' write test.
 top / iostat do not show any significant CPU usage or iowait times on any
 machines in the cluster during the test.
 ifconfig does not report any dropped packets or other errors on any machine
 in the cluster.  dmesg has nothing interesting.
 The poorly performing cluster is on a slightly newer CentOS version:
 Poor: 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64
 x86_64 GNU/Linux  (CentOS 5.4, recent patches)
 Good: 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64
 x86_64 GNU/Linux  (CentOS 5.3, I think)
 The performance is always poor, not sporadically poor.  It is poor with M/R
 tasks as well as non-M/R HDFS clients (i.e. sqoop).

 Poor performance cluster (no other jobs active during the test):
 ---
 $ hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO -write
 -nrFiles 1 -fileSize 2000
 10/04/16 12:53:13 INFO mapred.FileInputFormat: nrFiles = 1
 10/04/16 12:53:13 INFO mapred.FileInputFormat: fileSize (MB) = 2000
 10/04/16 12:53:13 INFO mapred.FileInputFormat: bufferSize = 100
 10/04/16 12:53:14 INFO mapred.FileInputFormat: creating control file: 2000
 mega bytes, 1 files
 10/04/16 12:53:14 INFO mapred.FileInputFormat: created control files for: 1
 files
 10/04/16 12:53:14 WARN mapred.JobClient: Use GenericOptionsParser for
 parsing the arguments. Applications should implement Tool for the same.
 10/04/16 12:53:15 INFO mapred.FileInputFormat: Total input paths to process
 : 1
 10/04/16 12:53:15 INFO mapred.JobClient: Running job: job_201004091928_0391
 10/04/16 12:53:16 INFO mapred.JobClient:  map 0% reduce 0%
 10/04/16 13:42:30 INFO mapred.JobClient:  map 100% reduce 0%
 10/04/16 13:43:06 INFO mapred.JobClient:  map 100% reduce 100%
 10/04/16 13:43:07 INFO mapred.JobClient: Job complete:
 job_201004091928_0391
 [snip]
 10/04/16 13:43:07 INFO mapred.FileInputFormat: - TestDFSIO - :
 write
 10/04/16 13:43:07 INFO mapred.FileInputFormat:Date  time: Fri
 Apr 16 13:43:07 PDT 2010
 10/04/16 13:43:07 INFO mapred.FileInputFormat:Number of files: 1
 10/04/16 13:43:07 INFO mapred.FileInputFormat: Total MBytes processed: 2000
 10/04/16 13:43:07 INFO mapred.FileInputFormat:  Throughput 

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Scott Carey
More info -- this is not a Hadoop issue.

The network performance issue can be replicated with SSH only on the links 
where Hadoop has a problem, and only in the direction with a problem.

HDFS is slow to transfer data in certain directions from certain machines.

So, for example, copying from node C to D may be slow, but not the other
direction, from D to C.  Likewise, although only 3 of 8 nodes have this problem,
it is not universal.  For example, node C might have trouble copying data to 5 
of the 7 other nodes, and node G might have trouble with all 7 other nodes.

No idea what it is yet, but SSH exhibits the same issue -- only in those 
specific point-to-point links in one specific direction.

-Scott

On Apr 16, 2010, at 7:10 PM, Scott Carey wrote:

 Ok, so here is a ... fun result.

 I have dfs.replication.min set to 2, so I can't just do
 hadoop fs -Ddfs.replication=1 -put someFile someFile
 since that will fail.

 So here are two results that are fascinating:

 $ time hadoop fs -Ddfs.replication=3 -put test.tar test.tar
 real1m53.237s
 user0m1.952s
 sys 0m0.308s

 $ time hadoop fs -Ddfs.replication=2 -put test.tar test.tar
 real0m1.689s
 user0m1.763s
 sys 0m0.315s



 The file is 77MB and so is two blocks.
 The test with replication level 3 is slow about 9 out of 10 times.  When it 
 is slow it sometimes is 28 seconds, sometimes 2 minutes.  It was fast one 
 time...
 The test with replication level 2 is fast in 40 out of 40 tests.

 This is a development cluster with 8 nodes.

 It looks like the replication level of 3 or more causes trouble.  Looking
 more closely at the logs, it seems that certain datanodes (but not all) cause
 large delays if they are in the middle of an HDFS write chain.  So, a write
 that goes from A > B > C is fast if B is a good node and C a bad node.  If
 it's A > C > B then it's slow.

 So, I can say that some nodes, but not all, are doing something wrong when in
 the middle of a write chain.  If I do a replication = 2 write on one of these
 bad nodes, it's always slow.

 So the good news is I can identify the bad nodes, and decommission them.  The
 bad news is this still doesn't make a lot of sense, and 40% of the nodes have
 the issue.  Worse, on a couple nodes the behavior in the replication = 2 case
 is not consistent -- sometimes the first block is fast.  So it may be
 dependent on not just the source, but the source-target combination in the
 chain.


 At this point, I suspect something completely broken at the network level, 
 perhaps even routing.  Why it would show up after an upgrade is yet to be 
 determined, but the upgrade did include some config changes and OS updates.

 Thanks Todd!

 -Scott


 On Apr 16, 2010, at 5:34 PM, Todd Lipcon wrote:

 Hey Scott,

 This is indeed really strange... if you do a straight hadoop fs -put with
 dfs.replication set to 1 from one of the DNs, does it upload slow? That
 would cut out the network from the equation.

 -Todd

 On Fri, Apr 16, 2010 at 5:29 PM, Scott Carey sc...@richrelevance.com wrote:

 I have two clusters upgraded to CDH2.   One is performing fine, and the
 other is EXTREMELY slow.

 Some jobs that formerly took 90 seconds, take 20 to 50 minutes.

 It is an HDFS issue from what I can tell.

 The simple DFS benchmark with one map task shows the problem clearly.  I
 have looked at every difference I can find and am wondering where else to
 look to track this down.
 The disks on all nodes in the cluster check out -- capable of 75MB/sec
 minimum with a 'dd' write test.
 top / iostat do not show any significant CPU usage or iowait times on any
 machines in the cluster during the test.
 ifconfig does not report any dropped packets or other errors on any machine
 in the cluster.  dmesg has nothing interesting.
 The poorly performing cluster is on a slightly newer CentOS version:
 Poor: 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64
 x86_64 GNU/Linux  (CentOS 5.4, recent patches)
 Good: 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64
 x86_64 GNU/Linux  (CentOS 5.3, I think)
 The performance is always poor, not sporadically poor.  It is poor with M/R
 tasks as well as non-M/R HDFS clients (i.e. sqoop).

 Poor performance cluster (no other jobs active during the test):
 ---
 $ hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO -write
 -nrFiles 1 -fileSize 2000
 10/04/16 12:53:13 INFO mapred.FileInputFormat: nrFiles = 1
 10/04/16 12:53:13 INFO mapred.FileInputFormat: fileSize (MB) = 2000
 10/04/16 12:53:13 INFO mapred.FileInputFormat: bufferSize = 100
 10/04/16 12:53:14 INFO mapred.FileInputFormat: creating control file: 2000
 mega bytes, 1 files
 10/04/16 12:53:14 INFO mapred.FileInputFormat: created control files for: 1
 files
 10/04/16 12:53:14 WARN mapred.JobClient: Use GenericOptionsParser for
 parsing the arguments. Applications should implement Tool for the same.
 10/04/16 12:53:15 

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Todd Lipcon
Checked link autonegotiation with ethtool? Sometimes gige will autoneg to
10mb half duplex if there's a bad cable, NIC, or switch port.

-Todd

On Fri, Apr 16, 2010 at 8:08 PM, Scott Carey sc...@richrelevance.com wrote:

 More info -- this is not a Hadoop issue.

 The network performance issue can be replicated with SSH only on the links
 where Hadoop has a problem, and only in the direction with a problem.

 HDFS is slow to transfer data in certain directions from certain machines.

 So, for example, copying from node C to D may be slow, but not the other
 direction, from D to C.  Likewise, although only 3 of 8 nodes have this
 problem, it is not universal.  For example, node C might have trouble
 copying data to 5 of the 7 other nodes, and node G might have trouble with
 all 7 other nodes.

 No idea what it is yet, but SSH exhibits the same issue -- only in those
 specific point-to-point links in one specific direction.

 -Scott

 On Apr 16, 2010, at 7:10 PM, Scott Carey wrote:

  Ok, so here is a ... fun result.
 
  I have dfs.replication.min set to 2, so I can't just do
  hadoop fs -Ddfs.replication=1 -put someFile someFile
  since that will fail.
 
  So here are two results that are fascinating:
 
  $ time hadoop fs -Ddfs.replication=3 -put test.tar test.tar
  real1m53.237s
  user0m1.952s
  sys 0m0.308s
 
  $ time hadoop fs -Ddfs.replication=2 -put test.tar test.tar
  real0m1.689s
  user0m1.763s
  sys 0m0.315s
 
 
 
  The file is 77MB and so is two blocks.
  The test with replication level 3 is slow about 9 out of 10 times.  When
 it is slow it sometimes is 28 seconds, sometimes 2 minutes.  It was fast one
 time...
  The test with replication level 2 is fast in 40 out of 40 tests.
 
  This is a development cluster with 8 nodes.
 
  It looks like the replication level of 3 or more causes trouble.  Looking
 more closely at the logs, it seems that certain datanodes (but not all)
 cause large delays if they are in the middle of an HDFS write chain.  So, a
 write that goes from A > B > C is fast if B is a good node and C a bad node.
  If it's A > C > B then it's slow.

  So, I can say that some nodes, but not all, are doing something wrong when
 in the middle of a write chain.  If I do a replication = 2 write on one of
 these bad nodes, it's always slow.

  So the good news is I can identify the bad nodes, and decommission them.
  The bad news is this still doesn't make a lot of sense, and 40% of the
 nodes have the issue.  Worse, on a couple nodes the behavior in the
 replication = 2 case is not consistent -- sometimes the first block is fast.
  So it may be dependent on not just the source, but the source-target
 combination in the chain.
 
 
  At this point, I suspect something completely broken at the network
 level, perhaps even routing.  Why it would show up after an upgrade is yet
 to be determined, but the upgrade did include some config changes and OS
 updates.
 
  Thanks Todd!
 
  -Scott
 
 
  On Apr 16, 2010, at 5:34 PM, Todd Lipcon wrote:
 
  Hey Scott,
 
  This is indeed really strange... if you do a straight hadoop fs -put
 with
  dfs.replication set to 1 from one of the DNs, does it upload slow? That
  would cut out the network from the equation.
 
  -Todd
 
  On Fri, Apr 16, 2010 at 5:29 PM, Scott Carey sc...@richrelevance.com
 wrote:
 
  I have two clusters upgraded to CDH2.   One is performing fine, and the
  other is EXTREMELY slow.
 
  Some jobs that formerly took 90 seconds, take 20 to 50 minutes.
 
  It is an HDFS issue from what I can tell.
 
  The simple DFS benchmark with one map task shows the problem clearly.
  I
  have looked at every difference I can find and am wondering where else
 to
  look to track this down.
  The disks on all nodes in the cluster check out -- capable of 75MB/sec
  minimum with a 'dd' write test.
  top / iostat do not show any significant CPU usage or iowait times on
 any
  machines in the cluster during the test.
  ifconfig does not report any dropped packets or other errors on any
 machine
  in the cluster.  dmesg has nothing interesting.
  The poorly performing cluster is on a slightly newer CentOS version:
  Poor: 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64
 x86_64
  x86_64 GNU/Linux  (CentOS 5.4, recent patches)
  Good: 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64
  x86_64 GNU/Linux  (CentOS 5.3, I think)
  The performance is always poor, not sporadically poor.  It is poor with
 M/R
  tasks as well as non-M/R HDFS clients (i.e. sqoop).
 
  Poor performance cluster (no other jobs active during the test):
  ---
  $ hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO
 -write
  -nrFiles 1 -fileSize 2000
  10/04/16 12:53:13 INFO mapred.FileInputFormat: nrFiles = 1
  10/04/16 12:53:13 INFO mapred.FileInputFormat: fileSize (MB) = 2000
  10/04/16 12:53:13 INFO mapred.FileInputFormat: bufferSize = 100
  10/04/16 12:53:14 INFO