ahhh, apologies for badmouthing hadoop...
So, I finally disocvered one problem that may have caused this kind of
degradation of performance. After growing the data set even larger to 35gb,
Hadoop crashed with disk full error. It would appear that the system will
actually continue to work when the disk is almost full, but there appear to
be some thing that causes it to slow down. Does hdfs juggle blocks around
when there isn't enough space on a slave machine? That would explain why it
was slowing down so much when the fs is almost full... Another part of this
is that I've updated by expectation of the speedup, accounting for the sort
that is happening, it is indeed faster.
I've upgraded to 0.18.2, and I now see the exception that is slowing down
the reducer near the end of the run(pasted below),any suggestions on this
one?
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2008-12-12 12:06:28,403 WARN /:
/mapOutput?job=job_200812121139_0001map=attempt_200812121139_0001_m_000114_0reduce=2:
java.lang.IllegalStateException: Committed
at
org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
at
org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
at
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2504)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
at
org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
at
org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
at org.mortbay.http.HttpServer.service(HttpServer.java:954)
at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
at
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2008-12-12 12:06:31,107 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200812121139_0001_r_07_0 0.29801327% reduce copy (135 of 151
at 1.42 MB/s)
On Thu, Dec 11, 2008 at 12:13 PM, hc busy hc.b...@gmail.com wrote:
And aside from refusing to declare task complete after everything is 100%,
I also notied that the mapper seems too slow. It's taking the same amount of
time for 4 machines to read and write through the 30gb file as if I did it
with a /bin/cat on one machine. Do you guys have any suggestions with
regards to these two problems?
On Wed, Dec 10, 2008 at 4:37 PM, hc busy hc.b...@gmail.com wrote:
Guys, I've just configured a hadoop cluster for the first time, and I'm
running a null map-reduction over the streaming interface. (/bin/cat for
both map and reducer). So I noticed that the mapper and reducer complete
100% in the web ui within a reasonable amount of time, but the job does not
complete. On command line it displays
...INFO streaming.StreamJob: map 100% reduce 100%
In the web ui, it shows map completion graph is 100%, but does not display
a reduce completion graph. The four machines are well equiped to handle the
size of data (30gb). Looking at the task tracker on each of the machines, I
noticed that it is ticking down the percents very very slowly:
2008-12-10 16:18:55,265 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_02_0 46.684883% Records R/W=149326846/149326834
reduce
2008-12-10 16:18:57,055 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_06_0 47.566963% Records R/W=151739348/151739342
reduce
2008-12-10 16:18:58,268 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_02_0 46.826576% Records R/W=149326846/149326834
reduce
2008-12-10 16:19:00,058 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_06_0 47.741756% Records R/W=153377016/153376990
reduce
2008-12-10 16:19:01,271 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_02_0 46.9636% Records R/W=149326846/149326834
reduce
2008-12-10 16:19:03,061 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_06_0 47.94259% Records R/W=153377016/153376990
reduce
2008-12-10 16:19:04,274 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_02_0 47.110992% Records R/W=150960648/150960644
reduce
so it would continue like this for hours and hours. What buffer am I
setting too small, or what could possiblly make it go so slow?? I've worked
on hadoop clusters