[ 
https://issues.apache.org/jira/browse/HADOOP-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590183#comment-13590183
 ] 

Todd Lipcon commented on HADOOP-8029:
-------------------------------------

One thought: both in trunk and in branch-1, it seems like we should eventually 
disable fadvise - otherwise performance is still going to be terrible because 
it will spit out a WARN for every chunk read. Maybe something like: if we fail 
to fadvise, then we disable it for the next 60 seconds, so in the worst case we 
only log once a minute instead of potentially tens or hundreds of times per 
second?
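
A minimal sketch of the back-off idea above, not a patch against the actual 
Hadoop code. The class AdviceThrottle, its Advice interface, and the 
injectable clock are hypothetical names made up for illustration; the real 
call site would wrap whatever invokes posix_fadvise. After a failure the 
helper logs one warning and then skips the call, without logging, until the 
back-off window has passed.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper: disables an optional, failing call for a fixed back-off window. */
final class AdviceThrottle {
    /** The optional operation being guarded, e.g. a wrapper around an fadvise call. */
    interface Advice {
        void apply() throws Exception;
    }

    private final long backoffNanos;
    private final LongSupplier clock;        // nanosecond clock, injectable for tests
    private volatile boolean disabled = false;
    private volatile long disabledAt = 0L;   // clock reading at the last failure
    private volatile int warnings = 0;       // number of warnings emitted (for inspection)

    AdviceThrottle(long backoff, TimeUnit unit, LongSupplier nanoClock) {
        this.backoffNanos = unit.toNanos(backoff);
        this.clock = nanoClock;
    }

    AdviceThrottle(long backoff, TimeUnit unit) {
        this(backoff, unit, System::nanoTime);
    }

    /**
     * Runs the advice unless a recent failure put us in the back-off window.
     * Returns true only if the advice was attempted and succeeded.
     */
    boolean tryAdvise(Advice advice) {
        long now = clock.getAsLong();
        // Subtraction keeps the comparison valid even if nanoTime wraps around.
        if (disabled && now - disabledAt < backoffNanos) {
            return false;                    // still backing off: skip silently
        }
        try {
            advice.apply();
            disabled = false;                // working again: re-enable
            return true;
        } catch (Exception e) {
            disabledAt = now;                // written before the flag so readers see it
            disabled = true;
            warnings++;                      // not atomic; an approximate count is enough here
            System.err.println("WARN: fadvise failed, disabling it for "
                + TimeUnit.NANOSECONDS.toSeconds(backoffNanos) + "s: " + e);
            return false;
        }
    }

    int warningCount() {
        return warnings;
    }
}
```

With a 60 second window, a file system that always returns EINVAL costs one 
failed call and one WARN per minute rather than one per chunk read. The races 
between threads are benign for this purpose: at worst a few extra attempts 
happen around the moment of failure.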
                
> org.apache.hadoop.io.nativeio.NativeIO.posixFadviseIfPossible does not handle 
> EINVAL
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8029
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8029
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.20.205.0
>         Environment: Debian Wheezy 64-bit 
> uname -a = "Linux desktop 3.1.0-1-amd64 #1 SMP Tue Jan 10 05:01:58 UTC 2012 
> x86_64 GNU/Linux" 
> cat /etc/issue = "Debian GNU/Linux wheezy/sid \n \l" 
> /etc/apt/sources.list = " 
> deb http://ftp.us.debian.org/debian/ wheezy main contrib non-free 
> deb-src http://ftp.us.debian.org/debian/ wheezy main contrib non-free 
> deb http://security.debian.org/ wheezy/updates main contrib non-free 
> deb-src http://security.debian.org/ wheezy/updates main contrib non-free 
> deb http://archive.cloudera.com/debian squeeze-cdh3 contrib 
> deb-src http://archive.cloudera.com/debian squeeze-cdh3 contrib" 
> Hadoop-specific configuration (disabled permissions, pseudo-distributed mode, 
> replication set to 1, from my own blog post here: http://j.mp/tsVBR4)
>            Reporter: Tim Mattison
>         Attachments: HADOOP-8029.001.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When Hadoop's directories reside on tmpfs in Debian Wheezy (and possibly all 
> Linux 3.1 distros) in an installation that is using the native libraries, 
> fadvise returns EINVAL when trying to run a MapReduce job.  Since EINVAL 
> isn't handled, all MapReduce jobs report "Map output lost, rescheduling: 
> getMapOutput".
> A full stack trace for this issue looks like this:
> [exec] 12/02/03 09:50:58 INFO mapred.JobClient: Task Id : 
> attempt_201202030949_0001_m_000000_0, Status : FAILED
> [exec] Map output lost, rescheduling: 
> getMapOutput(attempt_201202030949_0001_m_000000_0,0) failed :
> [exec] EINVAL: Invalid argument
> [exec] at org.apache.hadoop.io.nativeio.NativeIO.posix_fadvise(Native Method)
> [exec] at 
> org.apache.hadoop.io.nativeio.NativeIO.posixFadviseIfPossible(NativeIO.java:177)
> [exec] at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:4026)
> [exec] at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> [exec] at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> [exec] at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> [exec] at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> [exec] at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:829)
> [exec] at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> [exec] at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> [exec] at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> Some logic will need to be implemented to handle EINVAL to properly support 
> all file systems.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
