Re: Child Error
java version "1.7.0_21"
OpenJDK Runtime Environment (IcedTea 2.3.9) (7u21-2.3.9-0ubuntu0.12.10.1)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

I don't get any OOME errors, and this error happens on random nodes, not a
particular one. Usually all the tasks running on a particular node fail and
that node gets blacklisted; however, the same node works just fine during
the next or previous jobs. Can it be a problem with the SSH keys? What else
can cause an IOException with a "failure to login" message? I've been
digging into this for two days but I'm almost clueless.

Thanks,
Jim

On Fri, May 24, 2013 at 10:32 PM, Jean-Marc Spaggiari
<jean-m...@spaggiari.org> wrote:
> Hi Jim,
>
> Which JVM are you using?
>
> I don't think you have any memory issue; otherwise you would have gotten
> an OOME...
>
> JM
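An aside on the "failure to login" trail: the "invalid null input: name" NPE in `UnixPrincipal` means the Unix login module was handed a null username, which usually points at the user lookup failing on that node rather than at SSH keys. A quick diagnostic sketch (my own, not from the thread; run it as the task user on a suspect node) to check whether the UID actually resolves to a name:

```python
import os
import pwd

# Hadoop's Unix login effectively needs the current UID to resolve to a
# user name (the equivalent of `whoami`). If the passwd/NSS lookup fails,
# the login module ends up with a null name and throws the NPE seen above.
uid = os.getuid()
try:
    name = pwd.getpwuid(uid).pw_name
except KeyError:
    name = None

if name is None:
    print("uid %d does not resolve to a user name -- login would fail" % uid)
else:
    print("uid %d resolves to %r" % (uid, name))
```

If this ever prints the failure branch on a node (for example because NIS/LDAP is intermittently unavailable), that would match the random, per-node pattern described above.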
Re: Child Error
Hi Jim,

Which JVM are you using?

I don't think you have any memory issue; otherwise you would have gotten
an OOME...

JM

2013/5/24 Jim Twensky
> Hi again. In addition to my previous post, I was able to get some error
> logs from the task tracker / data node this morning, and it looks like it
> might be a Jetty issue:
>
> 2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to
> retrieve stdout log for task: attempt_201305231647_0007_m_001096_0
> java.io.IOException: Owner 'jim' for path
> /var/tmp/jim/hadoop-logs/userlogs/job_201305231647_0007/attempt_201305231647_0007_m_001096_0/stdout
> did not match expected owner '10929'
>     at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:177)
>     at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:117)
>     at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:455)
>     at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
>     at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>
> I am wondering if I am hitting MAPREDUCE-2389
> <https://issues.apache.org/jira/browse/MAPREDUCE-2389>. If so, how do I
> downgrade my Jetty version? Should I just replace the Jetty jar file in
> the lib directory with an earlier version and restart my cluster?
>
> Thank you.
Re: Child Error
Hi again. In addition to my previous post, I was able to get some error
logs from the task tracker / data node this morning, and it looks like it
might be a Jetty issue:

2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to
retrieve stdout log for task: attempt_201305231647_0007_m_001096_0
java.io.IOException: Owner 'jim' for path
/var/tmp/jim/hadoop-logs/userlogs/job_201305231647_0007/attempt_201305231647_0007_m_001096_0/stdout
did not match expected owner '10929'
    at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:177)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:117)
    at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:455)
    at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
    at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)

I am wondering if I am hitting MAPREDUCE-2389
<https://issues.apache.org/jira/browse/MAPREDUCE-2389>. If so, how do I
downgrade my Jetty version? Should I just replace the Jetty jar file in
the lib directory with an earlier version and restart my cluster?

Thank you.

On Thu, May 23, 2013 at 7:14 PM, Jim Twensky wrote:
> Hello, I have a 20 node Hadoop cluster where each node has 8GB of memory
> and an 8-core processor. I sometimes get the following error on a random
> basis:
> [...]
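A side note on the "did not match expected owner" message above: the check compares the file's actual owner against an expected one, and an expected owner of '10929' (a numeric string rather than a username) looks like a UID that never got mapped to a name. A simplified sketch of that kind of check (my own approximation, not Hadoop's actual SecureIOUtils code):

```python
import os
import pwd
import tempfile

def check_owner(path, expected_owner):
    """Approximate a SecureIOUtils-style ownership check (simplified sketch)."""
    st = os.stat(path)
    try:
        actual = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for the UID: fall back to the numeric UID as a
        # string, which would yield an "expected owner '10929'"-style value.
        actual = str(st.st_uid)
    if actual != expected_owner:
        raise IOError("Owner %r for path %s did not match expected owner %r"
                      % (actual, path, expected_owner))

# A file we just created should pass when checked against our own user name.
with tempfile.NamedTemporaryFile() as f:
    me = pwd.getpwuid(os.getuid()).pw_name
    check_owner(f.name, me)
```

Under this reading, the mismatch is another symptom of the same user-lookup flakiness as the login failure, not a separate corruption of the log files.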
Child Error
Hello, I have a 20 node Hadoop cluster where each node has 8GB of memory
and an 8-core processor. I sometimes get the following error on a random
basis:

---

Exception in thread "main" java.io.IOException: Exception reading
file:/var/tmp/jim/hadoop-jim/mapred/local/taskTracker/jim/jobcache/job_201305231647_0005/jobToken
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
    at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
    at org.apache.hadoop.mapred.Child.main(Child.java:92)
Caused by: java.io.IOException: failure to login
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:501)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1519)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
    ... 2 more
Caused by: javax.security.auth.login.LoginException:
java.lang.NullPointerException: invalid null input: name
    at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:70)
    at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:132)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ..

---

This does not always happen, but I see a pattern: when the intermediate
data is larger, it tends to occur more frequently. In the web log, I can
see the following:

java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

From what I read online, a possible cause is not having enough memory for
all the JVMs. My mapred-site.xml is set up to allocate 1100MB for each
child, and the maximum numbers of map and reduce tasks are both set to 3,
so 6600MB for the child JVMs + (500MB * 2) for the data node and task
tracker (as I set HADOOP_HEAP to 500MB). I feel like memory is not the
cause, but I haven't been able to avoid the error so far.

In case it helps, here are the relevant sections of my mapred-site.xml:

---

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1100M -ea -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/soner</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>5</value>
</property>
<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>

---

My jobs still complete most of the time, though they occasionally fail,
and I'm really puzzled at this point. I'd appreciate any help or ideas.

Thanks
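For what it's worth, the memory math in the post above can be tallied. Note that -Xmx caps only the Java heap; each JVM also needs non-heap memory (permgen, thread stacks, native buffers), so the worst case is tighter than 6600MB + 1000MB suggests. A rough sketch (the per-JVM overhead figure is an assumption, not a measured value):

```python
# Rough per-node memory budget for the configuration described above.
heap_per_child_mb = 1100   # mapred.child.java.opts: -Xmx1100M
max_map_tasks = 3          # mapred.tasktracker.map.tasks.maximum
max_reduce_tasks = 3       # mapred.tasktracker.reduce.tasks.maximum
daemon_heap_mb = 500       # HADOOP_HEAP for the DataNode and TaskTracker
overhead_per_jvm_mb = 200  # assumed non-heap overhead per JVM (not measured)

child_jvms = max_map_tasks + max_reduce_tasks
heap_total_mb = heap_per_child_mb * child_jvms + daemon_heap_mb * 2
worst_case_mb = heap_total_mb + overhead_per_jvm_mb * (child_jvms + 2)

print("heap only: %d MB" % heap_total_mb)
print("with assumed overhead: %d MB" % worst_case_mb)
```

With all six children busy, the heaps alone come to 7600MB of an 8192MB node, and any realistic per-JVM overhead pushes the worst case past physical memory, which could make the OS (rather than the JVM) kill a child without any OOME appearing in the logs.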
Re: Child error
10x

On Wed, Mar 13, 2013 at 1:56 PM, Azuryy Yu wrote:
> Don't wait for the patch; it's a very simple fix. Just do it.
Re: Child error
Don't wait for the patch; it's a very simple fix. Just do it.

On Mar 13, 2013 5:04 PM, "Amit Sela" wrote:
> But the patch will work on 1.0.4, correct?
Re: Child error
Yes, you are right.

On Mar 13, 2013 5:04 PM, "Amit Sela" wrote:
> But the patch will work on 1.0.4, correct?
Re: Child error
But the patch will work on 1.0.4, correct?

On Wed, Mar 13, 2013 at 4:57 AM, George Datskos <george.dats...@jp.fujitsu.com> wrote:
> That JIRA says "fix version=1.0.4" but it is not correct. The real JIRA
> is MAPREDUCE-2374. The actual fix version for this bug is 1.1.2.
Re: Child error
Leo,

That JIRA says "fix version=1.0.4" but it is not correct. The real JIRA is MAPREDUCE-2374. The actual fix version for this bug is 1.1.2.

George

> or https://issues.apache.org/jira/browse/MAPREDUCE-4857
> which is fixed in 1.0.4
RE: Child error
Or https://issues.apache.org/jira/browse/MAPREDUCE-4857, which is fixed in 1.0.4.

From: Amit Sela [mailto:am...@infolinks.com]
Sent: Tuesday, March 12, 2013 5:08 AM
To: user@hadoop.apache.org
Subject: Re: Child error

> Hi Jean-Marc,
>
> I am running Hadoop 1.0.3, and I did see this issue you've mentioned but
> the exit status in the issue is 126 and sometimes I get 255. [...]
Re: Child error
Hi Jean-Marc,

I am running Hadoop 1.0.3, and I did see the issue you mentioned, but the exit status in the issue is 126 and sometimes I get 255. Any idea what these status codes mean?

Did you hit this issue and upgrade to 1.0.4? If so, how "smooth" is such an upgrade (it shouldn't differ from 1.0.3 that much, no?)

Thanks!

On Tue, Mar 12, 2013 at 1:40 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> Hi Amit,
>
> Which Hadoop version are you using?
>
> I have been told it's because of
> https://issues.apache.org/jira/browse/MAPREDUCE-2374
>
> JM
Re: Child error
Hi Amit,

Which Hadoop version are you using?

I have been told it's because of https://issues.apache.org/jira/browse/MAPREDUCE-2374

JM

2013/3/12 Amit Sela:
> Hi all,
>
> I have a weird failure occurring every now and then during a MapReduce job.
> [...]
Child error
Hi all,

I have a weird failure occurring every now and then during a MapReduce job. This is the error:

java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 255.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

And sometimes it's the same but with a status of 126.

Any ideas?

Thanks.
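The statuses asked about in this thread follow standard POSIX shell conventions: 126 means the child process was found but could not be executed (for example, a missing execute bit or a failed launch script), and 255 is how an exit code of -1 is reported, since exit statuses are truncated to 8 bits. A minimal sketch of both, assuming a POSIX /bin/sh is available:

```python
import os
import subprocess
import tempfile

# Status 126: the file exists but lacks the execute bit, so the shell
# finds it and then fails to run it (POSIX-specified exit status).
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("echo hello\n")
    script = f.name
os.chmod(script, 0o644)  # read/write only; no execute permission
not_executable = subprocess.run(
    ["sh", "-c", script], capture_output=True).returncode

# Status 255: exit codes are 8-bit, so a child exiting with -1 (or 255,
# as a crashing JVM often does) is reported as 255.
minus_one = subprocess.run(["sh", "-c", "exit 255"]).returncode

print(not_executable, minus_one)  # -> 126 255
os.unlink(script)
```

This is only the shell-level meaning; it does not by itself say *why* the task JVM failed to launch, which is what the JIRA issues in this thread address.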
Getting many Child Error : Could not reserve enough space for object heap
Dear all,

I am setting up and configuring a small Hadoop cluster of 11 nodes for teaching purposes. All machines are identical, with the following specs:

- 4-core Intel(R) Xeon(R) CPU E3-1270 (3.5 GHz)
- 16 GB of RAM
- Debian Squeeze

I use a version of Hadoop 0.20.2 packaged by Cloudera (hadoop-0.20.2-cdh3u5). The significant configuration options I changed are:

- mapred.tasktracker.map.tasks.maximum: 4
- mapred.tasktracker.reduce.tasks.maximum: 2
- mapred.child.java.opts: -Xmx1500m
- mapred.child.ulimit: 450
- io.sort.mb: 200
- io.sort.factor: 64
- io.file.buffer.size: 65536
- mapred.jobtracker.taskScheduler: org.apache.hadoop.mapred.FairScheduler
- mapred.reduce.tasks: 10
- mapred.reduce.parallel.copies: 10
- mapred.reduce.slowstart.completed.maps: 0.8

Most of these values were taken from the "Hadoop Operations" book.

My problem is the following: when running jobs on the cluster, I often get the following errors in my mappers:

java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)

Error occurred during initialization of VM
Could not reserve enough space for object heap

I had at first a ulimit of 300, then increased it to 450, with no change. I don't understand why I get these memory errors: as I understand it, each node should use at most 1 + 1 + 4*1.5 + 2*1.5 = 11 GB of RAM, leaving plenty of margin (the first 2 GB are for the TaskTracker and DataNode processes). Of course, no other software is running on these machines. The JobTracker and NameNode are on two separate machines, not part of these 11 workers.

Do any of you have advice on how I could prevent these errors from happening? All jobs run fine in the end; it's just that these failures slow things down a bit and leave me with the impression that I got something wrong.
Are there any issues with my configuration options, given the hardware specs of my machines? Thanks in advance for any help/pointer! Cheers, Vincent
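The arithmetic in the message above can be sanity-checked, with one caveat: -Xmx caps only the Java heap, while each child JVM additionally needs permgen, thread stacks, and native buffers, so the real footprint per task exceeds 1.5 GB. A rough back-of-the-envelope sketch; the ~1 GB figures for the two daemons are assumptions, not values from the poster's config:

```python
# Worst-case committed heap on one 16 GB worker node, using the slot
# counts and -Xmx from the configuration quoted above.
map_slots = 4         # mapred.tasktracker.map.tasks.maximum
reduce_slots = 2      # mapred.tasktracker.reduce.tasks.maximum
child_heap_mb = 1500  # mapred.child.java.opts = -Xmx1500m
daemon_mb = 2 * 1000  # TaskTracker + DataNode (assumed ~1 GB each)

heap_mb = daemon_mb + (map_slots + reduce_slots) * child_heap_mb
print(heap_mb)  # -> 11000, i.e. ~11 GB of heap caps alone
```

Since 11 GB counts only heap caps, adding per-JVM native overhead (often several hundred MB each across six children plus two daemons) narrows the apparent 5 GB margin considerably, which is one hypothesis for the "Could not reserve enough space for object heap" failures.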
RE: Map Reduce "Child Error" task failure
Hi Matt,

You are most probably seeing this: https://issues.apache.org/jira/browse/MAPREDUCE-2374

There is a single-line fix for this issue. See the latest patch attached to the above JIRA entry.

-Shrinivas

-----Original Message-----
From: Matt Kennedy [mailto:stinkym...@gmail.com]
Sent: Tuesday, August 21, 2012 2:15 PM
To: user@hadoop.apache.org
Subject: Map Reduce "Child Error" task failure

> I'm encountering a sporadic error while running MapReduce jobs, it shows
> up in the console output as follows: [...]
Map Reduce "Child Error" task failure
I'm encountering a sporadic error while running MapReduce jobs; it shows up in the console output as follows:

12/08/21 14:56:05 INFO mapred.JobClient: Task Id : attempt_201208211430_0001_m_003538_0, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/08/21 14:56:05 WARN mapred.JobClient: Error reading task output http://:50060/tasklog?plaintext=true&attemptid=attempt_201208211430_0001_m_003538_0&filter=stdout
12/08/21 14:56:05 WARN mapred.JobClient: Error reading task output http://:50060/tasklog?plaintext=true&attemptid=attempt_201208211430_0001_m_003538_0&filter=stderr

The conditions look exactly like those described in https://issues.apache.org/jira/browse/MAPREDUCE-4003. Unfortunately, that issue is marked as closed for Apache Hadoop version 1.0.3, but that's the version I'm running into this issue with.

There does seem to be a correlation between the frequency of these errors and the number of concurrent map tasks being executed; however, the hardware resources on the cluster do not appear to be near their limits. I'm assuming there is a knob somewhere that is maladjusted and causing this error, but I haven't found it.

I did find this discussion (https://groups.google.com/a/cloudera.org/d/topic/cdh-user/NlhvHapf3pk/discussion) on the CDH users list describing the exact same problem, and the advice was to increase the value of the mapred.child.ulimit setting. However, I had this value initially unset, which should mean the value is unlimited if my research is correct. Then I set the value to 3 GB (3x my setting for mapred.map.child.java.opts) and it still did not resolve the problem. Finally, out of frustration, I just added a zero at the end; the value is now 31457280 (the unit for the setting is KB), which is 30 GB. I'm still having the problem.

Is anybody else seeing this issue, or does anyone have an idea for a workaround? Right now my workaround is to set the allowed failures very high before a TaskTracker is blacklisted, but this has the unintended side effect of taking a very long time to evict legitimately messed-up TaskTrackers. If this error is indicative of some other configuration problem, I'd like to try to resolve it.

Ideas? Or should I re-open the JIRA?

Thank you for your time,
Matt
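One detail that recurs in this thread: mapred.child.ulimit is expressed in kilobytes and bounds the child's virtual address space, which a JVM inflates well beyond -Xmx (code cache, thread stacks, mmap'd jars). A hypothetical sizing helper; the 2.5x headroom factor is a guess for illustration, not a documented Hadoop recommendation:

```python
def child_ulimit_kb(xmx_mb, headroom=2.5):
    """Suggest a mapred.child.ulimit value (in KB) for a given -Xmx (in MB).

    The headroom factor is an assumption: JVM virtual size typically
    exceeds the heap cap by a wide margin, and a ulimit set too close
    to -Xmx makes the child JVM fail during startup.
    """
    return int(xmx_mb * 1024 * headroom)

print(child_ulimit_kb(1500))  # -> 3840000 KB (~3.7 GB) for -Xmx1500m
```

This kind of sizing only matters when the ulimit is the binding constraint; as the messages above note, leaving the property unset (unlimited) rules it out as the cause, pointing back to the task-launch bug in MAPREDUCE-2374.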