[ https://issues.apache.org/jira/browse/MESOS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485801#comment-13485801 ]

Qinghe Jin commented on MESOS-298:
----------------------------------

Hi Ben, I have compared FrameworkExecutor.java carefully, and it seems I
already have the latest version of that file.

It's not easy to reproduce this from trunk because I have made some changes to
both Mesos and FrameworkScheduler to try to support disk load balancing. These
changes are still experimental.

TASK_LOST is not the whole story. What's bothering me now are some new kinds of
errors, like the one below:



Task Id : attempt_201210291044_0002_m_000001_0, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
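
A minimal diagnostic sketch for the DiskErrorException above, assuming the task's local directories come from mapred.local.dir (the Hadoop 1.x property that LocalDirAllocator reads when it picks a place for map spill files). As far as I understand, LocalDirAllocator skips a directory that is missing or not writable, not only one that is full, so plenty of free space in df output does not rule it out. The class name LocalDirCheck and its command-line argument are hypothetical; run it on the node where the attempt failed, as the same user the Mesos slave launches tasks as, passing the configured directory list.

```java
import java.io.File;

/**
 * Hypothetical diagnostic: for each directory in a comma-separated list
 * (assumed to be the value of mapred.local.dir), report whether it exists,
 * whether this user can write to it, and how much usable space it has.
 */
public class LocalDirCheck {

    /** A directory is considered usable here if it exists, is a directory, and is writable. */
    static boolean isUsable(File dir) {
        // isDirectory() is false for a path that does not exist.
        return dir.isDirectory() && dir.canWrite();
    }

    public static void main(String[] args) {
        // Assumption: the directory list is passed as the first argument;
        // fall back to the JVM temp dir so the program still runs without one.
        String dirs = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        for (String path : dirs.split(",")) {
            File dir = new File(path.trim());
            // getUsableSpace() returns 0 for a path that does not exist.
            long freeMb = dir.getUsableSpace() / (1024L * 1024L);
            System.out.printf("%s exists=%b usable=%b free=%d MB%n",
                    dir.getPath(), dir.exists(), isUsable(dir), freeMb);
        }
    }
}
```

For example: java LocalDirCheck /tmp/hadoop/mapred/local,/data1/mapred/local (the paths are placeholders). A line with usable=false points at permissions or a missing directory rather than disk space.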



The error suggests a lack of disk space, but there is still a lot of space
left. Another problem is:



Task Id : attempt_201210291044_0002_m_000007_0, Status : FAILED
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:278)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:265)

java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:278)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:265)

12/10/29 10:47:04 WARN mapred.JobClient: Error reading task outputhttp://blade15:50060/tasklog?plaintext=true&attemptid=attempt_201210291044_0002_m_000007_0&filter=stdout
12/10/29 10:47:04 WARN mapred.JobClient: Error reading task outputhttp://blade15:50060/tasklog?plaintext=true&attemptid=attempt_201210291044_0002_m_000007_0&filter=stderr
12/10/29 10:47:04 INFO mapred.JobClient: Task Id : attempt_201210291044_0002_r_000000_0, Status : FAILED

A new week; it looks like I have a lot to do next.




                
> Executor fails to start
> -----------------------
>
>                 Key: MESOS-298
>                 URL: https://issues.apache.org/jira/browse/MESOS-298
>             Project: Mesos
>          Issue Type: Question
>          Components: framework, slave
>    Affects Versions: 0.9.0
>         Environment: open Suse 11.0
>            Reporter: Qinghe Jin
>
> When the master asks the Hadoop executor to start, the executor is forked
> successfully but fails quickly, which results in TASK_LOST. The output in
> **/executors/default/runs/id/stderr looks like this:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mesos/Executor
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> Caused by: java.lang.ClassNotFoundException: org.apache.mesos.Executor
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>         ... 12 more
> Could not find the main class: org.apache.hadoop.mapred.FrameworkExecutor. Program will exit.
> I know the reason is that the executor can't find org.apache.mesos.Executor.
> I have found the class in mesos-0.9.0.jar, and I am sure the JobTracker can
> find it (if it couldn't, the JobTracker would fail to start). But each time I
> run it, the executor fails quickly.
> I am not familiar with Java, so I have tried everything I could find on
> Google, but I still can't fix it. I have been struggling with this for almost
> a week. Can anyone help? I would appreciate it very much. Thanks in advance!
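
A minimal sketch for narrowing down the NoClassDefFoundError quoted above. The JobTracker starting successfully only shows that the JobTracker's JVM has mesos-0.9.0.jar on its classpath; the executor runs in a separate JVM launched by the Mesos slave, which may be given a different classpath. The class ClasspathCheck below is hypothetical: it prints the classpath of the JVM it runs in and reports whether a given class can be loaded from it.

```java
/**
 * Hypothetical diagnostic: prints this JVM's classpath and checks whether
 * a named class (default: org.apache.mesos.Executor) can be loaded from it.
 */
public class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean canLoad(String className) {
        try {
            // initialize=false: load and link the class without running static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the class itself is not on the classpath
        } catch (LinkageError e) {
            return false; // e.g. NoClassDefFoundError for a missing superinterface
        }
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "org.apache.mesos.Executor";
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println(className + (canLoad(className) ? " is" : " is NOT") + " loadable");
    }
}
```

To be useful it has to run with exactly the classpath the executor is launched with, for example java -cp <executor classpath> ClasspathCheck org.apache.hadoop.mapred.FrameworkExecutor. Checking FrameworkExecutor rather than Executor also catches the case above, where FrameworkExecutor is found but its superinterface org.apache.mesos.Executor is not.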

--
This message is automatically generated by JIRA.