[ 
https://issues.apache.org/jira/browse/FLINK-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jared Stehler updated FLINK-6662:
---------------------------------
    Description: 
Running Flink on Mesos from the release-1.3 branch, I'm seeing the following
error on AppMaster startup:

2017-05-22 15:32:45.946 [flink-akka.actor.default-dispatcher-17] WARN  o.a.flink.mesos.runtime.clusterframework.MesosJobManager  - Failed to recover job 088027410f1a628e7dfc59dc23df3ded.
java.lang.Exception: Failed to retrieve the submitted job graph from state handle.
        at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:186)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(JobManager.scala:536)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1$$anonfun$apply$mcV$sp$1.apply(JobManager.scala:533)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1$$anonfun$apply$mcV$sp$1.apply(JobManager.scala:533)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1.apply$mcV$sp(JobManager.scala:533)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1.apply(JobManager.scala:529)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1.apply(JobManager.scala:529)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: org.apache.flink.runtime.jobgraph.tasks.JobSnapshottingSettings
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
        at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:305)
        at org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:58)
        at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:184)
        ... 15 common frames omitted


  was:
Running Flink on Mesos from the release-1.3 branch, I'm seeing the following
error on AppMaster startup:

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: org.apache.flink.runtime.jobgraph.tasks.JobSnapshottingSettings
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
        at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:305)
        at org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:58)
        at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:184)
        ... 15 common frames omitted



> ClassNotFoundException: o.a.f.r.j.t.JobSnapshottingSettings recovering job
> --------------------------------------------------------------------------
>
>                 Key: FLINK-6662
>                 URL: https://issues.apache.org/jira/browse/FLINK-6662
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager, Mesos, State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Jared Stehler
>
> Running Flink on Mesos from the release-1.3 branch, I'm seeing the following
> error on AppMaster startup:
> 2017-05-22 15:32:45.946 [flink-akka.actor.default-dispatcher-17] WARN  o.a.flink.mesos.runtime.clusterframework.MesosJobManager  - Failed to recover job 088027410f1a628e7dfc59dc23df3ded.
> java.lang.Exception: Failed to retrieve the submitted job graph from state handle.
>         at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:186)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(JobManager.scala:536)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1$$anonfun$apply$mcV$sp$1.apply(JobManager.scala:533)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1$$anonfun$apply$mcV$sp$1.apply(JobManager.scala:533)
>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1.apply$mcV$sp(JobManager.scala:533)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1.apply(JobManager.scala:529)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$1.apply(JobManager.scala:529)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>         at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.ClassNotFoundException: org.apache.flink.runtime.jobgraph.tasks.JobSnapshottingSettings
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:348)
>         at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
>         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
>         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
>         at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:305)
>         at org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:58)
>         at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:184)
>         ... 15 common frames omitted
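
For anyone reading the trace above: the failure surfaces in the resolveClass frame, where Java deserialization asks a class loader for a class named in the stored bytes and the loader cannot find it. The sketch below reproduces that failure mode in isolation. It is not Flink code. MissingClassDemo, Settings, LoaderObjectInputStream and hiding() are hypothetical names made up for illustration, and the idea that the stored job graph was written by a build whose classpath differs from the reader's is an assumption, not something the report confirms.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class MissingClassDemo {

    /** Hypothetical stand-in for a class that only the writer's classpath contains. */
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        final long interval;

        Settings(long interval) {
            this.interval = interval;
        }
    }

    /** Resolves class descriptors through an explicit class loader via Class.forName. */
    static class LoaderObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            return Class.forName(desc.getName(), false, loader);
        }
    }

    /** Simulates a reader classpath from which one class name is missing. */
    static ClassLoader hiding(String hiddenName) {
        return new ClassLoader(MissingClassDemo.class.getClassLoader()) {
            @Override
            protected Class<?> loadClass(String name, boolean resolve)
                    throws ClassNotFoundException {
                if (name.equals(hiddenName)) {
                    throw new ClassNotFoundException(name);
                }
                return super.loadClass(name, resolve);
            }
        };
    }

    /** Writes the object with standard Java serialization. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Reads the object back and reports either the recovered value or the failure. */
    static String recover(byte[] data, ClassLoader loader) throws IOException {
        try (ObjectInputStream in =
                new LoaderObjectInputStream(new ByteArrayInputStream(data), loader)) {
            Settings settings = (Settings) in.readObject();
            return "recovered interval=" + settings.interval;
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = serialize(new Settings(5000L));
        // Reader that knows the class: deserialization succeeds.
        System.out.println(recover(stored, MissingClassDemo.class.getClassLoader()));
        // Reader that lacks the class: resolveClass fails, as in the trace.
        System.out.println(recover(stored, hiding(Settings.class.getName())));
    }
}
```

The same bytes deserialize with one loader and fail with the other, so under this reading the stored data need not be corrupt; what matters is whether the classes named in it exist on the classpath of the process doing the recovery.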



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
