Re: Job fails to start with S3 savepoint

2017-03-21 Thread Stephan Ewen
Not sure if that is the issue here, but the code does a "fs.exists()" for
the path. We should not do this anywhere in all checkpointing related code,
because this fails frequently with s3 due to its consistency model.

This should eventually succeed though, so I think it is not the problem
here.



On Mon, Mar 20, 2017 at 7:04 PM, Bajaj, Abhinav 
wrote:

> Hi Ufuk,
>
>
>
> Realized I sent an incomplete mail. Continuing my previous reply here.
>
>
>
> Thanks for hint on the region. The bucket is in eu-west-1 region.
>
>
>
> The flink configuration for checkpoints and savepoints is as below –
>
> state.backend.fs.checkpointdir: s3://flink-bucket/flink-checkpoints
>
> state.savepoints.dir:   s3://flink-bucket/flink-savepoints
>
>
>
> No region specified in the s3 urls above. But the savepoint was created
> successfully.
>
>
>
> When using the monitoring REST API, the cancel-with-savepoint API returned
> the savepoint path – “s3://flink-bucket/flink-savepoints/savepoint-
> 0eba6ba712d2”.
>
> I used the same savepoint path while submitting a new job.
>
>
>
> I am wondering why there would be a difference in behavior between
> creating and reading the savepoint.
>
>
>
> I meantime, I will try updating the s3 urls to reflect the region and
> update here.
>
>
>
> Thanks,
>
> Abhinav
>
>
>
>
>
>
>
> *A**b**hinav Bajaj*
>
> Lead Engineer
>
> HERE Predictive Analytics
>
> Office:  +12062092767 <+1%20206-209-2767>
>
> Mobile: +17083299516 <+1%20708-329-9516>
>
> *HERE Seattle*
>
> 701 Pike Street, #2000, Seattle, WA 98101, USA
>
> *47° 36' 41" N. 122° 19' 57" W*
>
> *HERE Maps*
>
>
>
>
>
>
>
> *From: *"Bajaj, Abhinav" 
> *Date: *Monday, March 20, 2017 at 10:46 AM
> *To: *"user@flink.apache.org" , "u...@apache.org" <
> u...@apache.org>
>
> *Subject: *Re: Job fails to start with S3 savepoint
>
>
>
> Hi Ufuk,
>
>
>
> Thanks for replying.
>
> The savepoint path is correct and it exists.
>
> FYI, I used the monitoring REST APIs to cancel the job with savepoint.
>
>
>
>
>
> *A**b**hinav Bajaj*
>
> Lead Engineer
>
> HERE Predictive Analytics
>
> Office:  +12062092767 <+1%20206-209-2767>
>
> Mobile: +17083299516 <+1%20708-329-9516>
>
> *HERE Seattle*
>
> 701 Pike Street, #2000, Seattle, WA 98101, USA
>
> *47° 36' 41" N. 122° 19' 57" W*
>
> *HERE Maps*
>
>
>
>
>
>
>
> *From: *Ufuk Celebi 
> *Reply-To: *"user@flink.apache.org" 
> *Date: *Monday, March 20, 2017 at 2:41 AM
> *To: *"user@flink.apache.org" 
> *Subject: *Re: Job fails to start with S3 savepoint
>
>
>
> Hey Abhinav,
>
>
>
> the Exception is thrown if the S3 object does not exist.
>
>
>
> Can you double check that it actually does exist (no typos, etc.)?
>
>
>
> Could this be related to accessing a different region than expected?
>
>
>
> – Ufuk
>
>
>
>
>
> On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther  wrote:
>
> Hi Abhinav,
>
> can you check if you have configured your AWS setup correctly? The S3
> configuration might be missing.
>
> https://ci.apache.org/projects/flink/flink-docs-
> release-1.3/setup/aws.html#missing-s3-filesystem-configuration
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-release-1.3%2Fsetup%2Faws.html%23missing-s3-filesystem-configuration&data=01%7C01%7C%7Cd049ebd9730747304e0208d46f756af4%7C6d4034cd72254f72b85391feaea64919%7C1&sdata=QqbkwWAbOwb97ckBio4Ty7nL7XJC598mYVeN5nr6HXo%3D&reserved=0>
>
> Regards,
> Timo
>
>
> Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:
>
> Hi,
>
>
>
> I am trying to explore using S3 for storing checkpoints and savepoints.
>
> I can get Flink to store the checkpoints and savepoints in s3.
>
>
>
> However, when I try to submit the same Job using the stored savepoint, it
> fails with below exception.
>
> I am using Flink 1.2 and submitted the job from the UI dashboard.
>
>
>
> Can anyone guide me through this issue?
>
>
>
> Thanks,
>
> Abhinav
>
>
>
> *Jobmanager logs with exception* –
>
>
>
> 2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.
> BlobClient   - Blob client connecting to
> akka://flink/user/jobmanager
>
> 2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.
> client.JobClient  - Checking and uploading JAR files
>
>

Re: Job fails to start with S3 savepoint

2017-03-20 Thread Bajaj, Abhinav
Hi Ufuk,

Realized I sent an incomplete mail. Continuing my previous reply here.

Thanks for hint on the region. The bucket is in eu-west-1 region.

The flink configuration for checkpoints and savepoints is as below –
state.backend.fs.checkpointdir: s3://flink-bucket/flink-checkpoints
state.savepoints.dir:   s3://flink-bucket/flink-savepoints

No region specified in the s3 urls above. But the savepoint was created 
successfully.

When using the monitoring REST API, the cancel-with-savepoint API returned the 
savepoint path – “s3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2”.
I used the same savepoint path while submitting a new job.

I am wondering why there would be a difference in behavior between creating and 
reading the savepoint.

I meantime, I will try updating the s3 urls to reflect the region and update 
here.

Thanks,
Abhinav



[cid:image001.png@01D2A169.CB504880]

Abhinav Bajaj
Lead Engineer
HERE Predictive Analytics
Office:  +12062092767
Mobile: +17083299516

HERE Seattle
701 Pike Street, #2000, Seattle, WA 98101, USA
47° 36' 41" N. 122° 19' 57" W
HERE Maps




From: "Bajaj, Abhinav" 
Date: Monday, March 20, 2017 at 10:46 AM
To: "user@flink.apache.org" , "u...@apache.org" 

Subject: Re: Job fails to start with S3 savepoint

Hi Ufuk,

Thanks for replying.
The savepoint path is correct and it exists.
FYI, I used the monitoring REST APIs to cancel the job with savepoint.


[cid:image002.png@01D2A169.CB504880]

Abhinav Bajaj
Lead Engineer
HERE Predictive Analytics
Office:  +12062092767
Mobile: +17083299516

HERE Seattle
701 Pike Street, #2000, Seattle, WA 98101, USA
47° 36' 41" N. 122° 19' 57" W
HERE Maps




From: Ufuk Celebi 
Reply-To: "user@flink.apache.org" 
Date: Monday, March 20, 2017 at 2:41 AM
To: "user@flink.apache.org" 
Subject: Re: Job fails to start with S3 savepoint

Hey Abhinav,

the Exception is thrown if the S3 object does not exist.

Can you double check that it actually does exist (no typos, etc.)?

Could this be related to accessing a different region than expected?

– Ufuk


On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther 
mailto:twal...@apache.org>> wrote:
Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 
configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration<https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-release-1.3%2Fsetup%2Faws.html%23missing-s3-filesystem-configuration&data=01%7C01%7C%7Cd049ebd9730747304e0208d46f756af4%7C6d4034cd72254f72b85391feaea64919%7C1&sdata=QqbkwWAbOwb97ckBio4Ty7nL7XJC598mYVeN5nr6HXo%3D&reserved=0>

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:
Hi,

I am trying to explore using S3 for storing checkpoints and savepoints.
I can get Flink to store the checkpoints and savepoints in s3.

However, when I try to submit the same Job using the stored savepoint, it fails 
with below exception.
I am using Flink 1.2 and submitted the job from the UI dashboard.

Can anyone guide me through this issue?

Thanks,
Abhinav

Jobmanager logs with exception –

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient  
 - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient 
 - Checking and uploading JAR files
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient  
 - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager  
 - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter 
Example).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Using restart strategy NoRestartStrategy for 
4425245091bea9ad103dd3ff338244bb.
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Running initialization on master for job Session Counter Example 
(4425245091bea9ad103dd3ff338244bb).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Successfully ran initialization on master in 0 ms.
2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager  
 - Starting job from savepoint 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
2017-03-18 00:10:09,636 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Session 
Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED 
to FAILING.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
at 
org.

Re: Job fails to start with S3 savepoint

2017-03-20 Thread Bajaj, Abhinav
Hi Ufuk,

Thanks for replying.
The savepoint path is correct and it exists.
FYI, I used the monitoring REST APIs to cancel the job with savepoint.


[cid:image001.png@01D2A167.3E215140]

Abhinav Bajaj
Lead Engineer
HERE Predictive Analytics
Office:  +12062092767
Mobile: +17083299516

HERE Seattle
701 Pike Street, #2000, Seattle, WA 98101, USA
47° 36' 41" N. 122° 19' 57" W
HERE Maps




From: Ufuk Celebi 
Reply-To: "user@flink.apache.org" 
Date: Monday, March 20, 2017 at 2:41 AM
To: "user@flink.apache.org" 
Subject: Re: Job fails to start with S3 savepoint

Hey Abhinav,

the Exception is thrown if the S3 object does not exist.

Can you double check that it actually does exist (no typos, etc.)?

Could this be related to accessing a different region than expected?

– Ufuk


On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther 
mailto:twal...@apache.org>> wrote:
Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 
configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration<https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-release-1.3%2Fsetup%2Faws.html%23missing-s3-filesystem-configuration&data=01%7C01%7C%7Cd049ebd9730747304e0208d46f756af4%7C6d4034cd72254f72b85391feaea64919%7C1&sdata=QqbkwWAbOwb97ckBio4Ty7nL7XJC598mYVeN5nr6HXo%3D&reserved=0>

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:
Hi,

I am trying to explore using S3 for storing checkpoints and savepoints.
I can get Flink to store the checkpoints and savepoints in s3.

However, when I try to submit the same Job using the stored savepoint, it fails 
with below exception.
I am using Flink 1.2 and submitted the job from the UI dashboard.

Can anyone guide me through this issue?

Thanks,
Abhinav

Jobmanager logs with exception –

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient  
 - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient 
 - Checking and uploading JAR files
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient  
 - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager  
 - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter 
Example).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Using restart strategy NoRestartStrategy for 
4425245091bea9ad103dd3ff338244bb.
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Running initialization on master for job Session Counter Example 
(4425245091bea9ad103dd3ff338244bb).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Successfully ran initialization on master in 0 ms.
2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager  
 - Starting job from savepoint 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
2017-03-18 00:10:09,636 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Session 
Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED 
to FAILING.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
'

Re: Job fails to start with S3 savepoint

2017-03-20 Thread Ufuk Celebi
Hey Abhinav,

the Exception is thrown if the S3 object does not exist.

Can you double check that it actually does exist (no typos, etc.)?

Could this be related to accessing a different region than expected?

– Ufuk


On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther  wrote:

> Hi Abhinav,
>
> can you check if you have configured your AWS setup correctly? The S3
> configuration might be missing.
>
> https://ci.apache.org/projects/flink/flink-docs-
> release-1.3/setup/aws.html#missing-s3-filesystem-configuration
>
> Regards,
> Timo
>
>
> Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:
>
> Hi,
>
>
>
> I am trying to explore using S3 for storing checkpoints and savepoints.
>
> I can get Flink to store the checkpoints and savepoints in s3.
>
>
>
> However, when I try to submit the same Job using the stored savepoint, it
> fails with below exception.
>
> I am using Flink 1.2 and submitted the job from the UI dashboard.
>
>
>
> Can anyone guide me through this issue?
>
>
>
> Thanks,
>
> Abhinav
>
>
>
> *Jobmanager logs with exception* –
>
>
>
> 2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.
> BlobClient   - Blob client connecting to
> akka://flink/user/jobmanager
>
> 2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.
> client.JobClient  - Checking and uploading JAR files
>
> 2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.
> BlobClient   - Blob client connecting to
> akka://flink/user/jobmanager
>
> 2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.
> YarnJobManager   - Submitting job
> 4425245091bea9ad103dd3ff338244bb (Session Counter Example).
>
> 2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.
> YarnJobManager   - Using restart strategy
> NoRestartStrategy for 4425245091bea9ad103dd3ff338244bb.
>
> 2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.
> YarnJobManager   - Running initialization on
> master for job Session Counter Example (4425245091bea9ad103dd3ff338244bb).
>
> 2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.
> YarnJobManager   - Successfully ran
> initialization on master in 0 ms.
>
> 2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.
> YarnJobManager   - Starting job from savepoint
> 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
>
> 2017-03-18 00:10:09,636 INFO  org.apache.flink.runtime.
> executiongraph.ExecutionGraph - Job Session Counter Example (
> 4425245091bea9ad103dd3ff338244bb) switched from state CREATED to FAILING.
>
> org.apache.flink.runtime.execution.SuppressRestartsException:
> Unrecoverable failure. This suppresses job restarts. Please check the stack
> trace for the root cause.
>
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$org$apache$flink$runtime$jobmanager$JobManager$
> $submitJob$1.apply$mcV$sp(JobManager.scala:1369)
>
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$org$apache$flink$runtime$jobmanager$JobManager$
> $submitJob$1.apply(JobManager.scala:1330)
>
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$org$apache$flink$runtime$jobmanager$JobManager$
> $submitJob$1.apply(JobManager.scala:1330)
>
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.
> liftedTree1$1(Future.scala:24)
>
> at scala.concurrent.impl.Future$
> PromiseCompletingRunnable.run(Future.scala:24)
>
> at akka.dispatch.TaskInvocation.
> run(AbstractDispatcher.scala:40)
>
> at akka.dispatch.ForkJoinExecutorConfigurator$
> AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(
> ForkJoinTask.java:260)
>
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
> runTask(ForkJoinPool.java:1339)
>
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
> ForkJoinPool.java:1979)
>
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
> ForkJoinWorkerThread.java:107)
>
> Caused by: java.lang.IllegalArgumentException: Invalid path
> 's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
>
> at org.apache.flink.runtime.checkpoint.savepoint.
> SavepointStore.createFsInputStream(SavepointStore.java:182)
>
> at org.apache.flink.runtime.checkpoint.savepoint.
> SavepointStore.loadSavepoint(SavepointStore.java:131)
>
> at org.apache.flink.runtime.checkpoint.savepoint.
> SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
>
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$org$apache$flink$runtime$jobmanager$JobManager$
> $submitJob$1.apply$mcV$sp(JobManager.scala:1348)
>
> ... 10 more
>
> 2017-03-18 00:10:09,638 INFO  org.apache.flink.runtime.
> executiongraph.ExecutionGraph - Source:

Re: Job fails to start with S3 savepoint

2017-03-20 Thread Timo Walther

Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 
configuration might be missing.


https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:


Hi,

I am trying to explore using S3 for storing checkpoints and savepoints.

I can get Flink to store the checkpoints and savepoints in s3.

However, when I try to submit the same Job using the stored savepoint, 
it fails with below exception.


I am using Flink 1.2 and submitted the job from the UI dashboard.

Can anyone guide me through this issue?

Thanks,

Abhinav

_Jobmanager logs with exception_–

2017-03-18 00:10:09,193 INFO org.apache.flink.runtime.blob.BlobClient 
- Blob client connecting to akka://flink/user/jobmanager


2017-03-18 00:10:09,348 INFO org.apache.flink.runtime.client.JobClient 
- Checking and uploading JAR files


2017-03-18 00:10:09,348 INFO org.apache.flink.runtime.blob.BlobClient 
- Blob client connecting to akka://flink/user/jobmanager


2017-03-18 00:10:09,501 INFO org.apache.flink.yarn.YarnJobManager - 
Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter Example).


2017-03-18 00:10:09,502 INFO org.apache.flink.yarn.YarnJobManager - 
Using restart strategy NoRestartStrategy for 
4425245091bea9ad103dd3ff338244bb.


2017-03-18 00:10:09,502 INFO org.apache.flink.yarn.YarnJobManager - 
Running initialization on master for job Session Counter Example 
(4425245091bea9ad103dd3ff338244bb).


2017-03-18 00:10:09,502 INFO org.apache.flink.yarn.YarnJobManager - 
Successfully ran initialization on master in 0 ms.


2017-03-18 00:10:09,503 INFO org.apache.flink.yarn.YarnJobManager - 
Starting job from savepoint 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.


2017-03-18 00:10:09,636 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Session 
Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state 
CREATED to FAILING.


org.apache.flink.runtime.execution.SuppressRestartsException: 
Unrecoverable failure. This suppresses job restarts. Please check the 
stack trace for the root cause.


at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)


at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)


at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)


at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)


at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)


at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)

at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)


at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)


at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)


at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.


at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)


at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)


at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)


at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)


... 10 more

2017-03-18 00:10:09,638 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: 
Custom Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched 
from CREATED to CANCELED.


2017-03-18 00:10:09,639 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - 
TriggerWindow(TumblingProcessingTimeWindows(15000), 
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59}, 
ProcessingTimeTrigger(), 
WindowedStream.apply(WindowedStream.java:521)) -> Sink: Unnamed (1/1) 
(7d1917621cf923445ab904bb60c62bfd) switched from CREATED to CANCELED.


2017-03-18 00:10:09,639 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Try to 
restart or fail the job Session Counter Example 
(4425245091bea9ad103dd3ff338244bb) if no longer possible.


2017-03-18 00:10:09,639 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Session 
Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state 
FAILING to FAILED.


org.apache.flink.runtime.execution.SuppressResta

Job fails to start with S3 savepoint

2017-03-17 Thread Bajaj, Abhinav
Hi,

I am trying to explore using S3 for storing checkpoints and savepoints.
I can get Flink to store the checkpoints and savepoints in s3.

However, when I try to submit the same Job using the stored savepoint, it fails 
with below exception.
I am using Flink 1.2 and submitted the job from the UI dashboard.

Can anyone guide me through this issue?

Thanks,
Abhinav

Jobmanager logs with exception –

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient  
 - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient 
 - Checking and uploading JAR files
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient  
 - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager  
 - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter 
Example).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Using restart strategy NoRestartStrategy for 
4425245091bea9ad103dd3ff338244bb.
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Running initialization on master for job Session Counter Example 
(4425245091bea9ad103dd3ff338244bb).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager  
 - Successfully ran initialization on master in 0 ms.
2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager  
 - Starting job from savepoint 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
2017-03-18 00:10:09,636 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Session 
Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED 
to FAILING.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)
at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)
at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
... 10 more
2017-03-18 00:10:09,638 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom 
Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to 
CANCELED.
2017-03-18 00:10:09,639 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph - 
TriggerWindow(TumblingProcessingTimeWindows(15000), 
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59},
 ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> 
Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to 
CANCELED.
2017-03-18 00:10:09,639 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph - Try to restart 
or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if 
no longer possible.
2017-03-18 00:10:09,639 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job