Sayat Satybaldiyev created FLINK-10286:
------------------------------------------
Summary: Flink Persist Invalid Job Graph in Zookeeper
Key: FLINK-10286
URL: https://issues.apache.org/jira/browse/FLINK-10286
Project: Flink
Issue Type: Bug
Components: Core
Affects Versions: 1.6.0
Reporter: Sayat Satybaldiyev
In HA mode, Flink 1.6 persists the job graph in ZooKeeper even if the job was
not accepted by the JobManager (JM). This is particularly bad because if the JM
later dies and restarts, it tries to recover the job, fails, and dies completely.
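The failure mode described above can be sketched with a toy model. This is not Flink's code: the class, the in-memory map standing in for ZooKeeper, and the `file`/`hdfs` allow list are all assumptions made for illustration, and the persist-before-setup ordering is inferred from the observed behaviour rather than traced in the source.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy model of the suspected ordering problem; none of these names are Flink classes.
class SubmissionOrderingSketch {

    // Stands in for the ZooKeeper-backed job graph store (assumption).
    static final Map<String, String> jobGraphStore = new HashMap<>();

    // Hypothetical setup step: only the file and hdfs schemes are accepted here.
    static void setUpJobManager(String checkpointUri) {
        String scheme = URI.create(checkpointUri).getScheme();
        if (!"file".equals(scheme) && !"hdfs".equals(scheme)) {
            throw new IllegalStateException(
                "Could not find a file system implementation for scheme '" + scheme + "'");
        }
    }

    // Observed outcome: the job is rejected but its graph stays in the store.
    static boolean submitWithoutCleanup(String jobId, String checkpointUri) {
        jobGraphStore.put(jobId, checkpointUri);
        try {
            setUpJobManager(checkpointUri);
            return true;
        } catch (IllegalStateException e) {
            return false; // rejected, entry left behind
        }
    }

    // Expected outcome: a rejected job leaves nothing behind.
    static boolean submitWithCleanup(String jobId, String checkpointUri) {
        jobGraphStore.put(jobId, checkpointUri);
        try {
            setUpJobManager(checkpointUri);
            return true;
        } catch (IllegalStateException e) {
            jobGraphStore.remove(jobId); // undo the persist on failure
            return false;
        }
    }

    public static void main(String[] args) {
        String uri = "hddd:///tmp/flink/rocksdb";
        System.out.println(submitWithoutCleanup("job-a", uri) + " " + jobGraphStore.containsKey("job-a"));
        System.out.println(submitWithCleanup("job-b", uri) + " " + jobGraphStore.containsKey("job-b"));
    }
}
```

In this model both submissions are rejected, but only the first one leaves an entry in the store, which is the state a restarted JM would later try to recover.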
How to reproduce:
1. Have an HA Flink 1.6 cluster
2. Submit an invalid job. In my case I set an invalid file system scheme for
the RocksDB state backend:
```
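import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
env.enableCheckpointing(5000); // checkpoint every 5 seconds
// "hddd" is the deliberately invalid scheme that triggers the bug
RocksDBStateBackend backend = new RocksDBStateBackend("hddd:///tmp/flink/rocksdb");
backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
env.setStateBackend(backend);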
```
Client returns:
```
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)
```
The JM does not accept the job. This is the truncated error log from the JM:
```
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
... 24 more
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
Caused by: java.lang.RuntimeException: Failed to start checkpoint ID counter: Could not find a file system implementation for scheme 'hddd'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
```
4. Go to ZK and observe that the JM has saved the job graph to ZK:
ls /flink/flink_ns/jobgraphs/9680f02ae2f3806c3b4da25bfacd0749
[7f392fd9-cedc-4978-9186-1f54b98eeeb7]
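Why the leftover znode is harmful on restart can be illustrated with another toy model. Everything here is hypothetical (the class, the map standing in for the stored job graphs, the `file`/`hdfs` allow list), and it assumes that recovery replays every stored job and treats a setup failure as fatal, which matches the symptom reported above but has not been verified against the Flink source.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of recovery after a JM restart; none of these names are Flink classes.
class RecoverySketch {

    // Hypothetical setup step for a recovered job: only file and hdfs schemes load.
    static void setUp(String checkpointUri) {
        String scheme = URI.create(checkpointUri).getScheme();
        if (!"file".equals(scheme) && !"hdfs".equals(scheme)) {
            throw new IllegalStateException("no file system for scheme '" + scheme + "'");
        }
    }

    // Assumed behaviour: the first job that cannot be set up aborts the whole recovery.
    static List<String> recoverStrict(Map<String, String> storedGraphs) {
        List<String> recovered = new ArrayList<>();
        for (Map.Entry<String, String> entry : storedGraphs.entrySet()) {
            setUp(entry.getValue()); // failure propagates to the caller
            recovered.add(entry.getKey());
        }
        return recovered;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("good-job", "file:///tmp/flink/rocksdb");
        stored.put("9680f02ae2f3806c3b4da25bfacd0749", "hddd:///tmp/flink/rocksdb");
        try {
            recoverStrict(stored);
        } catch (IllegalStateException e) {
            System.out.println("recovery aborted: " + e.getMessage());
        }
    }
}
```

Under these assumptions a single rejected job that was still persisted is enough to abort recovery for every job in the store, including valid ones.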