[jira] [Commented] (FLINK-8867) Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config
[ https://issues.apache.org/jira/browse/FLINK-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462250#comment-16462250 ]

Shashank Agarwal commented on FLINK-8867:
-----------------------------------------

[~aljoscha] We are a bit busy with our releases at the moment. We are waiting for 1.5 RC2 and will test with that. We tested with 1.5-SNAPSHOT before, but that was a long time ago.

> Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config
> --------------------------------------------------------------------
>
>                 Key: FLINK-8867
>                 URL: https://issues.apache.org/jira/browse/FLINK-8867
>             Project: Flink
>          Issue Type: Bug
>          Components: Configuration, State Backends, Checkpointing, YARN
>    Affects Versions: 1.4.1, 1.4.2
>            Reporter: Shashank Agarwal
>            Assignee: Stephan Ewen
>            Priority: Blocker
>             Fix For: 1.5.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8867) Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config
[ https://issues.apache.org/jira/browse/FLINK-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390847#comment-16390847 ]

Shashank Agarwal commented on FLINK-8867:
-----------------------------------------

[~StephanEwen] [~srichter] You can find the full logs in https://issues.apache.org/jira/browse/FLINK-7756

While we were debugging that issue, we found that this is the root cause.

> Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config
> --------------------------------------------------------------------
>
>                 Key: FLINK-8867
>                 URL: https://issues.apache.org/jira/browse/FLINK-8867
>             Project: Flink
>          Issue Type: Bug
>          Components: Configuration, State Backends, Checkpointing, YARN
>    Affects Versions: 1.4.1, 1.4.2
>            Reporter: Shashank Agarwal
>            Priority: Major
>             Fix For: 1.5.0, 1.4.3
[jira] [Created] (FLINK-8867) Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config
Shashank Agarwal created FLINK-8867:
---------------------------------------

             Summary: Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config
                 Key: FLINK-8867
                 URL: https://issues.apache.org/jira/browse/FLINK-8867
             Project: Flink
          Issue Type: Bug
          Components: Configuration, State Backends, Checkpointing, YARN
    Affects Versions: 1.4.1, 1.4.2
            Reporter: Shashank Agarwal
             Fix For: 1.5.0, 1.4.3


In our setup, we put an entry for the default scheme in our Flink_conf file:
{code}
fs.default-scheme: hdfs://mydomain.com:8020/flink
{code}
With this entry, an application using the RocksDB state backend fails with the following exception. When we remove this config, it works fine. It also works fine with other state backends.
{code}
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 1 for operator order ip stream (1/1).}
    at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator order ip stream (1/1).
    ... 6 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894)
    ... 5 more
    Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future.
        at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:91)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)
        ... 5 more
    Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
        at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:66)
        at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:89)
        ... 7 more
    Caused by: java.lang.IllegalStateException
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:926)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:389)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:386)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894)
        ... 5 more
    [CIRCULAR REFERENCE:java.lang.IllegalStateException]
{code}
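
For readers unfamiliar with the setting in this report: a default scheme is typically used to qualify paths that do not name a file system themselves. The sketch below only illustrates that general idea with `java.net.URI`; it is not Flink's resolution code. The class `DefaultSchemeSketch`, the method `qualify`, and the example paths are hypothetical, and the sketch assumes that only the scheme and authority of the configured value are used (the thread does not say how the `/flink` path component is treated).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: NOT Flink's implementation of fs.default-scheme.
class DefaultSchemeSketch {

    /**
     * Hypothetical helper: a path that already names a scheme is returned
     * unchanged; a scheme-less absolute path borrows the scheme and the
     * authority (host:port) of the default URI.
     */
    static URI qualify(URI defaultScheme, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // an explicit scheme always wins
        }
        try {
            // Assumption: the path component of the default URI is ignored.
            return new URI(defaultScheme.getScheme(), defaultScheme.getAuthority(),
                    uri.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify path: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI def = URI.create("hdfs://mydomain.com:8020/flink");
        // prints hdfs://mydomain.com:8020/checkpoints/job-1
        System.out.println(qualify(def, "/checkpoints/job-1"));
        // prints file:///tmp/rocksdb
        System.out.println(qualify(def, "file:///tmp/rocksdb"));
    }
}
```

Under this reading, a scheme-less local directory would be qualified as an HDFS location once the setting is present. Whether that is what happens to the RocksDB backend here is not confirmed anywhere in this thread.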
[jira] [Closed] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal closed FLINK-6321.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1

Same as https://issues.apache.org/jira/browse/FLINK-7756

> RocksDB state backend Checkpointing is not working with KeyedCEP.
> -----------------------------------------------------------------
>
>                 Key: FLINK-6321
>                 URL: https://issues.apache.org/jira/browse/FLINK-6321
>             Project: Flink
>          Issue Type: Sub-task
>          Components: CEP
>    Affects Versions: 1.2.0
>         Environment: yarn-cluster, RocksDB State backend, Checkpointing every 1000 ms
>            Reporter: Shashank Agarwal
>            Assignee: Kostas Kloudas
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.2, 1.4.1
>
>         Attachments: jobmanager.log, taskmanager.log
>
>
> Checkpointing is not working with RocksDBStateBackend when using CEP. It works
> fine with FsStateBackend and MemoryStateBackend. The application fails every time.
> {code}
> 04/18/2017 21:53:20 Job execution switched to status FAILING.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).}
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).
>     ... 6 more
> Caused by: java.util.concurrent.CancellationException
>     at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915)
>     ... 5 more
> {code}
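
The traces in this thread are nested several levels deep, and the informative exception is the last "Caused by" entry. As a reading aid, here is a minimal sketch of walking a cause chain down to its root using only the JDK. The class `RootCauseSketch`, the method `rootCause`, and the exception messages in `main` are hypothetical and merely mimic the shape of the trace above; nothing here is taken from Flink.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.CancellationException;

// Illustrative only: a generic helper, not part of Flink.
class RootCauseSketch {

    /** Follows getCause() until it ends, stopping early if a cycle is detected. */
    static Throwable rootCause(Throwable t) {
        // Identity-based set, so exceptions with custom equals() cannot confuse the cycle check.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Mimics the nesting of the trace: wrapper -> Exception -> CancellationException.
        Exception wrapped = new Exception(
                "Could not materialize checkpoint 46", new CancellationException());
        Exception top = new RuntimeException("asynchronous failure", wrapped);
        // prints java.util.concurrent.CancellationException
        System.out.println(rootCause(top).getClass().getName());
    }
}
```

The cycle guard is there because cause chains are not guaranteed to terminate; one of the other traces in this thread ends with a `[CIRCULAR REFERENCE: ...]` marker.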
[jira] [Closed] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal closed FLINK-7756.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1

> RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-7756
>                 URL: https://issues.apache.org/jira/browse/FLINK-7756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: CEP, State Backends, Checkpointing, Streaming
>    Affects Versions: 1.4.0, 1.3.2
>         Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
>            Reporter: Shashank Agarwal
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.2, 1.4.1
>
>         Attachments: jobmanager.log, jobmanager_without_cassandra.log, taskmanager.log, taskmanager_without_cassandra.log
>
>
> When I use RocksDBStateBackend on my staging cluster (which uses HDFS as its
> file system), the application crashes. When I use FsStateBackend on the same
> staging cluster, it works fine.
> Locally, with the local file system, it works fine in both cases.
> Please check the attached logs. I have around 20-25 tasks in my app.
> {code:java}
> 2017-09-29 14:21:31,639 INFO  org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state to restore for the BucketingSink (taskIdx=0).
> 2017-09-29 14:21:31,640 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,020 INFO  org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state to restore for the BucketingSink (taskIdx=1).
> 2017-09-29 14:21:32,022 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,078 INFO  com.datastax.driver.core.NettyUtil  - Found Netty's native epoll transport in the classpath, using it
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task  - Attempting to fail task externally Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a).
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task  - Attempting to fail task externally Map (2/2) (1ea5aef6ccc7031edc6b37da2912d90b).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task  - Attempting to fail task externally Co-Flat Map (2/2) (4bac8e764c67520d418a4c755be23d4d).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task  - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched from RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 for operator Co-Flat Map (1/2).}
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator Co-Flat Map (1/2).
>     ... 6 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>     ... 5 more
>     Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future.
>         at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961)
>         ... 5 more
>     Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.flink.util.FutureUtil.runIfNotDoneAnd
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386040#comment-16386040 ]

Shashank Agarwal commented on FLINK-7756:
-----------------------------------------

[~aljoscha] We found one thing. In our past setup, we had an entry for the default scheme in our Flink_conf file:
{code}
fs.default-scheme: hdfs://mydomain.com:8020/flink
{code}
After removing it, everything works fine, even with the previous Flink build that we built from source using the HDP version. So this problem is solved for us, but it is still an issue. Should I report a new bug for it?

I am closing the issues. Thanks for your great support.

> RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-7756
>                 URL: https://issues.apache.org/jira/browse/FLINK-7756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: CEP, State Backends, Checkpointing, Streaming
>    Affects Versions: 1.4.0, 1.3.2
>         Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
>            Reporter: Shashank Agarwal
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.2
>
>         Attachments: jobmanager.log, jobmanager_without_cassandra.log, taskmanager.log, taskmanager_without_cassandra.log
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378386#comment-16378386 ]

Shashank Agarwal commented on FLINK-7756:
-----------------------------------------

[~aljoscha] Is there any information we can provide to help resolve this?

> RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-7756
>                 URL: https://issues.apache.org/jira/browse/FLINK-7756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: CEP, State Backends, Checkpointing, Streaming
>    Affects Versions: 1.4.0, 1.3.2
>         Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
>            Reporter: Shashank Agarwal
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>         Attachments: jobmanager.log, jobmanager_without_cassandra.log, taskmanager.log, taskmanager_without_cassandra.log
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362663#comment-16362663 ]

Shashank Agarwal commented on FLINK-7756:
-----------------------------------------

I'll also try removing Cassandra.

On Tue, 13 Feb 2018 at 10:27 PM, Shashank Agarwal

> RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-7756
>                 URL: https://issues.apache.org/jira/browse/FLINK-7756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: CEP, State Backends, Checkpointing, Streaming
>    Affects Versions: 1.4.0, 1.3.2
>         Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
>            Reporter: Shashank Agarwal
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>         Attachments: jobmanager.log, taskmanager.log
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362657#comment-16362657 ]

Shashank Agarwal commented on FLINK-7756:
-----------------------------------------

But this happens only when I choose RocksDB; with FsStateBackend it always works fine.

On Tue, 13 Feb 2018 at 10:26 PM, Aljoscha Krettek (JIRA)

> RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
> -------------------------------------------------------------------------------------
>
> Key: FLINK-7756
> URL: https://issues.apache.org/jira/browse/FLINK-7756
> Project: Flink
> Issue Type: Sub-task
> Components: CEP, State Backends, Checkpointing, Streaming
> Affects Versions: 1.4.0, 1.3.2
> Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
> Reporter: Shashank Agarwal
> Assignee: Aljoscha Krettek
> Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
> Attachments: jobmanager.log, taskmanager.log
>
>
> When I try to use RocksDBStateBackend on my staging cluster (which uses HDFS as its file system), it crashes. But when I use FsStateBackend on staging (which also uses HDFS as its file system), it works fine.
> Locally, with the local file system, it works fine in both cases.
> Please check the attached logs. I have around 20-25 tasks in my app.
> {code:java}
> 2017-09-29 14:21:31,639 INFO org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state to restore for the BucketingSink (taskIdx=0).
> 2017-09-29 14:21:31,640 INFO org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,020 INFO org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state to restore for the BucketingSink (taskIdx=1).
> 2017-09-29 14:21:32,022 INFO org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil - Found Netty's native epoll transport in the classpath, using it
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a).
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Map (2/2) (1ea5aef6ccc7031edc6b37da2912d90b).
> 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Co-Flat Map (2/2) (4bac8e764c67520d418a4c755be23d4d).
> 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched from RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 for operator Co-Flat Map (1/2).}
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator Co-Flat Map (1/2).
>     ... 6 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>     ... 5 more
>     Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future.
>         at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961)
>         ... 5 more
>     Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(Fut
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362632#comment-16362632 ]

Shashank Agarwal commented on FLINK-7756:
-----------------------------------------

Sure, I will check that.

On Tue, 13 Feb 2018 at 10:21 PM, Aljoscha Krettek (JIRA)
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362610#comment-16362610 ]

Shashank Agarwal commented on FLINK-7756:
-----------------------------------------

That was a bug in version 1.3.2 during the build, so I included that. But I don't think it is the issue, because it's specific to Netty.

On Tue, Feb 13, 2018 at 10:06 PM, Aljoscha Krettek (JIRA)
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362595#comment-16362595 ]

Shashank Agarwal commented on FLINK-7756:
-----------------------------------------

[~aljoscha] I am building with SBT; please find my build definition below. This works fine with other state backends and locally.

{code}
resolvers in ThisBuild ++= Seq("Apache Development Snapshot Repository" at "https://repository.apache.org/content/repositories/orgapacheflink-1145")

name := "xyz"

version := "0.2"

organization := "co.to.my"

scalaVersion in ThisBuild := "2.11.7"

val flinkVersion = "1.4.1"

val flinkDependencies = Seq(
  "org.slf4j" % "slf4j-log4j12" % "1.7.21",
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-cep-scala" % flinkVersion % "compile",
  "org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion,
  "org.apache.flink" %% "flink-connector-filesystem" % flinkVersion,
  "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-cassandra" % flinkVersion,
  "org.apache.flink" % "flink-shaded-hadoop2" % flinkVersion % "provided",
  "org.json4s" %% "json4s-jackson" % "3.5.3",
  "com.sanoma.cda" %% "maxmind-geoip2-scala" % "1.5.4",
  "org.http4s" %% "http4s-dsl" % "0.15.13",
  "org.http4s" %% "http4s-blaze-client" % "0.15.13",
  "com.googlecode.libphonenumber" % "libphonenumber" % "8.8.8")

lazy val root = (project in file(".")).
  settings(
    libraryDependencies ++= flinkDependencies
  )

mainClass in assembly := Some("co.to.my.Job")

// make run command include the provided dependencies
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))

// exclude Scala library from assembly
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

assemblyMergeStrategy in assembly := {
  case "META-INF/io.netty.versions.properties" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
{code}
[jira] [Reopened] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal reopened FLINK-7756:
------------------------------------

[~aljoscha] [~kkl0u] I have checked with 1.4.1-rc1 and the 1.5.0 snapshot. It still runs fine locally, but on the YARN cluster it gives an error and the application crashes. I have also captured logs in trace mode; please check the attached logs for 1.4.1-rc1. The same output was printed with the 1.5.0 snapshot.
[jira] [Updated] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal updated FLINK-7756:
-----------------------------------
    Attachment: jobmanager.log
                taskmanager.log
[jira] [Updated] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal updated FLINK-6321:
-----------------------------------
    Attachment: jobmanager.log
                taskmanager.log

> RocksDB state backend Checkpointing is not working with KeyedCEP.
> ------------------------------------------------------------------
>
> Key: FLINK-6321
> URL: https://issues.apache.org/jira/browse/FLINK-6321
> Project: Flink
> Issue Type: Sub-task
> Components: CEP
> Affects Versions: 1.2.0
> Environment: yarn-cluster, RocksDB State backend, Checkpointing every 1000 ms
> Reporter: Shashank Agarwal
> Assignee: Kostas Kloudas
> Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
> Attachments: jobmanager.log, taskmanager.log
>
>
> Checkpointing is not working with RocksDBStateBackend when using CEP. It's working fine with FsStateBackend and MemoryStateBackend. The application fails every time.
> {code}
> 04/18/2017 21:53:20 Job execution switched to status FAILING.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).}
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).
>     ... 6 more
> Caused by: java.util.concurrent.CancellationException
>     at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915)
>     ... 5 more
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
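The traces in this thread end in two different JDK exceptions raised from `FutureTask.get`: an `ExecutionException` wrapping `IllegalStateException` (FLINK-7756) and a bare `CancellationException` (FLINK-6321). The sketch below reproduces both shapes using only the JDK, which may help when reading the traces. The class `FutureFailureDemo` and the method `outcomeOf` are hypothetical names for illustration; the run-then-get pattern is only an assumption based on the name `FutureUtil.runIfNotDoneAndGet` and is not Flink's actual code.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical demo class; it uses only JDK types, nothing from Flink.
class FutureFailureDemo {

    /** Runs the task if it is not done yet, then reports how get() ended. */
    public static String outcomeOf(FutureTask<String> task) {
        if (!task.isDone()) {
            task.run(); // executes the callable on the calling thread
        }
        try {
            return "completed: " + task.get();
        } catch (ExecutionException e) {
            // get() wraps whatever the callable threw; the original is the cause.
            return "ExecutionException caused by " + e.getCause().getClass().getName();
        } catch (CancellationException e) {
            // get() throws this when the task was cancelled before it finished.
            return "CancellationException";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A task whose body fails, similar in shape to the FLINK-7756 trace.
        FutureTask<String> failing = new FutureTask<>(() -> {
            throw new IllegalStateException();
        });
        System.out.println(outcomeOf(failing));

        // A task cancelled before it ran, similar in shape to the FLINK-6321 trace.
        FutureTask<String> cancelled = new FutureTask<>(() -> "snapshot");
        cancelled.cancel(true);
        System.out.println(outcomeOf(cancelled));
    }
}
```

In other words, the first shape suggests the snapshot task itself threw, while the second suggests something cancelled the task before its result was collected. Which Flink component did either is not something this sketch can show.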
[jira] [Reopened] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal reopened FLINK-6321:
------------------------------------

[~aljoscha] [~kkl0u] I have checked with 1.4.1-rc1 and the 1.5.0 snapshot. It still runs fine locally, but on the YARN cluster it gives an error and the application crashes. I have also captured logs in trace mode; please check the attached logs for 1.4.1-rc1. The same output was printed with the 1.5.0 snapshot.
[jira] [Commented] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355676#comment-16355676 ] Shashank Agarwal commented on FLINK-7760: - Sure I’ll checkout by weekend. I’ll be back by then. On Wed, 7 Feb 2018 at 10:03 PM, Aljoscha Krettek (JIRA) > Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.4.0, 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.5.0, 1.4.1 > > > My job failed due to failure of cassandra. I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend. 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.StreamCorruptedException: invalid type code: 00 > at > java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) > at > java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) > at > java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) > at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) > at > 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) > ... 6 more > {code} > I have tried to start new job also after failure with parameter {code:java} > -s [checkpoint meta data path]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
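For context on the "invalid type code: 00" message in the trace above: it comes from the JDK's ObjectInputStream, which expects a type tag at the point where it reads the next object and rejects a byte that is not one. The sketch below is a minimal illustration of that JDK behaviour only, using a hand-built byte array as an assumption. It is not a reproduction of the Flink restore path, and it does not show how the checkpoint bytes ended up misaligned.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.StreamCorruptedException;

class InvalidTypeCodeDemo {

    // Tries to read one object from the raw bytes and reports what happened.
    public static String readObjectOutcome(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return "ok";
        } catch (StreamCorruptedException e) {
            return "StreamCorruptedException: " + e.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    // A valid serialization header (magic 0xACED, version 0x0005) followed by
    // a single 0x00 byte, which is not a valid type tag.
    public static byte[] headerThenZeroByte() {
        return new byte[] {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x00};
    }

    public static void main(String[] args) {
        System.out.println(readObjectOutcome(headerThenZeroByte()));
    }
}
```

The point of the sketch is that this exception signals a reader positioned on bytes it did not expect, so it is consistent with (though not proof of) a mismatch between how the NFA state was written and how it is being read back.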
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354152#comment-16354152 ] Shashank Agarwal commented on FLINK-6321: - Sure, I’ll check with the 1.5 snapshot. The only thing is that to test on the cluster I have to resolve the dependencies. Can you give me a clue or a link on how to test that on the cluster with dependencies? I am using SBT. Is there a script in tools that I can use to publish the dependencies? On Tue, 6 Feb 2018 at 10:24 PM, Aljoscha Krettek (JIRA) > RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Sub-task > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.5.0, 1.4.1 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 
6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354136#comment-16354136 ] Shashank Agarwal commented on FLINK-7756: - I am travelling now and will send the logs in 1-2 days for sure. I have checked, though, and nothing failed before this, since this was the first error. One more thing: it works fine in the local environment and on a local cluster, but when I publish my app to the YARN cluster I receive this error. When I replace the backend with FsStateBackend, the same app starts working, but then there is another issue with restore, which I have reported in other bugs. So the issue appears when I run on the YARN cluster with the HDFS backend. On Tue, 6 Feb 2018 at 10:19 PM, Aljoscha Krettek (JIRA) > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.4.0, 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.5.0, 1.4.1 > > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. > On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. 
> 2017-09-29 14:21:32,020 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=1). > 2017-09-29 14:21:32,022 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil > - Found Netty's native epoll transport in the classpath, using > it > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (1/2) > (b879f192c4e8aae6671cdafb3a24c00a). > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Map (2/2) > (1ea5aef6ccc7031edc6b37da2912d90b). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (2/2) > (4bac8e764c67520d418a4c755be23d4d). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched > from RUNNING to FAILED. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Co-Flat Map (1/2).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Co-Flat Map (1/2). > ... 
6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) > ... 5 more > Suppressed: java.lang.Exception: Could not properly cancel managed > keyed state future. > at > org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunna
[jira] [Comment Edited] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302523#comment-16302523 ] Shashank Agarwal edited comment on FLINK-6321 at 1/29/18 11:22 AM: --- Still not working in 1.4.0. Refer to issue: https://issues.apache.org/jira/browse/FLINK-7756 was (Author: shashank734): Still not working in 1.4.2. Refer to issue: https://issues.apache.org/jira/browse/FLINK-7756 > RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Sub-task > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.5.0, 1.4.1 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 
6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-6883) Serializer for collection of Scala case classes are generated with different anonymous class names in 1.3
[ https://issues.apache.org/jira/browse/FLINK-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335823#comment-16335823 ] Shashank Agarwal commented on FLINK-6883: - I am facing a similar issue in 1.4, where I created a savepoint and restored it on the same 1.4 version without changing the application code. I am using CEP. {code} java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:293) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:225) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.InvalidClassException: org.apache.flink.cep.scala.pattern.Pattern$$anon$3; invalid descriptor for field at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:723) at java.io.ObjectInputStream.readClassDescriptor(ObjectInputStream.java:833) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1609) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1171) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1129) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:917) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:820) at 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:133) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:575) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:446) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:773) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:283) ... 6 more Caused by: java.lang.IllegalArgumentException: illegal signature at java.io.ObjectStreamField.(ObjectStreamField.java:122) at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:721) ... 21 more {code} > Serializer for collection of Scala case classes are generated with different > anonymous class names in 1.3 > - > > Key: FLINK-6883 > URL: https://issues.apache.org/jira/browse/FLINK-6883 > Project: Flink > Issue Type: Bug > Components: Scala API, Type Serialization System >Affects Versions: 1.3.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: flink-rel-1.3.1-blockers > Fix For: 1.3.1, 1.4.0 > > > In the Scala API, serializers are generated using Scala macros (via the > {{org.apache.flink.streaming.api.scala.createTypeInformation(..)}} util). > The generated serializers are inner anonymous classes, therefore classnames > will differ depending on when / order that the serializers are generated. > From 1.1 / 1.2 to Flink 1.3, the generated classnames for a serializer for a > collections of case classes (e.g. {{List[SomeUserCaseClass]}}) will be > different. In other words, the exact same user code written in the Scala API, > compiling it with 1.1 / 1.2 and with 1.3 will result in different classnames. 
> This is problematic for restoring older savepoints that have Scala case class > collections in their state, because the old serializer cannot be recovered > (due to the generated classname change). > For now, I've managed to identify that the root cause for this is that in 1.3 > the {{TypeSerializer}} base class additionally extends the > {{TypeDeserializer}} interface. Removing this extending resolves the problem. > The actual reason for why this affects the generated classname is still being > investigated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
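The naming effect described in the issue text above (generated anonymous classes whose names depend on the order in which they are created) is not specific to Scala macros. As a rough analogy in plain Java, and only as an analogy, the sketch below shows that javac names anonymous classes by their position in the enclosing class. The class and field names are made up for this example and have nothing to do with Flink's generated serializers.

```java
class AnonymousNamesDemo {

    // Two anonymous classes, declared in source order inside the same
    // top-level class. Each gets a numbered binary name from the compiler.
    static final Runnable FIRST = new Runnable() {
        @Override
        public void run() { }
    };

    static final Runnable SECOND = new Runnable() {
        @Override
        public void run() { }
    };

    // The binary class name is what Java serialization records in a stream.
    public static String binaryName(Object o) {
        return o.getClass().getName();
    }

    public static String firstName() {
        return binaryName(FIRST);
    }

    public static String secondName() {
        return binaryName(SECOND);
    }

    public static void main(String[] args) {
        System.out.println(firstName());
        System.out.println(secondName());
    }
}
```

If the two declarations were swapped, or another anonymous class were added before them, each body would end up under a different numbered name. Any stored data that refers to a class by such a name can then stop resolving to the class that wrote it, which is the kind of breakage the issue describes for savepoints.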
[jira] [Commented] (FLINK-8451) CaseClassSerializer is not backwards compatible in 1.4
[ https://issues.apache.org/jira/browse/FLINK-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330562#comment-16330562 ] Shashank Agarwal commented on FLINK-8451: - I have taken a savepoint from 1.4 and tried to restore it on 1.4 itself, but it still fails. I haven't changed any code or version. I am using java.util.List, Option and Seq in my case class. I am getting the following error during restore: {code} java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:293) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:225) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.InvalidClassException: org.apache.flink.cep.scala.pattern.Pattern$$anon$3; invalid descriptor for field at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:723) at java.io.ObjectInputStream.readClassDescriptor(ObjectInputStream.java:833) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1609) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1171) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1129) at 
org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:917) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:820) at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:133) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:575) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:446) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:773) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:283) ... 6 more Caused by: java.lang.IllegalArgumentException: illegal signature at java.io.ObjectStreamField.(ObjectStreamField.java:122) at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:721) ... 21 more {code} > CaseClassSerializer is not backwards compatible in 1.4 > -- > > Key: FLINK-8451 > URL: https://issues.apache.org/jira/browse/FLINK-8451 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.4.0, 1.5.0 >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > > There seems to be problems with the updated Scala version and the > CaseClassSerializer that make it impossible to restore from a Flink 1.3 > savepoint. > http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CCACk7FThV5itjSj_1fG9oaWS86z8WTKWs7abHvok6FnHzq9XT-A%40mail.gmail.com%3E > http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3C7CABB00B-D52F-4878-B04F-201415CEB658%40mediamath.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal updated FLINK-7760: Affects Version/s: 1.4.0 > Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.4.0, 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > My job failed due to failure of cassandra. I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend. 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.StreamCorruptedException: invalid type code: 00 > at > java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) > at > java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) > at > java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) > at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) > at > 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) > ... 6 more > {code} > I have tried to start new job also after failure with parameter {code:java} > -s [checkpoint meta data path]{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309634#comment-16309634 ] Shashank Agarwal commented on FLINK-7760: - Hi [~kkl0u], I have checked again and am facing the same issue during restore. So I am unable to checkpoint with RocksDB, and with FsStateBackend I hit this issue when restoring from a savepoint, which means I cannot restore my state in either case. It does not print any extra logs in debug mode either. Please advise; I am using CEP, YARN, HDFS and Scala. Otherwise I will have to use some DB for the state, which I don't want. Thanks > Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > My job failed due to failure of cassandra. I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend. 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.StreamCorruptedException: invalid type code: 00 > at > java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) > at > java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) > at > java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) > at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) > at > 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) > ... 6 more > {code} > I have tried to start new job also after failure with parameter {code:java} > -s [checkpoint meta data path]{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309626#comment-16309626 ] Shashank Agarwal commented on FLINK-7756: - Hi [~kkl0u], I have tried this. No, it doesn't solve the issue. It prints the same logs and the job still fails. > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.4.0, 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. > On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,020 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=1). > 2017-09-29 14:21:32,022 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. 
> 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil > - Found Netty's native epoll transport in the classpath, using > it > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (1/2) > (b879f192c4e8aae6671cdafb3a24c00a). > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Map (2/2) > (1ea5aef6ccc7031edc6b37da2912d90b). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (2/2) > (4bac8e764c67520d418a4c755be23d4d). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched > from RUNNING to FAILED. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Co-Flat Map (1/2).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Co-Flat Map (1/2). > ... 6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) > ... 5 more > Suppressed: java.lang.Exception: Could not properly cancel managed > keyed state future. 
> at > org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961) > ... 5 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > o
[jira] [Commented] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305614#comment-16305614 ] Shashank Agarwal commented on FLINK-7760: - No, I haven't changed the version or the code. I took the savepoint in 1.4.0 and also restored it in 1.4.0. I am using FsStateBackend. On Thu, 28 Dec 2017 at 8:32 PM, Kostas Kloudas (JIRA) > Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > My job failed due to failure of cassandra. I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend. 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.StreamCorruptedException: invalid type code: 00 > at > java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) > at > java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) > at > java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) > at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) > at > 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) > ... 6 more > {code} > I have tried to start new job also after failure with parameter {code:java} > -s [checkpoint meta data path]{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
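The root cause in the trace above, `java.io.StreamCorruptedException: invalid type code: 00`, is raised by plain Java serialization (`ObjectInputStream`), which the stack frames show being called from `NFA$NFASerializer.deserializeCondition`. As a minimal sketch that does not involve Flink at all, the following stand-alone program shows how the JDK produces that kind of error when the reader meets a zero byte where a type tag is expected. The class name `TypeCodeDemo` and the hand-built byte array are made up for illustration; the example does not establish why the bytes in the real checkpoint were misaligned.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.StreamCorruptedException;

// Hypothetical demo class, not part of Flink.
class TypeCodeDemo {

    // A valid serialization stream header (magic 0xACED, version 5)
    // followed by a single 0x00 byte, which is not a valid type tag.
    static byte[] headerThenZero() {
        return new byte[] {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x00};
    }

    // Tries to read one object from the bytes. Returns the message of the
    // StreamCorruptedException if one is thrown, or null if the read succeeds.
    static String readAndReport(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return null;
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            // Any other failure is unexpected for this demo input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAndReport(headerThenZero()));
    }
}
```

The point of the sketch is only that this exception signals a reader positioned on bytes it cannot interpret as a serialized object, which is consistent with, but does not prove, a mismatch between how the state was written and how it is read back.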
[jira] [Created] (FLINK-8319) Savepoint restore failing in CEP
Shashank Agarwal created FLINK-8319: --- Summary: Savepoint restore failing in CEP Key: FLINK-8319 URL: https://issues.apache.org/jira/browse/FLINK-8319 Project: Flink Issue Type: Bug Components: CEP, State Backends, Checkpointing, YARN Affects Versions: 1.4.0 Environment: Yarn Cluster Reporter: Shashank Agarwal Fix For: 1.5.0, 1.4.1 I have reported bugs before against 1.3.2 (https://issues.apache.org/jira/browse/FLINK-7760), but this time the error when restoring from a savepoint or checkpoint is different. {code} java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:293) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:225) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.InvalidClassException: org.apache.flink.cep.scala.pattern.Pattern$$anon$3; invalid descriptor for field at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:723) at java.io.ObjectInputStream.readClassDescriptor(ObjectInputStream.java:833) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1609) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1171) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1129) at 
org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:917) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:820) at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:133) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:575) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:446) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:773) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:283) ... 6 more Caused by: java.lang.IllegalArgumentException: illegal signature at java.io.ObjectStreamField.<init>(ObjectStreamField.java:122) at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:721) ... 21 more {code}
[jira] [Reopened] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal reopened FLINK-7760: - Restoring from a savepoint is also failing in 1.4.0. It prints the following logs: {code} java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:293) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:225) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.InvalidClassException: org.apache.flink.cep.scala.pattern.Pattern$$anon$3; invalid descriptor for field at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:723) at java.io.ObjectInputStream.readClassDescriptor(ObjectInputStream.java:833) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1609) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1171) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1129) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:917) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:820) at 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:133) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:575) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:446) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:773) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:283) ... 6 more Caused by: java.lang.IllegalArgumentException: illegal signature at java.io.ObjectStreamField.<init>(ObjectStreamField.java:122) at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:721) ... 21 more {code} > Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > My job failed due to failure of cassandra. I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend. 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.StreamCorruptedException: invalid type code: 00 > at > java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) > at > java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) > at > java.io.ObjectInputStream$BlockDataInputStr
[jira] [Updated] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal updated FLINK-7760: Fix Version/s: 1.4.1 1.5.0 > Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > My job failed due to failure of cassandra. I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend. 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.StreamCorruptedException: invalid type code: 00 > at > java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) > at > java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) > at > java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) > at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) > at > 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) > ... 6 more > {code} > I have tried to start new job also after failure with parameter {code:java} > -s [checkpoint meta data path]{code}
[jira] [Reopened] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal reopened FLINK-6321: - Still not working in 1.4.2. Refer to issue: https://issues.apache.org/jira/browse/FLINK-7756 > RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Sub-task > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 
6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code}
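The innermost cause in this trace, `java.util.concurrent.CancellationException` thrown from `FutureTask.report`, is what the JDK raises when `get()` is called on a future that was cancelled before it produced a result. The sketch below reproduces only that JDK behaviour with a bare `FutureTask`. It is not Flink code, the class name `CancelledFutureDemo` is hypothetical, and it says nothing about why the snapshot future was cancelled in the reported job.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical demo class, not part of Flink.
class CancelledFutureDemo {

    // Cancels a task that never ran, then asks for its result.
    // Returns the simple name of the exception that get() throws,
    // or "none" if get() returns normally.
    static String getAfterCancel() {
        FutureTask<String> task = new FutureTask<>(() -> "snapshot result");
        task.cancel(true); // the task is now done, but has no result
        try {
            task.get();
            return "none";
        } catch (CancellationException e) {
            return e.getClass().getSimpleName();
        } catch (InterruptedException | ExecutionException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(getAfterCancel());
    }
}
```

Because a cancelled `FutureTask` already counts as done, a caller that only runs the task when it is not yet done and then calls `get()` would go straight to the exception, which matches the order of frames in the trace.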
[jira] [Updated] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal updated FLINK-6321: Fix Version/s: 1.4.1 1.5.0 > RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Sub-task > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 
5 more > {code}
[jira] [Updated] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal updated FLINK-7756: Fix Version/s: 1.4.1 1.5.0 > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.4.0, 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0, 1.5.0, 1.4.1 > > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. > On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,020 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=1). > 2017-09-29 14:21:32,022 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil > - Found Netty's native epoll transport in the classpath, using > it > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (1/2) > (b879f192c4e8aae6671cdafb3a24c00a). 
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Map (2/2) > (1ea5aef6ccc7031edc6b37da2912d90b). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (2/2) > (4bac8e764c67520d418a4c755be23d4d). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched > from RUNNING to FAILED. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Co-Flat Map (1/2).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Co-Flat Map (1/2). > ... 6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) > ... 5 more > Suppressed: java.lang.Exception: Could not properly cancel managed > keyed state future. 
> at > org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961) > ... 5 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85) > at > org.apache.flink.str
[jira] [Updated] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal updated FLINK-7756: Affects Version/s: 1.4.0 > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.4.0, 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0 > > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. > On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,020 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=1). > 2017-09-29 14:21:32,022 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil > - Found Netty's native epoll transport in the classpath, using > it > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (1/2) > (b879f192c4e8aae6671cdafb3a24c00a). 
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Map (2/2) > (1ea5aef6ccc7031edc6b37da2912d90b). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (2/2) > (4bac8e764c67520d418a4c755be23d4d). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched > from RUNNING to FAILED. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Co-Flat Map (1/2).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Co-Flat Map (1/2). > ... 6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) > ... 5 more > Suppressed: java.lang.Exception: Could not properly cancel managed > keyed state future. 
> at > org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961) > ... 5 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85) > at > org.apache.flink.streaming.api.operators.OperatorSnapsh
[jira] [Reopened] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal reopened FLINK-7756: - I checked with version 1.4.0 and the issue still occurs. I verified that files and directories are being created at the HDFS path, but the checkpoint still fails with this: {code} AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 for operator Create account CEP -> (Account state, Account Property State) (3/6).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator Create account CEP -> (Account state, Account Property State) (3/6). ... 6 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894) ... 5 more Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future. at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:91) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) ... 
5 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:66) at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:89) ... 7 more Caused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:920) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:383) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:380) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894) ... 5 more [CIRCULAR REFERENCE:java.lang.IllegalStateException] {code} > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.4.0, 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0 > > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. 
> On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:2
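The trace in this message nests several "Caused by" levels and ends with a circular-reference marker. As a reading aid only, here is a minimal sketch in plain Java (no Flink classes) of walking such a chain down to the innermost cause while guarding against cycles. The class name `RootCause` and its method are hypothetical, not part of Flink.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutionException;

public class RootCause {

    /**
     * Follows getCause() until there is no further cause.
     * Stops early if a throwable is visited twice, so a circular chain cannot loop forever.
     */
    public static Throwable rootCause(Throwable t) {
        if (t == null) {
            return null;
        }
        // Identity-based set: two distinct exceptions never count as the same entry.
        Set<Throwable> seen =
                Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Illustrative chain shaped like the one in the report:
        // Exception -> ExecutionException -> IllegalStateException.
        Throwable inner = new IllegalStateException();
        Throwable outer = new Exception(
                "Could not materialize checkpoint", new ExecutionException(inner));
        System.out.println(rootCause(outer).getClass().getName());
    }
}
```

In the report above, the innermost cause is the `IllegalStateException` thrown from `Preconditions.checkState` inside `materializeSnapshot`, which is the frame worth looking at first.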
[jira] [Commented] (FLINK-8182) Unable to read hdfs file system directory(which contains sub directories) recursively
[ https://issues.apache.org/jira/browse/FLINK-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276799#comment-16276799 ]

Shashank Agarwal commented on FLINK-8182:
-----------------------------------------

I think it gets stuck in the first directory, because it's not printing any directory or file listing.

On Mon, Dec 4, 2017 at 5:49 PM, Fabian Hueske (JIRA)

> Unable to read hdfs file system directory(which contains sub directories) recursively
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-8182
>                 URL: https://issues.apache.org/jira/browse/FLINK-8182
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.2
>            Reporter: Shashank Agarwal
>
> Unable to read hdfs file system directory(which contains subdirectories) recursively. It works fine when a single directory contains only files but when the directory contains subdirectories it doesn't read subdirectory files.
> {code}
> streamExecutionEnvironment.readTextFile("HDFS path")
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (FLINK-8182) Unable to read hdfs file system directory(which contains sub directories) recursively
[ https://issues.apache.org/jira/browse/FLINK-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274262#comment-16274262 ]

Shashank Agarwal commented on FLINK-8182:
-----------------------------------------

Actually, I checked it for around 10 minutes, up to 300 attempts, and then I stopped it.
[jira] [Commented] (FLINK-8182) Unable to read hdfs file system directory(which contains sub directories) recursively
[ https://issues.apache.org/jira/browse/FLINK-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274215#comment-16274215 ]

Shashank Agarwal commented on FLINK-8182:
-----------------------------------------

Actually, it's unable to access the listing. After changing the log level to DEBUG, it continuously prints the following logs:

{code}
15:49:44,713 DEBUG org.apache.hadoop.ipc.Client - Connecting to {masked}/{masked}::8020
15:49:45,302 DEBUG org.apache.hadoop.ipc.Client - IPC Client (428622004) connection to {masked}/{masked}:8020 from shashank: starting, having connections 1
15:49:45,307 DEBUG org.apache.hadoop.ipc.Client - IPC Client (428622004) connection to {masked}/{masked}::8020 from shashank sending #0
15:49:47,292 DEBUG org.apache.hadoop.ipc.Client - IPC Client (428622004) connection to {masked}/{masked}::8020 from shashank got value #0
15:49:47,292 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getFileInfo took 2598ms
15:49:47,320 DEBUG org.apache.hadoop.ipc.Client - IPC Client (428622004) connection to {masked}/{masked}::8020 from shashank sending #1
15:49:47,454 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
15:49:48,278 DEBUG org.apache.hadoop.ipc.Client - IPC Client (428622004) connection to {masked}/{masked}::8020 from shashank got value #1
15:49:48,279 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getListing took 958ms
15:49:48,303 DEBUG org.apache.hadoop.ipc.Client - IPC Client (428622004) connection to {masked}/{masked}::8020 from shashank sending #2
{code}
[jira] [Commented] (FLINK-8182) Unable to read hdfs file system directory(which contains sub directories) recursively
[ https://issues.apache.org/jira/browse/FLINK-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274195#comment-16274195 ]

Shashank Agarwal commented on FLINK-8182:
-----------------------------------------

Are you using the following imports?

import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path
[jira] [Created] (FLINK-8182) Unable to read hdfs file system directory(which contains sub directories) recursively
Shashank Agarwal created FLINK-8182:
---------------------------------------

             Summary: Unable to read hdfs file system directory(which contains sub directories) recursively
                 Key: FLINK-8182
                 URL: https://issues.apache.org/jira/browse/FLINK-8182
             Project: Flink
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.3.2
            Reporter: Shashank Agarwal

Unable to read an HDFS directory that contains subdirectories recursively. It works fine when a single directory contains only files, but when the directory contains subdirectories, the files in the subdirectories are not read.

{code}
streamExecutionEnvironment.readTextFile("HDFS path")
{code}
[jira] [Closed] (FLINK-8098) LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
[ https://issues.apache.org/jira/browse/FLINK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal closed FLINK-8098.
-----------------------------------
    Resolution: Not A Problem

> LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8098
>                 URL: https://issues.apache.org/jira/browse/FLINK-8098
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.2
>         Environment: Yarn, HDFS 2.7.3, Kafka, scala streaming API, CEP
>            Reporter: Shashank Agarwal
[jira] [Commented] (FLINK-8098) LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
[ https://issues.apache.org/jira/browse/FLINK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264222#comment-16264222 ]

Shashank Agarwal commented on FLINK-8098:
-----------------------------------------

Yes, I agree with the scenario. About the file cleanup: I checked manually and the file was there, so maybe this is an issue with an ongoing checkpoint. We can close this now. If I see it again, I'll reopen with more info.
[jira] [Commented] (FLINK-8098) LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
[ https://issues.apache.org/jira/browse/FLINK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264148#comment-16264148 ]

Shashank Agarwal commented on FLINK-8098:
-----------------------------------------

I found the main issue after another crash. I had another HDFS exception where I was trying to create a path longer than the 255-character limit. That is why the application was crashing. I got these logs while the tasks were being cancelled and restarted. But I am sure I am not running any other scripts that read or write files in the checkpointing folder.

About parallelism: I thought multiple threads were trying to write to the same file during checkpointing. But as I checked, different threads and operators handle different files, so that should not be the issue.

On Wed, Nov 22, 2017 at 4:49 PM, Stefan Richter (JIRA)
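The comment above traces the crash to a path that exceeded a 255-character limit. As a rough sketch only, a job could check paths before writing them. This example assumes the limit applies to each path component measured in UTF-8 bytes, with 255 being the HDFS default of `dfs.namenode.fs-limits.max-component-length`; the actual value on a given cluster may differ. The class `PathComponentCheck` and the sample path are hypothetical and not part of Flink or Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PathComponentCheck {

    // Assumption: HDFS default for dfs.namenode.fs-limits.max-component-length.
    public static final int MAX_COMPONENT_BYTES = 255;

    /** Returns every path component whose UTF-8 encoding is longer than maxBytes. */
    public static List<String> oversizedComponents(String path, int maxBytes) {
        List<String> tooLong = new ArrayList<>();
        for (String component : path.split("/")) {
            if (component.getBytes(StandardCharsets.UTF_8).length > maxBytes) {
                tooLong.add(component);
            }
        }
        return tooLong;
    }

    public static void main(String[] args) {
        // Hypothetical output path with one 300-character component.
        StringBuilder longName = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            longName.append('x');
        }
        String path = "/flink/output/" + longName + "/part-0";

        List<String> bad = oversizedComponents(path, MAX_COMPONENT_BYTES);
        System.out.println("Oversized components: " + bad.size());
    }
}
```

Running such a check where the path is built (for example, where a bucket or file name is derived from record data) would surface the problem as a clear application error instead of an HDFS exception followed by task cancellation.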
[jira] [Commented] (FLINK-8098) LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
[ https://issues.apache.org/jira/browse/FLINK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259659#comment-16259659 ]

Shashank Agarwal commented on FLINK-8098:
-----------------------------------------

Actually, we are not running any script on that checkpointing folder. Only that Flink job has access to it. Maybe this is some race condition in parallel checkpointing.
[jira] [Updated] (FLINK-8098) LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
[ https://issues.apache.org/jira/browse/FLINK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashank Agarwal updated FLINK-8098:
------------------------------------
    Fix Version/s: 1.4.0

> LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8098
>                 URL: https://issues.apache.org/jira/browse/FLINK-8098
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.2
>         Environment: Yarn, HDFS 2.7.3, Kafka, scala streaming API, CEP
>            Reporter: Shashank Agarwal
>             Fix For: 1.4.0
[jira] [Created] (FLINK-8098) LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
Shashank Agarwal created FLINK-8098:
---------------------------------------

             Summary: LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.
                 Key: FLINK-8098
                 URL: https://issues.apache.org/jira/browse/FLINK-8098
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.3.2
         Environment: Yarn, HDFS 2.7.3, Kafka, scala streaming API, CEP
            Reporter: Shashank Agarwal

I am running a streaming application with parallelism 6. I have enabled checkpointing(1000). But the application crashes after 1-2 days. After analysing the logs I found the following trace.

{code}
2017-11-17 11:19:06,696 WARN org.apache.flink.streaming.runtime.tasks.StreamTask - Could not properly clean up the async checkpoint runnable.
java.lang.Exception: Could not properly cancel managed keyed state future.
    at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.close(StreamTask.java:983)
    at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:262)
    at org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:251)
    at org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:97)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.cancel(StreamTask.java:355)
    at org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1463)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not flush and close the file system output stream to hdfs://xyz.com:8020/flink/sux/54944cea1f566ee801656e06cdeeabbc/chk-40191/cf145018-0599-4281-b254-96600a4e4965 in order to obtain the stream state handle
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
    at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
    at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
    ... 8 more
Caused by: java.io.IOException: Could not flush and close the file system output stream to hdfs://xyz.com:8020/flink/sux/54944cea1f566ee801656e06cdeeabbc/chk-40191/cf145018-0599-4281-b254-96600a4e4965 in order to obtain the stream state handle
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:336)
    at org.apache.flink.runtime.checkpoint.AbstractAsyncSnapshotIOCallable.closeStreamAndGetStateHandle(AbstractAsyncSnapshotIOCallable.java:100)
    at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$1.performOperation(HeapKeyedStateBackend.java:351)
    at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$1.performOperation(HeapKeyedStateBackend.java:329)
    at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease flink/sux/54944cea1f566ee801656e06cdeeabbc/chk-40191/cf145018-0599-4281-b254-96600a4e4965 (inode 812148671): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_1721510813_94, pendingcreates: 161]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3659)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3749)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3716)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:911)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:547)
[jira] [Created] (FLINK-7953) Kafka consumer printing error - java.lang.IllegalArgumentException: Invalid offset: -915623761772
Shashank Agarwal created FLINK-7953: --- Summary: Kafka consumer printing error - java.lang.IllegalArgumentException: Invalid offset: -915623761772 Key: FLINK-7953 URL: https://issues.apache.org/jira/browse/FLINK-7953 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: 1.3.2 Environment: kafka 10.0, yarn Reporter: Shashank Agarwal Priority: Minor This is only logged as a warning and does not affect the running program, so I have marked it Minor. {code} 2017-10-31 19:26:09,218 WARN org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Committing offsets to Kafka failed. This does not compromise Flink's checkpoints. java.lang.IllegalArgumentException: Invalid offset: -915623761772 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:687) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.doCommitOffsetsAsync(ConsumerCoordinator.java:531) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:499) at org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1181) at org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:223) 2017-10-31 19:26:09,223 ERROR org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Async Kafka commit failed.
java.lang.IllegalArgumentException: Invalid offset: -915623761772 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:687) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.doCommitOffsetsAsync(ConsumerCoordinator.java:531) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:499) at org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1181) at org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:223) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
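The trace above shows the Kafka client refusing to commit a negative offset. A minimal sketch of a guard follows, assuming (the report does not confirm this) that the negative value is a placeholder for a partition whose real offset is not yet known. The `OffsetCommitFilter` object and the string partition keys are hypothetical names used only for illustration, not part of Flink or Kafka.

```scala
object OffsetCommitFilter {
  // Hypothetical helper: keep only the offsets a commit request could carry.
  // Assumption: any negative value is a "not yet known" placeholder, not a real position.
  def committable(offsets: Map[String, Long]): Map[String, Long] =
    offsets.filter { case (_, offset) => offset >= 0L }
}
```

With the value from the log, `OffsetCommitFilter.committable(Map("topic-0" -> 42L, "topic-1" -> -915623761772L))` keeps only the `topic-0` entry, so the commit would be skipped for the partition that has no valid position yet.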
[jira] [Commented] (FLINK-7830) Problematic interaction of CEP and asynchronous snapshots
[ https://issues.apache.org/jira/browse/FLINK-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202187#comment-16202187 ] Shashank Agarwal commented on FLINK-7830: - [~aljoscha] I think you have covered everything I reported. > Problematic interaction of CEP and asynchronous snapshots > - > > Key: FLINK-7830 > URL: https://issues.apache.org/jira/browse/FLINK-7830 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Reporter: Aljoscha Krettek > Fix For: 1.4.0 > > > Just so we collect all the (possibly duplicate) issue reports. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202173#comment-16202173 ] Shashank Agarwal commented on FLINK-7760: - [~kkl0u] Yes, that is redundant. Actually, I am using this stream in some other places as well. Is this a problem? Logically it shouldn't be. [~aljoscha] The restart strategy also kicked in automatically and failed. I also tried restoring with the same jar and it failed again and again; I tried a different jar as well, but that failed too. > Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0 > > > My job failed due to failure of cassandra. I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend.
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.StreamCorruptedException: invalid type code: 00 > at > java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) > at > java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) > at > java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) > at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) > at > org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) > at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) > at > 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) > ... 6 more > {code} > I have tried to start new job also after failure with parameter {code:java} > -s [checkpoint meta data path]{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202162#comment-16202162 ] Shashank Agarwal commented on FLINK-6321: - [~kkl0u] I am using 4-5 CEP patterns and around 20-22 map, flat-map and co-flatmap operators in my code, with simple conditions like the following.
{code}
val successOrderPattern = Pattern.begin[RawSignal]("event1").where(_._type.getOrElse(null).toInt == 1)
  .followedBy("event2").where(signal => (signal._type.getOrElse("0").toInt == 2 && "_creditCard".equalsIgnoreCase(signal._paymentType.getOrElse(null))))

val successOrderPatternStream = CEP.pattern(stream.keyBy((x) => (x._someKey1.getOrElse(0), x._someSubKey2.getOrElse(0))), successOrderPattern)

val ordersStream: DataStream[OrderSignal] = successOrderPatternStream.select(new TransPatternFlatMap).uid("order CEP")
{code}
> RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Sub-task > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.4.0 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202035#comment-16202035 ] Shashank Agarwal commented on FLINK-6321: - I will try to run that next week. > RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.4.0 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ...
6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
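The `where` conditions quoted in the FLINK-6321 comment earlier in this thread call `getOrElse(null).toInt`, which throws at runtime whenever `_type` is absent. A minimal sketch of null-safe predicates follows, assuming a trimmed-down, hypothetical `RawSignal` with only the two fields those conditions read; `SignalConditions`, `typeCode`, `isFirstEvent` and `isCreditCardOrder` are illustrative names, not code from the reporter's job.

```scala
import scala.util.Try

object SignalConditions {
  // Hypothetical stand-in for the reporter's RawSignal; only the fields used by the conditions.
  final case class RawSignal(_type: Option[String], _paymentType: Option[String])

  // Parse the optional type field; a missing or non-numeric value yields None instead of throwing.
  def typeCode(signal: RawSignal): Option[Int] =
    signal._type.flatMap(s => Try(s.toInt).toOption)

  // Condition for the first event: type code equals 1.
  def isFirstEvent(signal: RawSignal): Boolean =
    typeCode(signal).contains(1)

  // Condition for the second event: type code equals 2 and payment type is "_creditCard".
  def isCreditCardOrder(signal: RawSignal): Boolean =
    typeCode(signal).contains(2) &&
      signal._paymentType.exists(_.equalsIgnoreCase("_creditCard"))
}
```

Because these are plain `RawSignal => Boolean` functions, they could be handed to the pattern's `where` calls in place of the inline lambdas, and they can be unit-tested without starting a Flink job. This only hardens the conditions; it is not claimed to address the checkpoint failure itself.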
[jira] [Commented] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202018#comment-16202018 ] Shashank Agarwal commented on FLINK-7760: - Hi [~kkl0u], Actually, the job is quite complicated, with Kafka streams and a custom serializer. The steps above are correct, but I will still try to share some code. I have changed parameter names and a few other things in the code. If you find any issue, let me know.
{code}
object Job {
  def main(args: Array[String]) {
    // set up the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val propertiesFile = getClass.getClassLoader.getResource("xyz.properties").getPath
    val parameter = ParameterTool.fromPropertiesFile(propertiesFile)
    env.getConfig.setGlobalJobParameters(parameter)
    env.setStateBackend(new FsStateBackend(parameter.get("hdfsSnapshotPath")))

    // enable fault-tolerance
    env.enableCheckpointing(1000)
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)

    // enable restarts
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(50, 500L))

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", parameter.get("kafkaUrl"))
    properties.setProperty("group.id", parameter.get("kafkaGroupId"))

    val kafka10 = new FlinkKafkaConsumer010[RawSignal](parameter.get("kafkaBundleName"), new SignalDeserializationSchema(), properties)
    val stream = env.addSource(kafka10).keyBy(_._someKey.getOrElse(0))

    // Creating a pattern for successful event
    val successOrderPattern = Pattern.begin[RawSignal]("someEvent")
      .followedBy("otherEvent")

    val successOrderPatternStream = CEP.pattern(stream.keyBy((x) => (x._someKey.getOrElse(0), x._someSubKey.getOrElse(0))), successOrderPattern)

    val ordersStream: DataStream[TransactionSignal] = successOrderPatternStream.select(new TransactionPatternFlatMap)

    // Put Ip count in the stream with maintaining the state
    val ipStateStream = ordersStream.keyBy((x) => (x._someKey, x._deviceIp))
      .mapWithState((in: OrderSignal, ipState: Option[Int]) => {
        if (!in._deviceIp.equalsIgnoreCase(parameter.get("defaultIp"))) {
          val newCount = ipState.getOrElse(0) + 1
          val output = in.copy(_numOfOrderSameIp = newCount)
          (output, Some(newCount))
        } else {
          (in, Some(0))
        }
      })

    ipStateStream.print

    env.execute("Thirdwatch Mitra")
  }
}
{code}
Here is the Kafka deserializer I am using, SignalDeserializationSchema:
{code}
import RawSignal
import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse

/**
  * Created by shashank on 13/01/17.
  *
  * Deserialize raw json string from kafka to Raw signal object.
  */
class SignalDeserializationSchema extends AbstractDeserializationSchema[RawSignal] {
  implicit lazy val formats = DefaultFormats

  override def deserialize(message: Array[Byte]): RawSignal = {
    parse(new String(message)).extract[RawSignal]
  }

  override def isEndOfStream(nextElement: RawSignal): Boolean = false
}
{code}
and an example RawSignal class:
{code}
case class RawSignal(name: Option[String], email: Option[String], UserId: Option[String])
{code}
> Restore failing from external checkpointing metadata. > - > > Key: FLINK-7760 > URL: https://issues.apache.org/jira/browse/FLINK-7760 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.3.2 > Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend >Reporter: Shashank Agarwal >Priority: Blocker > Fix For: 1.4.0 > > > My job failed due to failure of cassandra.
I have enabled > ExternalizedCheckpoints. But when job tried to restore from that checkpoint > it's failing continuously with following error. > {code:java} > 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task > - KeyedCEPPatternOperator -> Map (1/2) > (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. > java.lang.IllegalStateException: Could not initialize keyed state backend. > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) >
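The per-IP counting step passed to `mapWithState` in the job above can be exercised on its own, without a cluster. The following is a minimal sketch, assuming a hypothetical, trimmed-down `OrderSignal` (the real class is not shown in the comment) and an `IpCounter.update` helper introduced here only for illustration; the default IP is passed in explicitly rather than read from the job parameters.

```scala
object IpCounter {
  // Hypothetical, reduced OrderSignal: only the fields the counting step touches.
  final case class OrderSignal(_deviceIp: String, _numOfOrderSameIp: Int = 0)

  // Same shape as the function given to mapWithState: (input, old state) => (output, new state).
  // For the default IP the signal passes through unchanged and the state is reset to 0.
  def update(defaultIp: String)(in: OrderSignal, ipState: Option[Int]): (OrderSignal, Option[Int]) =
    if (!in._deviceIp.equalsIgnoreCase(defaultIp)) {
      val newCount = ipState.getOrElse(0) + 1
      (in.copy(_numOfOrderSameIp = newCount), Some(newCount))
    } else {
      (in, Some(0))
    }
}
```

Feeding the returned state back in as `ipState` for the next element with the same key mimics what the keyed state does between records, which makes the transition easy to check in isolation from the restore problem reported here.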
[jira] [Comment Edited] (FLINK-7760) Restore failing from external checkpointing metadata.
[ https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202018#comment-16202018 ] Shashank Agarwal edited comment on FLINK-7760 at 10/12/17 2:31 PM: --- Hi [~kkl0u], Actually, the job is quite complicated, with Kafka streams and a custom serializer. The steps above are correct, but I will still try to share some code. I have changed parameter names and a few other things in the code. If you find any issue, let me know.
{code}
object Job {
  def main(args: Array[String]) {
    // set up the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val propertiesFile = getClass.getClassLoader.getResource("xyz.properties").getPath
    val parameter = ParameterTool.fromPropertiesFile(propertiesFile)
    env.getConfig.setGlobalJobParameters(parameter)
    env.setStateBackend(new FsStateBackend(parameter.get("hdfsSnapshotPath")))

    // enable fault-tolerance
    env.enableCheckpointing(1000)
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)

    // enable restarts
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(50, 500L))

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", parameter.get("kafkaUrl"))
    properties.setProperty("group.id", parameter.get("kafkaGroupId"))

    val kafka10 = new FlinkKafkaConsumer010[RawSignal](parameter.get("kafkaBundleName"), new SignalDeserializationSchema(), properties)
    val stream = env.addSource(kafka10).keyBy(_._someKey.getOrElse(0))

    // Creating a pattern for successful event
    val successOrderPattern = Pattern.begin[RawSignal]("someEvent")
      .followedBy("otherEvent")

    val successOrderPatternStream = CEP.pattern(stream.keyBy((x) => (x._someKey.getOrElse(0), x._someSubKey.getOrElse(0))), successOrderPattern)

    val ordersStream: DataStream[TransactionSignal] = successOrderPatternStream.select(new TransactionPatternFlatMap)

    // Put Ip count in the stream with maintaining the state
    val ipStateStream = ordersStream.keyBy((x) => (x._someKey, x._deviceIp))
      .mapWithState((in: OrderSignal, ipState: Option[Int]) => {
        if (!in._deviceIp.equalsIgnoreCase(parameter.get("defaultIp"))) {
          val newCount = ipState.getOrElse(0) + 1
          val output = in.copy(_numOfOrderSameIp = newCount)
          (output, Some(newCount))
        } else {
          (in, Some(0))
        }
      })

    ipStateStream.print

    env.execute("Thirdwatch Mitra")
  }
}
{code}
Here is the Kafka deserializer I am using, SignalDeserializationSchema:
{code}
import RawSignal
import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse

/**
  * Created by shashank on 13/01/17.
  *
  * Deserialize raw json string from kafka to Raw signal object.
  */
class SignalDeserializationSchema extends AbstractDeserializationSchema[RawSignal] {
  implicit lazy val formats = DefaultFormats

  override def deserialize(message: Array[Byte]): RawSignal = {
    parse(new String(message)).extract[RawSignal]
  }

  override def isEndOfStream(nextElement: RawSignal): Boolean = false
}
{code}
and an example RawSignal class:
{code}
case class RawSignal(name: Option[String], email: Option[String], UserId: Option[String])
{code}
was (Author: shashank734): Ho [~kkl0u], Hi [~kkl0u] , Actually it's too complicated with kafka streams and custom serializer. Above steps are correct but still I try to put some code. I have modified parameter names and some things in code. If you find any issue let me know.
{code} object Job { def main(args: Array[String]) { // set up the execution environment val env = StreamExecutionEnvironment.getExecutionEnvironment val propertiesFile = getClass.getClassLoader.getResource("xyz.properties").getPath val parameter = ParameterTool.fromPropertiesFile(propertiesFile) env.getConfig.setGlobalJobParameters(parameter) env.setStateBackend(new FsStateBackend(parameter.get("hdfsSnapshotPath"))) // enable fault-tolerance env.enableCheckpointing(1000) env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE) // enable restarts env.setRestartStrategy(RestartStrategies.fixedDelayRestart(50, 500L)) val properties = new Properties() properties.setProperty("bootstrap.servers", parameter.get("kafkaUrl")) properties.setProperty("group.id", parameter.get("kafkaGroupId")) val kafka10 = new FlinkKafkaConsumer010[RawSignal](parameter.get("kafkaBundleName"), new SignalDeserializationSchema(), properties) val stream = env.addSource(kafka10).keyBy(_._someKey.getOrElse(0)) //Creating a pattern for successful event val successOrderPattern = Pattern.begin[RawSignal]("someEvent"). .followedBy("otherEvent") val successOrderPatternStream = CEP.pattern(stream.keyBy((x) => (x._someKey.getOrElse(0), x._someSubKey.getOrElse(0))), successOrderPattern) val ordersStream: DataStream[TransactionSignal] = successOrderPa
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201999#comment-16201999 ] Shashank Agarwal commented on FLINK-6321: - Hi [~kkl0u], my program is in Scala, but before sinking to Cassandra I convert the stream to a Java stream. Some Java collection objects are used there. I am using the FS backend for that. > RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.4.0 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ...
6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201992#comment-16201992 ] Shashank Agarwal commented on FLINK-7756: - Hi [~kkl0u], I have checked all the logs. These are the only error and warning logs being printed, and the same logs are printed for all operators. I also thought there might be some other cause, but these are the only logs. > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. > On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,020 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=1). > 2017-09-29 14:21:32,022 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil > - Found Netty's native epoll transport in the classpath, using > it > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (1/2) > (b879f192c4e8aae6671cdafb3a24c00a). > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Map (2/2) > (1ea5aef6ccc7031edc6b37da2912d90b). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (2/2) > (4bac8e764c67520d418a4c755be23d4d). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched > from RUNNING to FAILED. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Co-Flat Map (1/2).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Co-Flat Map (1/2). > ... 6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) > ... 5 more > Suppressed: java.lang.Exception: Could not properly cancel managed > keyed state future. 
> at > org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961) > ... 5 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199946#comment-16199946 ] Shashank Agarwal commented on FLINK-6321: - This also occurs in 1.3.2. I used a keyed CEP pattern with two events: I sent around 200k of the first event, then around 150k of the second event, and then cancelled the job with a savepoint. Whenever I try to restore from that savepoint, I get the following exception: java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.StreamCorruptedException: invalid type code: 00 at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at
org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) ... 6 more > RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Critical > Fix For: 1.4.0 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDon
[jira] [Comment Edited] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199946#comment-16199946 ] Shashank Agarwal edited comment on FLINK-6321 at 10/11/17 8:14 AM: --- This also occurs in 1.3.2. I used KeyedCEP with 2 events: I passed around 200k of the first event, then around 150k of the second event. Then I cancelled the job with a savepoint. Whenever I try to restore from that savepoint, it gives the following exception: {code:java} java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.StreamCorruptedException: invalid type code: 00 at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at 
org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) ... 6 more {code} was (Author: shashank734): It's coming in 1.3.2 also. I have used Keyedcep with 2 events. I have passed around 200k first event, than i passed around 150k second event. Than i have cancelled the job with savepoint. But whenever i try to restore from savepoint, It's giving following exception: java.lang.IllegalStateException: Could not initialize keyed state backend. 
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.StreamCorruptedException: invalid type code: 00 at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputSt
[jira] [Created] (FLINK-7760) Restore failing from external checkpointing metadata.
Shashank Agarwal created FLINK-7760: --- Summary: Restore failing from external checkpointing metadata. Key: FLINK-7760 URL: https://issues.apache.org/jira/browse/FLINK-7760 Project: Flink Issue Type: Bug Components: CEP, State Backends, Checkpointing Affects Versions: 1.3.2 Environment: Yarn, Flink 1.3.2, HDFS, FsStateBackend Reporter: Shashank Agarwal My job failed because of a Cassandra failure. I have enabled ExternalizedCheckpoints, but when the job tries to restore from that checkpoint it fails continuously with the following error. {code:java} 2017-10-04 09:39:20,611 INFO org.apache.flink.runtime.taskmanager.Task - KeyedCEPPatternOperator -> Map (1/2) (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED. java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.StreamCorruptedException: invalid type code: 00 at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519) at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553) at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455) at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957) at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852) at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397) at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311) ... 6 more {code} I have tried to start new job also after failure with parameter {code:java} -s [checkpoint meta data path]{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
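A note for readers following the trace above: the root-cause line `java.io.StreamCorruptedException: invalid type code: 00` is raised by plain Java serialization (`java.io.ObjectInputStream`), which the trace shows being called from `NFA$NFASerializer.deserializeCondition`. The sketch below does not reproduce the Flink failure and does not use Flink. It only illustrates, under the assumption that a zero byte sits where the stream expects a type code, how `ObjectInputStream` reports that situation. The names `InvalidTypeCodeDemo` and `probe` are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.StreamCorruptedException;

public class InvalidTypeCodeDemo {

    // Hypothetical helper: feeds a valid serialization header followed by a
    // stray 0x00 byte to ObjectInputStream and returns the exception message.
    static String probe() throws IOException, ClassNotFoundException {
        byte[] bytes = {
            (byte) 0xAC, (byte) 0xED, // stream magic number
            0x00, 0x05,               // stream version
            0x00                      // not a valid serialization type code
        };
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return "no error";
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe());
    }
}
```

On a stock JDK this should report the same `invalid type code: 00` text as the trace, although it goes through a different internal check than the `readBlockHeader` frame shown there. It says nothing about why the checkpoint bytes ended up in that state.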
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189913#comment-16189913 ] Shashank Agarwal commented on FLINK-7756: - [~StephanEwen] My code base is very large, with around 4-5 CEPs, 30-35 operators and 4-5 sinks. I have to check whether I can create a minimal example that reproduces this. It works fine with the local file system, so I think there is some issue with HDFS. > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. > On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,020 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=1). > 2017-09-29 14:21:32,022 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. 
> 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil > - Found Netty's native epoll transport in the classpath, using > it > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (1/2) > (b879f192c4e8aae6671cdafb3a24c00a). > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Map (2/2) > (1ea5aef6ccc7031edc6b37da2912d90b). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (2/2) > (4bac8e764c67520d418a4c755be23d4d). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched > from RUNNING to FAILED. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Co-Flat Map (1/2).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Co-Flat Map (1/2). > ... 6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) > ... 5 more > Suppressed: java.lang.Exception: Could not properly cancel managed > keyed state future. 
> at > org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961) > ... 5 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet
[jira] [Created] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
Shashank Agarwal created FLINK-7756: --- Summary: RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP. Key: FLINK-7756 URL: https://issues.apache.org/jira/browse/FLINK-7756 Project: Flink Issue Type: Bug Components: CEP, State Backends, Checkpointing, Streaming Affects Versions: 1.3.2 Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend Reporter: Shashank Agarwal When i try to use RocksDBStateBackend on my staging cluster (which is using HDFS as file system) it crashes. But When i use FsStateBackend on staging (which is using HDFS as file system) it is working fine. On local with local file system it's working fine in both cases. Please check attached logs. I have around 20-25 tasks in my app. {code:java} 2017-09-29 14:21:31,639 INFO org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state to restore for the BucketingSink (taskIdx=0). 2017-09-29 14:21:31,640 INFO org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - Initializing RocksDB keyed state backend from snapshot. 2017-09-29 14:21:32,020 INFO org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state to restore for the BucketingSink (taskIdx=1). 2017-09-29 14:21:32,022 INFO org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - Initializing RocksDB keyed state backend from snapshot. 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil - Found Netty's native epoll transport in the classpath, using it 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a). 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Map (2/2) (1ea5aef6ccc7031edc6b37da2912d90b). 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Co-Flat Map (2/2) (4bac8e764c67520d418a4c755be23d4d). 
2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched from RUNNING to FAILED. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 for operator Co-Flat Map (1/2).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator Co-Flat Map (1/2). ... 6 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) ... 5 more Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future. at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961) ... 
5 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85) at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88) ... 7 more Caused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:878) a
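A note on reading the trace above: the innermost `java.lang.IllegalStateException` carries no message because it comes from a bare state check (`Preconditions.checkState`), and it surfaces as an `ExecutionException` because the snapshot runs inside a `FutureTask`. The sketch below uses only the JDK and a stand-in `checkState` method (an assumption for illustration, not Flink's class) to show that wrapping and one way to walk back to the root cause. All class and method names here are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class WrappedCheckStateDemo {

    // Stand-in for a message-less precondition check.
    static void checkState(boolean condition) {
        if (!condition) {
            throw new IllegalStateException();
        }
    }

    // Follows getCause() links until the innermost throwable is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    // Runs a failing task and describes the root cause behind the wrapper.
    static String describe() throws InterruptedException {
        FutureTask<Void> task = new FutureTask<>(() -> {
            checkState(false); // fails without a message
            return null;
        });
        task.run(); // the exception is captured inside the future
        try {
            task.get(); // rethrows it wrapped in ExecutionException
            return "no failure";
        } catch (ExecutionException e) {
            Throwable root = rootCause(e);
            return root.getClass().getName() + " message=" + root.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```

The point of the sketch is only that a message-less root cause gives no hint by itself; the failing condition has to be looked up at the source line named in the trace.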
[jira] [Commented] (FLINK-7484) com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 7, Size: 5
[ https://issues.apache.org/jira/browse/FLINK-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140606#comment-16140606 ] Shashank Agarwal commented on FLINK-7484: - Got another crash. I have not used any class named INR in my code. Maybe it is being picked up automatically from some value. {code} com.esotericsoftware.kryo.KryoException: Unable to find class: INR Serialization trace: underlying (scala.collection.convert.Wrappers$SeqWrapper) at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752) at com.twitter.chill.TraversableSerializer.read(Traversable.scala:43) at com.twitter.chill.TraversableSerializer.read(Traversable.scala:21) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657) at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:190) at org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:101) at org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:32) at org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:74) at org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:34) at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:279) at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:296) at org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:77) at 
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:442) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:206) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: INR at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136) ... 23 more {code} > com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: > Index: 7, Size: 5 > --- > > Key: FLINK-7484 > URL: https://issues.apache.org/jira/browse/FLINK-7484 > Project: Flink > Issue Type: Bug > Components: CEP, DataStream API, Scala API >Affects Versions: 1.3.2 > Environment: Flink 1.3.2 , Yarn Cluster, FsStateBackend >Reporter: Shashank Agarwal > > I am using many CEP's and Global Window. I am getting following error > sometimes and application crashes. I have checked logically there's no flow > in the program. Here ItemPojo is a Pojo class and we are using > java.utill.List[ItemPojo]. We are using Scala DataStream API please find > attached logs. 
> {code} > 2017-08-17 10:04:12,814 INFO org.apache.flink.runtime.taskmanager.Task > - TriggerWindow(GlobalWindows(), > ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@6d36aa3c}, > co.thirdwatch.trigger.TransactionTrigger@5707c1cb, > WindowedStream.apply(WindowedStream.scala:582)) -> Flat Map -> Map -> Sink: > Saving CSV Features Sink (1/2) (06c0d4d231bc620ba9e7924b9b0da8d1) switched > from RUNNING to FAILED. > com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: > Index: 7, Size: 5 > Serialization trace: > category (co.thirdwatch.pojo.ItemPojo) > underlying (scala.collection.convert.Wrappers$SeqWrapper) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) > at com.esotericsoftware.kr
[jira] [Commented] (FLINK-7484) com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 7, Size: 5
[ https://issues.apache.org/jira/browse/FLINK-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135101#comment-16135101 ] Shashank Agarwal commented on FLINK-7484: -

Hi [~yew1eb], it's a very big code base with a lot of business logic, but I can give you an overview of the scenario. We are using Scala streams, but because of a Cassandra sink issue we convert the Scala stream to a Java stream, so we also convert the internal objects to Java objects. Judging by the logs, I guess this is the scenario. We use an ItemPojo class:

{code:java}
@SerialVersionUID(224567L)
@UDT(keyspace = "cstable", name = "item")
case class ItemPojo(
  @BeanProperty var item_id: String,
  @BeanProperty var product_title: String,
  @BeanProperty var price: String
) extends Serializable {
  def this() = this(null, null, null)
}
{code}

In a stream object we use java.util.List[ItemPojo]. It did not cause any issue until now, although we were using it in a lot of CEPs and in the global window as well. Later, because of a new requirement, we had to iterate over that list in the global window. Since then we sometimes get this error and the application crashes.

{code:java}
for (cItem <- cItemList) {
  // some logic here
}
{code}

I think this may be the cause, because I started getting the error after this change.

> com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: > Index: 7, Size: 5 > --- > > Key: FLINK-7484 > URL: https://issues.apache.org/jira/browse/FLINK-7484 > Project: Flink > Issue Type: Bug > Components: DataStream API, Scala API >Affects Versions: 1.3.2 > Environment: Flink 1.3.2 , Yarn Cluster, FsStateBackend >Reporter: Shashank Agarwal > > I am using many CEP's and Global Window. I am getting following error > sometimes and application crashes. I have checked logically there's no flow > in the program. Here ItemPojo is a Pojo class and we are using > java.utill.List[ItemPojo]. We are using Scala DataStream API please find > attached logs. 
> {code} > 2017-08-17 10:04:12,814 INFO org.apache.flink.runtime.taskmanager.Task > - TriggerWindow(GlobalWindows(), > ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@6d36aa3c}, > co.thirdwatch.trigger.TransactionTrigger@5707c1cb, > WindowedStream.apply(WindowedStream.scala:582)) -> Flat Map -> Map -> Sink: > Saving CSV Features Sink (1/2) (06c0d4d231bc620ba9e7924b9b0da8d1) switched > from RUNNING to FAILED. > com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: > Index: 7, Size: 5 > Serialization trace: > category (co.thirdwatch.pojo.ItemPojo) > underlying (scala.collection.convert.Wrappers$SeqWrapper) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761) > at com.twitter.chill.TraversableSerializer.read(Traversable.scala:43) > at com.twitter.chill.TraversableSerializer.read(Traversable.scala:21) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657) > at > org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:190) > at > org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:101) > at > org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:32) > at > org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:74) > at > org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:34) > at > org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:279) > at > 
org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:296) > at > org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:77) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:442) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:206) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) > at > org.apache.flink.streaming.runtime.tasks.StreamTa
[jira] [Created] (FLINK-7484) com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 7, Size: 5
Shashank Agarwal created FLINK-7484: --- Summary: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 7, Size: 5 Key: FLINK-7484 URL: https://issues.apache.org/jira/browse/FLINK-7484 Project: Flink Issue Type: Bug Components: DataStream API, Scala API Affects Versions: 1.3.2 Environment: Flink 1.3.2 , Yarn Cluster, FsStateBackend Reporter: Shashank Agarwal I am using many CEPs and a global window. I sometimes get the following error and the application crashes. I have checked the logic and there is no flaw in the program. Here ItemPojo is a POJO class and we are using java.util.List[ItemPojo]. We are using the Scala DataStream API; please find the logs attached. {code} 2017-08-17 10:04:12,814 INFO org.apache.flink.runtime.taskmanager.Task - TriggerWindow(GlobalWindows(), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@6d36aa3c}, co.thirdwatch.trigger.TransactionTrigger@5707c1cb, WindowedStream.apply(WindowedStream.scala:582)) -> Flat Map -> Map -> Sink: Saving CSV Features Sink (1/2) (06c0d4d231bc620ba9e7924b9b0da8d1) switched from RUNNING to FAILED. 
com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 7, Size: 5 Serialization trace: category (co.thirdwatch.pojo.ItemPojo) underlying (scala.collection.convert.Wrappers$SeqWrapper) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761) at com.twitter.chill.TraversableSerializer.read(Traversable.scala:43) at com.twitter.chill.TraversableSerializer.read(Traversable.scala:21) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657) at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:190) at org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:101) at org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:32) at org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:74) at org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:34) at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:279) at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:296) at org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:77) at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:442) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:206) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IndexOutOfBoundsException: Index: 7, Size: 5 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42) at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805) at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:728) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113) ... 22 more 2017-08-17 10:04:12,816 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for TriggerWindow(GlobalWindows(), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@6d36aa3c}, co.thirdwatch.trigger.TransactionTrigger@5707c1cb, WindowedStream.apply(WindowedStream.scala:582)) -> Flat Map -> Map -> Sink: Saving CSV Features Sink (1/2) (06c0d4d231bc620ba9e7924b9b0da8d1). 2017-08-17 10:04:12,816 INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem streams are closed for task TriggerWindow(GlobalWindows(), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@6d36aa3c}, co.thirdwatch.trigger.TransactionTrigger@5707c1cb, WindowedStream.apply(WindowedStream.scal
[jira] [Commented] (FLINK-6997) SavepointITCase fails in master branch sometimes
[ https://issues.apache.org/jira/browse/FLINK-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062257#comment-16062257 ] Shashank Agarwal commented on FLINK-6997: - I tried to take a savepoint on release-1.3 and got the same error. {code} flink cancel -s hdfs:///fl/savepoint-130 f6bfa9e01be030d5d144a6aa680ff3ed {code} This is the error I get: {code} The program finished with the following exception: java.lang.Exception: Canceling the job with ID f6bfa9e01be030d5d144a6aa680ff3ed failed. at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:637) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1092) at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133) at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130) at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130) Caused by: java.lang.Exception: Failed to trigger savepoint.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$6.apply(JobManager.scala:639) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$6.apply(JobManager.scala:629) at org.apache.flink.runtime.concurrent.impl.FlinkFuture$5.onComplete(FlinkFuture.java:272) at akka.dispatch.OnComplete.internal(Future.scala:247) at akka.dispatch.OnComplete.internal(Future.scala:245) at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175) at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.Exception: Failed to trigger savepoint: Not all required tasks are currently running. 
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.triggerSavepoint(CheckpointCoordinator.java:382) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:625) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.clusterframework.ContaineredJobManager$$anonfun$handleContainerMessage$1.applyOrElse(ContaineredJobManager.scala:100) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) at org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnShutdown$1.applyOrElse(YarnJobManager.scala:103) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:467) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:125) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch
[jira] [Created] (FLINK-6993) Not reading recursive files in Batch by using readTextFile when file name contains _ in starting.
Shashank Agarwal created FLINK-6993: --- Summary: Not reading recursive files in Batch by using readTextFile when file name contains _ in starting. Key: FLINK-6993 URL: https://issues.apache.org/jira/browse/FLINK-6993 Project: Flink Issue Type: Bug Components: Batch Connectors and Input/Output Formats Affects Versions: 1.3.0 Reporter: Shashank Agarwal Priority: Critical Fix For: 1.3.2

When I read files from a folder with readTextFile in a batch job and recursive.file.enumeration enabled, files whose names start with _ are not read. After I removed the leading _ from the names, they were read correctly. Passing the direct path of a single such file also works; only the directory path fails. To reproduce the issue:
{code}
import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}
import org.apache.flink.configuration.Configuration

object CSVMerge {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // create a configuration object
    val parameters = new Configuration

    // set the recursive enumeration parameter
    parameters.setBoolean("recursive.file.enumeration", true)

    val stream = env.readTextFile("file:///Users/data")
      .withParameters(parameters)

    stream.print()
  }
}
{code}
With 2-3 text files named 1.txt, 2.txt, etc. in the data folder, this works fine. With files named _1.txt, _2.txt, nothing is read. Flink's BucketingSink in streaming puts _ before the file names by default, so the files written by a DataStream sink cannot be read back. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
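One possible explanation for the behaviour described in FLINK-6993, offered here as an assumption that the report itself does not confirm, is that the directory enumeration behind readTextFile skips entries whose names begin with _ or ., a common convention for hidden and in-progress files. The self-contained Java sketch below only models such a filter; HiddenFileFilterSketch, accept and enumerate are hypothetical names and are not Flink APIs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a recursive enumeration that hides "_" and "." entries.
final class HiddenFileFilterSketch {

    // Assumed rule: names starting with '_' or '.' are treated as hidden.
    static boolean accept(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    // Recursively collects accepted regular files below dir, sorted by path.
    static List<Path> enumerate(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        collect(dir, result);
        Collections.sort(result);
        return result;
    }

    private static void collect(Path dir, List<Path> out) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (!accept(name)) {
                    continue; // skipped silently, which would match the reported symptom
                }
                if (Files.isDirectory(entry)) {
                    collect(entry, out);
                } else {
                    out.add(entry);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data");
        Files.createFile(dir.resolve("1.txt"));
        Files.createFile(dir.resolve("_2.txt"));
        // Prints the files that survive the filter.
        System.out.println(enumerate(dir));
    }
}
```

In this model the filter is applied only to entries found while listing a directory, not to the path passed in directly, which would be consistent with the observation that a single _-prefixed file can be read when its path is given explicitly.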
[jira] [Commented] (FLINK-6844) TraversableSerializer should implement compatibility methods
[ https://issues.apache.org/jira/browse/FLINK-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056326#comment-16056326 ] Shashank Agarwal commented on FLINK-6844: - What is the release date for version 1.3.1? I don't want to build against the master branch and then have to publish libraries such as CEP locally, because everything would change. Is there a quick fix that I can apply on release-1.3.0 together with this commit? > TraversableSerializer should implement compatibility methods > > > Key: FLINK-6844 > URL: https://issues.apache.org/jira/browse/FLINK-6844 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.3.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: flink-rel-1.3.1-blockers > Fix For: 1.3.1, 1.4.0 > > > The {{TraversableSerializer}} may be used as a serializer for managed state > and takes part in checkpointing, therefore should implement the compatibility > methods.
[jira] [Commented] (FLINK-6844) TraversableSerializer should implement compatibility methods
[ https://issues.apache.org/jira/browse/FLINK-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055854#comment-16055854 ] Shashank Agarwal commented on FLINK-6844: - [~tzulitai] I am using Kafka as the source and there is no issue with that. I am using Flink CEP, and this code was working fine with 1.2.0 and 1.2.1. I have updated my applications to 1.3.0. The applications that do not use CEP are working; the application that uses CEP throws the following exception and terminates checkpointing altogether. {code} java.lang.Exception: Could not perform checkpoint 1 for operator KeyedCEPPatternOperator -> Map (6/6). at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:550) at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:378) at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:281) at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:183) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:213) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:262) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not complete snapshot 1 for operator KeyedCEPPatternOperator -> Map (6/6).
 at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:406) at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157) at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1089) at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:653) at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:589) at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:542) ... 8 more Caused by: java.lang.UnsupportedOperationException at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) at org.apache.flink.api.scala.typeutils.OptionSerializer$OptionSerializerConfigSnapshot.<init>(OptionSerializer.scala:139) at org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:104) at org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:28) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) at 
org.apache.flink.api.common.typeutils.base.CollectionSerializerConfigSnapshot.<init>(CollectionSerializerConfigSnapshot.java:39) at org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:183) at org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:47) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) at org.apache.flink.api.common.typeutils.base.MapSerializerConfigSnapshot.<init>(MapSerializerConfigSnapshot.java:38) at org.apache.flink.runtime.state.HashMapSerializer.snapshotConfiguration(HashMapSerializer.java:210) at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.snapshot(HeapKeyedStateBackend.java:267) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:396) ... 13 more {code} Then I applied your patch on the release-1.3.0 tag and used that with this code; the app without CEP is still working fine, b
[jira] [Commented] (FLINK-6844) TraversableSerializer should implement compatibility methods
[ https://issues.apache.org/jira/browse/FLINK-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055824#comment-16055824 ] Shashank Agarwal commented on FLINK-6844: - I have cross-verified against the source code and the Flink dashboard that I applied the patch successfully. It is just not printing any other error stack traces. > TraversableSerializer should implement compatibility methods > > > Key: FLINK-6844 > URL: https://issues.apache.org/jira/browse/FLINK-6844 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.3.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: flink-rel-1.3.1-blockers > Fix For: 1.3.1, 1.4.0 > > > The {{TraversableSerializer}} may be used as a serializer for managed state > and takes part in checkpointing, therefore should implement the compatibility > methods.
[jira] [Commented] (FLINK-6844) TraversableSerializer should implement compatibility methods
[ https://issues.apache.org/jira/browse/FLINK-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055816#comment-16055816 ] Shashank Agarwal commented on FLINK-6844: - It no longer prints any stack traces after applying the patch. Maybe that is because you have removed {code} throw new UnsupportedOperationException() {code} > TraversableSerializer should implement compatibility methods > > > Key: FLINK-6844 > URL: https://issues.apache.org/jira/browse/FLINK-6844 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.3.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: flink-rel-1.3.1-blockers > Fix For: 1.3.1, 1.4.0 > > > The {{TraversableSerializer}} may be used as a serializer for managed state > and takes part in checkpointing, therefore should implement the compatibility > methods.
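The stack trace reported for this issue shows snapshotConfiguration being called through a chain of nested serializers (HashMapSerializer, ListSerializer, TupleSerializerBase, OptionSerializer) until TraversableSerializer throws UnsupportedOperationException. The Java sketch below is a minimal model of that delegation, assuming each composite snapshot simply asks its nested serializers for theirs. All type names in it (SketchSerializer, LeafSerializer, UnsupportedSerializer, CompositeSerializer, SnapshotChainSketch) are hypothetical stand-ins, not Flink's actual classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a serializer that can describe its configuration.
interface SketchSerializer {
    String snapshotConfiguration();
}

// Leaf serializer that supports snapshots.
final class LeafSerializer implements SketchSerializer {
    private final String name;
    LeafSerializer(String name) { this.name = name; }
    @Override public String snapshotConfiguration() { return name; }
}

// Leaf serializer that has not implemented the compatibility method.
final class UnsupportedSerializer implements SketchSerializer {
    @Override public String snapshotConfiguration() {
        throw new UnsupportedOperationException();
    }
}

// Composite serializer: its snapshot is built from the snapshots of its nested serializers.
final class CompositeSerializer implements SketchSerializer {
    private final String name;
    private final List<SketchSerializer> nested;

    CompositeSerializer(String name, SketchSerializer... nested) {
        this.name = name;
        this.nested = Arrays.asList(nested);
    }

    @Override public String snapshotConfiguration() {
        StringBuilder sb = new StringBuilder(name).append('[');
        for (int i = 0; i < nested.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Any exception thrown by a nested serializer propagates to the caller.
            sb.append(nested.get(i).snapshotConfiguration());
        }
        return sb.append(']').toString();
    }
}

final class SnapshotChainSketch {
    // Returns true if the whole serializer tree can be snapshotted.
    static boolean canSnapshot(SketchSerializer root) {
        try {
            root.snapshotConfiguration();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SketchSerializer broken = new CompositeSerializer("Map",
                new CompositeSerializer("List",
                        new CompositeSerializer("Option", new UnsupportedSerializer())));
        System.out.println(canSnapshot(broken));
    }
}
```

Under this model a single nested serializer without snapshot support is enough to fail the snapshot of the outermost serializer, which is one way to read why the checkpoint failed only in the job whose state contains a Scala collection.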
[jira] [Commented] (FLINK-6844) TraversableSerializer should implement compatibility methods
[ https://issues.apache.org/jira/browse/FLINK-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055813#comment-16055813 ] Shashank Agarwal commented on FLINK-6844: - [~tzulitai] I have applied this patch on https://github.com/apache/flink/tree/release-1.3.0, the latest 1.3.0 release. I need a stable version. > TraversableSerializer should implement compatibility methods > > > Key: FLINK-6844 > URL: https://issues.apache.org/jira/browse/FLINK-6844 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.3.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: flink-rel-1.3.1-blockers > Fix For: 1.3.1, 1.4.0 > > > The {{TraversableSerializer}} may be used as a serializer for managed state > and takes part in checkpointing, therefore should implement the compatibility > methods.
[jira] [Commented] (FLINK-6844) TraversableSerializer should implement compatibility methods
[ https://issues.apache.org/jira/browse/FLINK-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055804#comment-16055804 ] Shashank Agarwal commented on FLINK-6844: - [~tzulitai] I checked with the patch; it is not working with KeyedCEPPatternOperator. In the commit you no longer throw the exception, so only a log message is printed and no exception. But checkpointing still does not work, whereas it was working fine in 1.2.1. {code} 2017-06-20 15:26:25,518 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom File Source (1/1) is not being executed at the moment. Aborting checkpoint. {code} > TraversableSerializer should implement compatibility methods > > > Key: FLINK-6844 > URL: https://issues.apache.org/jira/browse/FLINK-6844 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.3.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: flink-rel-1.3.1-blockers > Fix For: 1.3.1, 1.4.0 > > > The {{TraversableSerializer}} may be used as a serializer for managed state > and takes part in checkpointing, therefore should implement the compatibility > methods.
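The CheckpointCoordinator log line quoted in this comment suggests that a checkpoint is not started when one of the tasks that must trigger it is not executing. The Java sketch below models that precondition under this assumption only; TaskState, TriggerCheckSketch, firstNonRunning and canTriggerCheckpoint are hypothetical names for illustration, and the "Source: Kafka (1/1)" task name is invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified task states for illustration only.
enum TaskState { CREATED, RUNNING, FINISHED, FAILED }

final class TriggerCheckSketch {

    // Returns the name of the first trigger task that is not RUNNING, if any.
    static Optional<String> firstNonRunning(Map<String, TaskState> triggerTasks) {
        for (Map.Entry<String, TaskState> e : triggerTasks.entrySet()) {
            if (e.getValue() != TaskState.RUNNING) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Assumed rule: a checkpoint is attempted only when every trigger task is RUNNING.
    static boolean canTriggerCheckpoint(Map<String, TaskState> triggerTasks) {
        return !firstNonRunning(triggerTasks).isPresent();
    }

    public static void main(String[] args) {
        Map<String, TaskState> tasks = new LinkedHashMap<>();
        tasks.put("Source: Kafka (1/1)", TaskState.RUNNING); // invented task name
        tasks.put("Source: Custom File Source (1/1)", TaskState.FINISHED);
        // Mirrors the wording of the log line for the first task that is not running.
        firstNonRunning(tasks).ifPresent(name ->
                System.out.println("Checkpoint triggering task " + name
                        + " is not being executed at the moment. Aborting checkpoint."));
    }
}
```

Under this model, a source task that is not running, for example one that has already finished, would cause every later checkpoint attempt to be aborted. Whether that is what happened in this job is not confirmed anywhere in the thread.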
[jira] [Comment Edited] (FLINK-6954) Flink 1.3 checkpointing failing with KeyedCEPPatternOperator
[ https://issues.apache.org/jira/browse/FLINK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055794#comment-16055794 ] Shashank Agarwal edited comment on FLINK-6954 at 6/20/17 1:55 PM: -- I checked with the patch; it is not working with KeyedCEPPatternOperator. In the commit they no longer throw the exception, so only a log message is printed and no exception. {code} 2017-06-20 15:26:25,518 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom File Source (1/1) is not being executed at the moment. Aborting checkpoint. {code} was (Author: shashank734): Checked with patch not working with KeyedCEPPatternOperator. > Flink 1.3 checkpointing failing with KeyedCEPPatternOperator > > > Key: FLINK-6954 > URL: https://issues.apache.org/jira/browse/FLINK-6954 > Project: Flink > Issue Type: Bug > Components: CEP, DataStream API, State Backends, Checkpointing >Affects Versions: 1.3.0 > Environment: yarn, flink 1.3, HDFS >Reporter: Shashank Agarwal > Fix For: 1.3.1 > > > After upgrading to Flink 1.3 Checkpointing is not working, it's failing again > and again. Check operator state. I have checked with both Rocks DB state > backend and FS state backend. Check stack trace. > {code} > java.lang.Exception: Could not perform checkpoint 1 for operator > KeyedCEPPatternOperator -> Map (6/6).
> at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:550) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:378) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:281) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:183) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:213) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:262) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not complete snapshot 1 for operator > KeyedCEPPatternOperator -> Map (6/6). > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:406) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1089) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:653) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:589) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:542) > ... 
8 more > Caused by: java.lang.UnsupportedOperationException > at > org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.scala.typeutils.OptionSerializer$OptionSerializerConfigSnapshot.<init>(OptionSerializer.scala:139) > at > org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:104) > at > org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:28) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.common.typeutils.base.CollectionSerializerConfigSnapshot.<init>(CollectionSerializerConfigSnapshot.java:39) > at > org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:183) > at > org.apache.flink.api.common.typeutils.base.ListSer
[jira] [Reopened] (FLINK-6954) Flink 1.3 checkpointing failing with KeyedCEPPatternOperator
[ https://issues.apache.org/jira/browse/FLINK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal reopened FLINK-6954: - I checked with the patch; it is not working with KeyedCEPPatternOperator. > Flink 1.3 checkpointing failing with KeyedCEPPatternOperator > > > Key: FLINK-6954 > URL: https://issues.apache.org/jira/browse/FLINK-6954 > Project: Flink > Issue Type: Bug > Components: CEP, DataStream API, State Backends, Checkpointing >Affects Versions: 1.3.0 > Environment: yarn, flink 1.3, HDFS >Reporter: Shashank Agarwal > Fix For: 1.3.1 > > > After upgrading to Flink 1.3 Checkpointing is not working, it's failing again > and again. Check operator state. I have checked with both Rocks DB state > backend and FS state backend. Check stack trace. > {code} > java.lang.Exception: Could not perform checkpoint 1 for operator > KeyedCEPPatternOperator -> Map (6/6). > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:550) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:378) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:281) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:183) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:213) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:262) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not complete snapshot 1 for operator > KeyedCEPPatternOperator -> Map (6/6). 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:406) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1089) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:653) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:589) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:542) > ... 8 more > Caused by: java.lang.UnsupportedOperationException > at > org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.scala.typeutils.OptionSerializer$OptionSerializerConfigSnapshot.<init>(OptionSerializer.scala:139) > at > org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:104) > at > org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:28) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > 
org.apache.flink.api.common.typeutils.base.CollectionSerializerConfigSnapshot.<init>(CollectionSerializerConfigSnapshot.java:39) > at > org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:183) > at > org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:47) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.common.typeutils.base.MapSerializerConfigSnapshot.<init>(MapSerializerConfigSnapshot.java:38) > at > org.apache.flink.runtime.state.HashMapSerializer.snapshotConfiguration(HashMapSerializer.java:210) > at > org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaIn
[jira] [Closed] (FLINK-6954) Flink 1.3 checkpointing failing with KeyedCEPPatternOperator
[ https://issues.apache.org/jira/browse/FLINK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal closed FLINK-6954. --- Resolution: Duplicate Fix Version/s: 1.3.1 > Flink 1.3 checkpointing failing with KeyedCEPPatternOperator > > > Key: FLINK-6954 > URL: https://issues.apache.org/jira/browse/FLINK-6954 > Project: Flink > Issue Type: Bug > Components: CEP, DataStream API, State Backends, Checkpointing >Affects Versions: 1.3.0 > Environment: yarn, flink 1.3, HDFS >Reporter: Shashank Agarwal > Fix For: 1.3.1 > > > After upgrading to Flink 1.3 Checkpointing is not working, it's failing again > and again. Check operator state. I have checked with both Rocks DB state > backend and FS state backend. Check stack trace. > {code} > java.lang.Exception: Could not perform checkpoint 1 for operator > KeyedCEPPatternOperator -> Map (6/6). > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:550) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:378) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:281) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:183) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:213) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:262) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not complete snapshot 1 for operator > KeyedCEPPatternOperator -> Map (6/6). 
> at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:406) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1089) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:653) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:589) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:542) > ... 8 more > Caused by: java.lang.UnsupportedOperationException > at > org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.scala.typeutils.OptionSerializer$OptionSerializerConfigSnapshot.<init>(OptionSerializer.scala:139) > at > org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:104) > at > org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:28) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.<init>(TupleSerializerConfigSnapshot.java:45) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > 
org.apache.flink.api.common.typeutils.base.CollectionSerializerConfigSnapshot.<init>(CollectionSerializerConfigSnapshot.java:39) > at > org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:183) > at > org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:47) > at > org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.<init>(CompositeTypeSerializerConfigSnapshot.java:53) > at > org.apache.flink.api.common.typeutils.base.MapSerializerConfigSnapshot.<init>(MapSerializerConfigSnapshot.java:38) > at > org.apache.flink.runtime.state.HashMapSerializer.snapshotConfiguration(HashMapSerializer.java:210) > at > org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(
[jira] [Created] (FLINK-6954) Flink 1.3 checkpointing failing with KeyedCEPPatternOperator
Shashank Agarwal created FLINK-6954: --- Summary: Flink 1.3 checkpointing failing with KeyedCEPPatternOperator Key: FLINK-6954 URL: https://issues.apache.org/jira/browse/FLINK-6954 Project: Flink Issue Type: Bug Components: CEP, DataStream API, State Backends, Checkpointing Affects Versions: 1.3.0 Environment: yarn, flink 1.3, HDFS Reporter: Shashank Agarwal After upgrading to Flink 1.3 Checkpointing is not working, it's failing again and again. Check operator state. I have checked with both Rocks DB state backend and FS state backend. Check stack trace. {code} java.lang.Exception: Could not perform checkpoint 1 for operator KeyedCEPPatternOperator -> Map (6/6). at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:550) at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:378) at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:281) at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:183) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:213) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:262) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not complete snapshot 1 for operator KeyedCEPPatternOperator -> Map (6/6). 
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:406) at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157) at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1089) at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:653) at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:589) at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:542) ... 8 more Caused by: java.lang.UnsupportedOperationException at org.apache.flink.api.scala.typeutils.TraversableSerializer.snapshotConfiguration(TraversableSerializer.scala:155) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.(CompositeTypeSerializerConfigSnapshot.java:53) at org.apache.flink.api.scala.typeutils.OptionSerializer$OptionSerializerConfigSnapshot.(OptionSerializer.scala:139) at org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:104) at org.apache.flink.api.scala.typeutils.OptionSerializer.snapshotConfiguration(OptionSerializer.scala:28) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.(CompositeTypeSerializerConfigSnapshot.java:53) at org.apache.flink.api.java.typeutils.runtime.TupleSerializerConfigSnapshot.(TupleSerializerConfigSnapshot.java:45) at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:132) at org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase.snapshotConfiguration(TupleSerializerBase.java:39) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.(CompositeTypeSerializerConfigSnapshot.java:53) at 
org.apache.flink.api.common.typeutils.base.CollectionSerializerConfigSnapshot.(CollectionSerializerConfigSnapshot.java:39) at org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:183) at org.apache.flink.api.common.typeutils.base.ListSerializer.snapshotConfiguration(ListSerializer.java:47) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerConfigSnapshot.(CompositeTypeSerializerConfigSnapshot.java:53) at org.apache.flink.api.common.typeutils.base.MapSerializerConfigSnapshot.(MapSerializerConfigSnapshot.java:38) at org.apache.flink.runtime.state.HashMapSerializer.snapshotConfiguration(HashMapSerializer.java:210) at org.apache.flink.runtime.state.RegisteredKeyedBackendStateMetaInfo.snapshot(RegisteredKeyedBackendStateMetaInfo.java:71) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.snapshot(HeapKeyedStateBackend.java:267) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:396) ... 13 more {code} -- This message was sent by
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013942#comment-16013942 ] Shashank Agarwal commented on FLINK-6321: - Hi [~kkl0u], please find the SignalDeserializationSchema class below.
{code}
import RawSignal
import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse

/**
 * Created by shashank on 13/01/17.
 *
 * Deserializes a raw JSON string from Kafka into a RawSignal object.
 */
class SignalDeserializationSchema extends AbstractDeserializationSchema[RawSignal] {

  implicit lazy val formats = DefaultFormats

  override def deserialize(message: Array[Byte]): RawSignal = {
    parse(new String(message)).extract[RawSignal]
  }

  override def isEndOfStream(nextElement: RawSignal): Boolean = false
}
{code}
And an example of the RawSignal class:
{code}
case class RawSignal(name: Option[String], email: Option[String], UserId: Option[String])
{code}
> RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.3.0 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6460) Build Fat Jar failed while building with SBT Assembly flink 1.2.1
[ https://issues.apache.org/jira/browse/FLINK-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998033#comment-15998033 ] Shashank Agarwal commented on FLINK-6460: - I have solved this issue by adding the following merge strategy to build.sbt. This was not needed with previous Flink versions. If it is now required, we can add it to the documentation.
{code}
assemblyMergeStrategy in assembly := {
  case "META-INF/io.netty.versions.properties" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
{code}
> Build Fat Jar failed while building with SBT Assembly flink 1.2.1 > - > > Key: FLINK-6460 > URL: https://issues.apache.org/jira/browse/FLINK-6460 > Project: Flink > Issue Type: Bug > Components: Build System, Scala API >Affects Versions: 1.2.1 > Environment: Ubuntu 16.04, Scala, SBT, Flink 1.2.1 >Reporter: Shashank Agarwal > Labels: build-failure, flink, sbt, scala > > While Creating SBT assembly from command > {code}sbt clean assembly{code} > Getting error deduplicate: different file contents. 
error log : > {code} > [error] (root/*:assembly) deduplicate: different file contents found in the > following: > [error] > /Users/shashank/.ivy2/cache/io.netty/netty-handler/jars/netty-handler-4.0.33.Final.jar:META-INF/io.netty.versions.properties > [error] > /Users/shashank/.ivy2/cache/io.netty/netty-buffer/jars/netty-buffer-4.0.33.Final.jar:META-INF/io.netty.versions.properties > [error] > /Users/shashank/.ivy2/cache/io.netty/netty-common/jars/netty-common-4.0.33.Final.jar:META-INF/io.netty.versions.properties > [error] > /Users/shashank/.ivy2/cache/io.netty/netty-transport/jars/netty-transport-4.0.33.Final.jar:META-INF/io.netty.versions.properties > [error] > /Users/shashank/.ivy2/cache/io.netty/netty-codec/jars/netty-codec-4.0.33.Final.jar:META-INF/io.netty.versions.properties > [error] Total time: 66 s, completed 5 May, 2017 2:47:03 PM > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-6460) Build Fat Jar failed while building with SBT Assembly flink 1.2.1
Shashank Agarwal created FLINK-6460: --- Summary: Build Fat Jar failed while building with SBT Assembly flink 1.2.1 Key: FLINK-6460 URL: https://issues.apache.org/jira/browse/FLINK-6460 Project: Flink Issue Type: Bug Components: Build System, Scala API Affects Versions: 1.2.1 Environment: Ubuntu 16.04, Scala, SBT, Flink 1.2.1 Reporter: Shashank Agarwal While Creating SBT assembly from command {code}sbt clean assembly{code} Getting error deduplicate: different file contents. error log : {code} [error] (root/*:assembly) deduplicate: different file contents found in the following: [error] /Users/shashank/.ivy2/cache/io.netty/netty-handler/jars/netty-handler-4.0.33.Final.jar:META-INF/io.netty.versions.properties [error] /Users/shashank/.ivy2/cache/io.netty/netty-buffer/jars/netty-buffer-4.0.33.Final.jar:META-INF/io.netty.versions.properties [error] /Users/shashank/.ivy2/cache/io.netty/netty-common/jars/netty-common-4.0.33.Final.jar:META-INF/io.netty.versions.properties [error] /Users/shashank/.ivy2/cache/io.netty/netty-transport/jars/netty-transport-4.0.33.Final.jar:META-INF/io.netty.versions.properties [error] /Users/shashank/.ivy2/cache/io.netty/netty-codec/jars/netty-codec-4.0.33.Final.jar:META-INF/io.netty.versions.properties [error] Total time: 66 s, completed 5 May, 2017 2:47:03 PM {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
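For readers who hit the same deduplicate error: the follow-up comment on this issue resolves it with a custom merge strategy. A minimal sketch of an equivalent `build.sbt` fragment is shown here, assuming the sbt-assembly plugin is already enabled and using the same `in assembly` key syntax as that comment. `PathList` is sbt-assembly's path extractor; matching on it instead of a raw string is only a stylistic variation, not something the reporter used.

```scala
// build.sbt fragment (sketch; assumes the sbt-assembly plugin is enabled)
assemblyMergeStrategy in assembly := {
  // Several netty jars ship this metadata file with differing contents;
  // keep the first copy found instead of failing the build.
  case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.first
  // Defer every other path to the previously configured strategy.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Only the one conflicting file is special-cased, so real duplicate-class conflicts in other paths would still be reported by the default strategy.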
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998019#comment-15998019 ] Shashank Agarwal commented on FLINK-6321: - Hi [~kkl0u], actually the job is too complicated to share as it is, with Kafka streams and a custom serializer. The steps above are correct, but I will still try to put some code here. I have modified parameter names and some other things in the code. If you find any issue, let me know.
{code}
// Imports added for completeness (assumed; they were not part of the original comment).
import java.util.Properties

import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

object Job {
  def main(args: Array[String]) {
    // set up the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val propertiesFile = getClass.getClassLoader.getResource("xyz.properties").getPath
    val parameter = ParameterTool.fromPropertiesFile(propertiesFile)
    env.getConfig.setGlobalJobParameters(parameter)
    env.setStateBackend(new RocksDBStateBackend(parameter.get("rocksDBPath")))

    // enable fault-tolerance
    env.enableCheckpointing(1000)
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)

    // enable restarts
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(50, 500L))

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", parameter.get("kafkaUrl"))
    properties.setProperty("group.id", parameter.get("kafkaGroupId"))

    val kafka10 = new FlinkKafkaConsumer010[RawSignal](parameter.get("kafkaBundleName"), new SignalDeserializationSchema(), properties)

    val stream = env.addSource(kafka10).keyBy(_._someKey.getOrElse(0))

    // Creating a pattern for successful event
    val successOrderPattern = Pattern.begin[RawSignal]("someEvent")
      .followedBy("otherEvent")

    val successOrderPatternStream = CEP.pattern(stream.keyBy((x) => (x._someKey.getOrElse(0), x._someSubKey.getOrElse(0))), successOrderPattern)

    val ordersStream: DataStream[TransactionSignal] = successOrderPatternStream.select(new TransactionPatternFlatMap)

    // Put Ip count in the stream with maintaining the state
    val ipStateStream = ordersStream.keyBy((x) => (x._someKey, x._deviceIp))
      .mapWithState((in: OrderSignal, ipState: Option[Int]) => {
        if (!in._deviceIp.equalsIgnoreCase(parameter.get("defaultIp"))) {
          val newCount = ipState.getOrElse(0) + 1
          val output = in.copy(_numOfOrderSameIp = newCount)
          (output, Some(newCount))
        } else {
          (in, Some(0))
        }
      })

    ipStateStream.print

    env.execute("Thirdwatch Mitra")
  }
}
{code}
> RocksDB state backend Checkpointing is not working with KeyedCEP. > - > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.3.0 > > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990738#comment-15990738 ] Shashank Agarwal commented on FLINK-6321: - Hi [~kkl0u], thanks. Actually I am a bit busy with deadlines and also traveling. I can check it, but it will take some time. Here is the scenario I used in code to replicate the issue:
1. Created a keyedCEPStream.
2. Enabled checkpointing.
3. Maintained keyed state for a variable.
4. Set RocksDB as the state backend.
5. Ran that on a YARN cluster, and this error came up.
I think you can replicate this with a few lines of code. If anything more is still required, let me know and I'll try to test this ASAP.
> RocksDB state backend Checkpointing is not working with KeydCEP. > > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990738#comment-15990738 ] Shashank Agarwal edited comment on FLINK-6321 at 5/1/17 11:38 AM: -- Hi [~kkl0u] , Thanks, Actually i am bit busy with deadlines and traveling also. I can check it but it will take time time. Scenario what i have done in code for replicate the issue. 1. Created keyedCEPStream. 2. enabled checkpointing. 3. Maintaining keyed-state for a variable. 4. Set rocksDb as state backend. 5. Running that on yarn cluster and this error came. 6. I think you can replicate this with few lines of code. Still required let me know I'll try to test this asap. was (Author: shashank734): Hi [~kkl0u] , Thanks, Actually i am bit busy with deadlines and traveling also. I can check it but it will take time time. Scenerio what i have done in code for replicate the issue. 1. Created keyedCEPStream. 2. enabled checkpointing. 3. Maintaining keyed-state for a variable. 4. Set rocksDb as state backend. 5. Running that on yarn cluster and this error came. 6. I think you can replicate this with few lines of code. Still required let me know I'll try to test this asap. > RocksDB state backend Checkpointing is not working with KeydCEP. > > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal >Assignee: Kostas Kloudas > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976390#comment-15976390 ] Shashank Agarwal commented on FLINK-6321: - Yes, it was failing endlessly. As I had a restart policy, it was restarting again and again, and I had to cancel the job. No, it was not able to find the class.
> RocksDB state backend Checkpointing is not working with KeydCEP. > > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 
6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976378#comment-15976378 ] Shashank Agarwal commented on FLINK-6321: - It's not working with RocksDBStateBackend, but it works with FsStateBackend and MemoryStateBackend. So I switched to FsStateBackend, and the job and its checkpointing are both working fine. These are the error logs I have shared. According to [~aljoscha], the problem seems to be that NFA.readObject() internally uses a TypeSerializer to read some other data, and wrapped in those TypeSerializers might be code that tries to resolve classes, and there we then don't use the user-code class loader. See the comment in the issue I mentioned in my previous comment.
> RocksDB state backend Checkpointing is not working with KeydCEP. > > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
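To illustrate the class-loading point made in the comment above: a plain `java.io.ObjectInputStream` does not know about a job's user-code class loader, so classes that exist only in the user jar can fail to resolve during Java deserialization. The following is a minimal sketch of the general JVM workaround, not Flink's actual fix. `LoaderAwareObjectInputStream` and `LoaderAwareDemo` are hypothetical names introduced only for this example; the stream overrides `resolveClass` so that lookups go through an explicitly supplied loader.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.io.{ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical helper (not a Flink class): resolves classes against `loader`
// instead of whatever loader the default implementation would pick.
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {

  override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
    try Class.forName(desc.getName, false, loader)
    catch {
      // Primitive type names are not loadable by name; the default
      // implementation knows how to handle them.
      case _: ClassNotFoundException => super.resolveClass(desc)
    }
}

object LoaderAwareDemo {
  // Serializes `value` with plain Java serialization and reads it back
  // through the loader-aware stream.
  def roundTrip[T <: AnyRef](value: T, loader: ClassLoader): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()

    val in = new LoaderAwareObjectInputStream(
      new ByteArrayInputStream(bytes.toByteArray), loader)
    try in.readObject().asInstanceOf[T]
    finally in.close()
  }
}
```

In a real job the loader passed in would be the user-code class loader rather than the system loader; how that loader is obtained inside the CEP operator is outside the scope of this sketch.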
[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976345#comment-15976345 ] Shashank Agarwal commented on FLINK-6321: - I have checked: this prints the same logs that I reported before, so it may be a duplicate. https://issues.apache.org/jira/browse/FLINK-6318 {code} 04/12/2017 10:05:04 Job execution switched to status FAILING. java.lang.RuntimeException: Could not deserialize NFA. at org.apache.flink.cep.nfa.NFA$Serializer.deserialize(NFA.java:538) at org.apache.flink.cep.nfa.NFA$Serializer.deserialize(NFA.java:469) at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:81) at org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.getNFA(AbstractKeyedCEPPatternOperator.java:124) at org.apache.flink.cep.operator.AbstractCEPBasePatternOperator.processElement(AbstractCEPBasePatternOperator.java:72) at org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processElement(AbstractKeyedCEPPatternOperator.java:162) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:272) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: co.ronak.nto.Job$$anon$18$$anon$21$$anon$3 at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626) at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) at org.apache.flink.api.scala.typeutils.TraversableSerializer.readObject(TraversableSerializer.scala:53) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) at 
org.apache.flink.cep.NonDuplicatingTypeSerializer.readObject(NonDuplicatingTypeSerializer.java:190) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at ja
[jira] [Updated] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal updated FLINK-6321: Description: Checkpointing is not working with RocksDBStateBackend when using CEP. It's working fine with FsStateBackend and MemoryStateBackend. Application failing every-time. {code} 04/18/2017 21:53:20 Job execution switched to status FAILING. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4). ... 6 more Caused by: java.util.concurrent.CancellationException at java.util.concurrent.FutureTask.report(FutureTask.java:121) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) ... 5 more {code} was: Checkpointing is not working with RocksDBStateBackend when using CEP. It's working fine with FsStateBackend and MemoryStateBackend. Application failing every-time. ``` 04/18/2017 21:53:20 Job execution switched to status FAILING. 
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4). ... 6 more Caused by: java.util.concurrent.CancellationException at java.util.concurrent.FutureTask.report(FutureTask.java:121) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) ... 5 more ``` > RocksDB state backend Checkpointing is not working with KeydCEP. > > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > {code} > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnabl
[jira] [Updated] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
[ https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashank Agarwal updated FLINK-6321: Description: Checkpointing is not working with RocksDBStateBackend when using CEP. It's working fine with FsStateBackend and MemoryStateBackend. Application failing every-time. ``` 04/18/2017 21:53:20 Job execution switched to status FAILING. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4). ... 6 more Caused by: java.util.concurrent.CancellationException at java.util.concurrent.FutureTask.report(FutureTask.java:121) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) ... 5 more ``` was: Checkpointing is not working with RocksDBStateBackend when using CEP. It's working fine with FsStateBackend and MemoryStateBackend. Application failing every-time. ''' 04/18/2017 21:53:20 Job execution switched to status FAILING. 
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4). ... 6 more Caused by: java.util.concurrent.CancellationException at java.util.concurrent.FutureTask.report(FutureTask.java:121) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) ... 5 more ''' > RocksDB state backend Checkpointing is not working with KeydCEP. > > > Key: FLINK-6321 > URL: https://issues.apache.org/jira/browse/FLINK-6321 > Project: Flink > Issue Type: Bug > Components: CEP, State Backends, Checkpointing >Affects Versions: 1.2.0 > Environment: yarn-cluster, RocksDB State backend, Checkpointing every > 1000 ms >Reporter: Shashank Agarwal > > Checkpointing is not working with RocksDBStateBackend when using CEP. It's > working fine with FsStateBackend and MemoryStateBackend. Application failing > every-time. > ``` > 04/18/2017 21:53:20 Job execution switched to status FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint > 46 for operator KeyedCEPPatternOperator -> Map (1/4).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 46 for > operator KeyedCEPPatternOperator -> Map (1/4). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(Str
[jira] [Created] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeydCEP.
Shashank Agarwal created FLINK-6321: --- Summary: RocksDB state backend Checkpointing is not working with KeydCEP. Key: FLINK-6321 URL: https://issues.apache.org/jira/browse/FLINK-6321 Project: Flink Issue Type: Bug Components: CEP, State Backends, Checkpointing Affects Versions: 1.2.0 Environment: yarn-cluster, RocksDB State backend, Checkpointing every 1000 ms Reporter: Shashank Agarwal Checkpointing is not working with RocksDBStateBackend when using CEP. It's working fine with FsStateBackend and MemoryStateBackend. Application failing every-time. ''' 04/18/2017 21:53:20 Job execution switched to status FAILING. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 46 for operator KeyedCEPPatternOperator -> Map (1/4). ... 6 more Caused by: java.util.concurrent.CancellationException at java.util.concurrent.FutureTask.report(FutureTask.java:121) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915) ... 5 more '''
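The innermost cause in the trace above is a java.util.concurrent.CancellationException thrown from FutureTask.get. The following is a minimal sketch, using only the JDK, of how that exception arises when a future is cancelled before anyone asks for its result. The class name CancelledFutureDemo and the local runIfNotDoneAndGet helper are hypothetical stand-ins; the helper borrows its name from the trace but is not Flink's code. The sketch shows only what the exception means and says nothing about why the snapshot future was cancelled in this job.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RunnableFuture;

public class CancelledFutureDemo {

    // Hypothetical helper (an assumption, not Flink's implementation):
    // run the future if it has not completed yet, then fetch its result.
    static <T> T runIfNotDoneAndGet(RunnableFuture<T> future)
            throws ExecutionException, InterruptedException {
        if (!future.isDone()) {
            future.run();
        }
        // A cancelled FutureTask counts as done, so run() is skipped and
        // get() throws CancellationException.
        return future.get();
    }

    // Returns the future's value, or "cancelled" if it was cancelled first.
    static String outcome(boolean cancelFirst) throws Exception {
        RunnableFuture<String> snapshot = new FutureTask<>(() -> "materialized");
        if (cancelFirst) {
            snapshot.cancel(true);
        }
        try {
            return runIfNotDoneAndGet(snapshot);
        } catch (CancellationException e) {
            return "cancelled";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(outcome(false)); // prints "materialized"
        System.out.println(outcome(true));  // prints "cancelled"
    }
}
```

Read this way, the trace suggests the snapshot future had already been cancelled by the time the asynchronous checkpoint thread tried to materialize it; what triggered the cancellation is not visible in the report.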
[jira] [Commented] (FLINK-4169) CEP Does Not Work with RocksDB StateBackend
[ https://issues.apache.org/jira/browse/FLINK-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966258#comment-15966258 ] Shashank Agarwal commented on FLINK-4169: - I am getting this error again in Flink 1.2, running on a YARN cluster. After switching the state backend away from RocksDB, it works fine. ``` 04/12/2017 10:05:04 Job execution switched to status FAILING. java.lang.RuntimeException: Could not deserialize NFA. at org.apache.flink.cep.nfa.NFA$Serializer.deserialize(NFA.java:538) at org.apache.flink.cep.nfa.NFA$Serializer.deserialize(NFA.java:469) at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:81) at org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.getNFA(AbstractKeyedCEPPatternOperator.java:124) at org.apache.flink.cep.operator.AbstractCEPBasePatternOperator.processElement(AbstractCEPBasePatternOperator.java:72) at org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processElement(AbstractKeyedCEPPatternOperator.java:162) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:272) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: co.ronak.nto.Job$$anon$18$$anon$21$$anon$3 at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) at 
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) at org.apache.flink.api.scala.typeutils.TraversableSerializer.readObject(TraversableSerializer.scala:53) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) at org.apache.flink.cep.NonDuplicatingTypeSerializer.readObject(NonDuplicatingTypeSerializer.java:190) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.re