Narayanan Arunachalam created FLINK-9291: --------------------------------------------
Summary: Checkpoint failure (CIRCULAR REFERENCE:java.lang.NegativeArraySizeException) Key: FLINK-9291 URL: https://issues.apache.org/jira/browse/FLINK-9291 Project: Flink Issue Type: Bug Reporter: Narayanan Arunachalam Using rocksdb for state and after running for few hours, checkpointing fails with the following error. The job recovers fine after this. AsynchronousException\{java.lang.Exception: Could not materialize checkpoint 215 for operator makeSalpTrace -> countTraces -> makeZipkinTrace -> (Map -> Sink: bs, Sink: es) (14/80).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.Exception: Could not materialize checkpoint 215 for operator makeSalpTrace -> countTraces -> makeZipkinTrace -> (Map -> Sink: bs, Sink: es) (14/80). ... 6 more Caused by: java.util.concurrent.ExecutionException: java.lang.NegativeArraySizeException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894) ... 5 more Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future. at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:91) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) ... 5 more Caused by: java.util.concurrent.ExecutionException: java.lang.NegativeArraySizeException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:66) at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:89) ... 7 more Caused by: java.lang.NegativeArraySizeException at org.rocksdb.RocksIterator.value0(Native Method) at org.rocksdb.RocksIterator.value(RocksIterator.java:50) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBMergeIterator.value(RocksDBKeyedStateBackend.java:1898) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeKVStateData(RocksDBKeyedStateBackend.java:704) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullSnapshotOperation.writeDBSnapshot(RocksDBKeyedStateBackend.java:556) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:466) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$3.performOperation(RocksDBKeyedStateBackend.java:424) at org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894) ... 5 more [CIRCULAR REFERENCE:java.lang.NegativeArraySizeException] -- This message was sent by Atlassian JIRA (v7.6.3#76005)