Re: rocksdb max open file descriptor issue crashed application
Hi Apoorv,

I am not that familiar with the internals of RocksDB and how the number of open files correlates with the number of (keyed) states and the parallelism you have, but as a starting point you can have a look at [1] for recommendations on how to tune RocksDB for large state. I am also cc'ing Andrey, who may have more knowledge on the topic.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#incremental-checkpoints

Cheers,
Kostas

On Wed, Feb 12, 2020 at 7:55 AM Apoorv Upadhyay wrote:
>
> Hi,
>
> Below is the error I am getting:
>
> 2020-02-08 05:40:24,543 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder - Caught unexpected exception.
> java.io.IOException: Error while opening RocksDB instance.
> [...]
> Caused by: org.rocksdb.RocksDBException: While open directory: /hadoop/yarn/local/usercache/flink/appcache/application_1580464300238_0045/flink-io-d947dea6-270b-44c0-94ca-4a49dbf02f52/job_97167effbb11a8e9ffcba36be7e4da80_op_CoStreamFlatMap_51abbbda2947171827fd9e53509c2fb4__4_6__uuid_3f8c7b20-6d17-43ad-a016-8d08f7ed9d50/db: Too many open files
> [...]
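Besides the tuning guide above, one knob that directly bounds descriptor usage is RocksDB's max_open_files, which Flink exposes as `state.backend.rocksdb.files.open` in recent versions (availability and default vary by Flink release, so check the docs for your version). A sketch of a flink-conf.yaml fragment; the value 5000 is purely illustrative and would need to be sized against the OS ulimit and the number of stateful operator instances per TaskManager:

```yaml
# flink-conf.yaml (sketch, not from the original thread):
# Cap how many files each RocksDB instance may keep open.
# The default -1 means unlimited; a finite cap trades file-descriptor
# usage for extra table-cache lookups on reads.
state.backend.rocksdb.files.open: 5000
```

Note that this is a per-RocksDB-instance cap, so the effective total is roughly this value times the number of keyed-state operator subtasks on the machine.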
Re: rocksdb max open file descriptor issue crashed application
Hi,

Below is the error I am getting:

2020-02-08 05:40:24,543 INFO  org.apache.flink.runtime.taskmanager.Task - order-steamBy-api-order-ip (3/6) (34c7b05d5a75dbbcc5718acf6b18) switched from RUNNING to CANCELING.
2020-02-08 05:40:24,543 INFO  org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code order-steamBy-api-order-ip (3/6) (34c7b05d5a75dbbcc5718acf6b18).
2020-02-08 05:40:24,543 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder - Caught unexpected exception.
java.io.IOException: Error while opening RocksDB instance.
    at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74)
    at org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:268)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:740)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:291)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.rocksdb.RocksDBException: While open directory: /hadoop/yarn/local/usercache/flink/appcache/application_1580464300238_0045/flink-io-d947dea6-270b-44c0-94ca-4a49dbf02f52/job_97167effbb11a8e9ffcba36be7e4da80_op_CoStreamFlatMap_51abbbda2947171827fd9e53509c2fb4__4_6__uuid_3f8c7b20-6d17-43ad-a016-8d08f7ed9d50/db: Too many open files
    at org.rocksdb.RocksDB.open(Native Method)
    at org.rocksdb.RocksDB.open(RocksDB.java:286)
    at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:66)
    ... 17 more
2020-02-08 05:40:24,544 INFO  org.apache.flink.runtime.taskmanager.Task - order-status-mapping-join (4/6) (4409b4e2d93f0441100f0f1575a1dcb9) switched from CANCELING to CANCELED.
2020-02-08 05:40:24,544 INFO  org.apache.flink.runtime.taskmanager.Task - Freeing task resources for order-status-mapping-join (4/6) (4409b4e2d93f0441100f0f1575a1dcb9).
2020-02-08 05:40:24,543 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder - Caught unexpected exception.
java.io.IOException: Error while opening RocksDB instance.
    at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74)
    at org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:268)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
Re: rocksdb max open file descriptor issue crashed application
Hi,

From the given description, you use RocksDBStateBackend, the job normally has about 20k files open on one machine, and the app suddenly opened 35k files and then crashed. Could you please share which files are open, and the full exception? (Attaching the full taskmanager.log may be helpful.)

Best,
Congxian

ApoorvK wrote on Tue, Feb 11, 2020 at 5:22 PM:
> The flink app is crashing due to a "too many open files" issue. Currently the
> app has around 300 operators and the state size is 60 GB. The app suddenly
> opens around 35k files, where it was opening 20k a few weeks ago, hence the
> crash. I have updated the machine limit as well as the YARN limit to 60k,
> hoping it will not crash again.
>
> Please suggest if there is an alternative solution for this.
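The question above, which files are actually open, can be answered by inspecting the TaskManager process directly. A sketch under the assumption of a Linux host with /proc; the PID here is a placeholder (the current shell), and locating the real TaskManager PID, e.g. via `jps`, is left to the operator:

```shell
# Sketch: inspect open file descriptors of a TaskManager JVM.
TM_PID=$$   # placeholder PID; on a real host substitute the TaskManager PID

# Total open descriptors for the process:
ls /proc/"$TM_PID"/fd | wc -l

# Breakdown by target path, most common first. RocksDB SST files would
# show up under the flink-io-* appcache directories:
ls -l /proc/"$TM_PID"/fd 2>/dev/null | awk '{print $NF}' | sort | uniq -c | sort -rn | head
```

Comparing this breakdown a few weeks apart would show whether the growth from 20k to 35k comes from RocksDB SST files or from something else (sockets, jars, log files).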
rocksdb max open file descriptor issue crashed application
The flink app is crashing due to a "too many open files" issue. Currently the app has around 300 operators and the state size is 60 GB. The app suddenly opens around 35k files, where it was opening 20k a few weeks ago, hence the crash. I have updated the machine limit as well as the YARN limit to 60k, hoping it will not crash again.

Please suggest if there is an alternative solution for this.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
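Since limits raised on the machine do not always propagate into YARN containers, it is worth confirming that the 60k limit actually reaches the running JVM. A sketch, again using the current shell as a stand-in for the TaskManager PID:

```shell
# Sketch: verify the effective open-file limits.
ulimit -n                              # soft limit of the current shell
PID=$$                                 # placeholder; substitute the TaskManager PID
grep "Max open files" /proc/"$PID"/limits   # limits of the running process itself
```

If the process line still shows the old limit, the container environment (YARN's container executor, systemd, or /etc/security/limits.conf) is overriding the machine-level setting.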