Re: rocksdb max open file descriptor issue crashed application

2020-02-12 Thread Kostas Kloudas
Hi Apoorv,

I am not very familiar with the internals of RocksDB, or with how the
number of open files correlates with the number of (keyed) states and the
parallelism you have, but as a starting point you can have a look at
[1] for recommendations on how to tune RocksDB for large state. I
am also cc'ing Andrey, who may have more knowledge on the topic.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#incremental-checkpoints

Cheers,
Kostas


Re: rocksdb max open file descriptor issue crashed application

2020-02-11 Thread Apoorv Upadhyay
Hi,

Below is the error I am getting:

2020-02-08 05:40:24,543 INFO  org.apache.flink.runtime.taskmanager.Task   - order-steamBy-api-order-ip (3/6) (34c7b05d5a75dbbcc5718acf6b18) switched from RUNNING to CANCELING.
2020-02-08 05:40:24,543 INFO  org.apache.flink.runtime.taskmanager.Task   - Triggering cancellation of task code order-steamBy-api-order-ip (3/6) (34c7b05d5a75dbbcc5718acf6b18).
2020-02-08 05:40:24,543 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder  - Caught unexpected exception.
java.io.IOException: Error while opening RocksDB instance.
    at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74)
    at org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:268)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:740)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:291)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.rocksdb.RocksDBException: While open directory: /hadoop/yarn/local/usercache/flink/appcache/application_1580464300238_0045/flink-io-d947dea6-270b-44c0-94ca-4a49dbf02f52/job_97167effbb11a8e9ffcba36be7e4da80_op_CoStreamFlatMap_51abbbda2947171827fd9e53509c2fb4__4_6__uuid_3f8c7b20-6d17-43ad-a016-8d08f7ed9d50/db: Too many open files
    at org.rocksdb.RocksDB.open(Native Method)
    at org.rocksdb.RocksDB.open(RocksDB.java:286)
    at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:66)
    ... 17 more
2020-02-08 05:40:24,544 INFO  org.apache.flink.runtime.taskmanager.Task   - order-status-mapping-join (4/6) (4409b4e2d93f0441100f0f1575a1dcb9) switched from CANCELING to CANCELED.
2020-02-08 05:40:24,544 INFO  org.apache.flink.runtime.taskmanager.Task   - Freeing task resources for order-status-mapping-join (4/6) (4409b4e2d93f0441100f0f1575a1dcb9).
2020-02-08 05:40:24,543 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder  - Caught unexpected exception.
java.io.IOException: Error while opening RocksDB instance.
    at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74)
    at org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:268)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
    at

Re: rocksdb max open file descriptor issue crashed application

2020-02-11 Thread Congxian Qiu
Hi,
From the given description, you are using RocksDBStateBackend, which used to
keep about 20k files open on one machine, and the app suddenly opened 35k
files and then crashed.
Could you please share which files are open, and what the exception is?
(Sharing the full taskmanager.log may be helpful.)
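
One way to answer the "which files are open" question on a Linux host is to read the descriptor table of the TaskManager process under /proc and group the entries by file suffix. The following is only a minimal sketch under that assumption; the function names are made up for illustration, and the process ID has to be looked up separately (for example with jps or ps). RocksDB table files normally end in .sst, so a large .sst count would point at the state backend rather than at sockets or log files.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_by_suffix(paths):
    """Count paths per file suffix; entries without a suffix go under '(none)'."""
    counts = Counter()
    for p in paths:
        suffix = Path(p).suffix or "(none)"
        counts[suffix] += 1
    return counts


def open_files(pid):
    """Return the targets of all descriptors held by `pid` (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def report(pid, top=10):
    """Print the total descriptor count and the most common suffixes for `pid`."""
    files = open_files(pid)
    print(f"{len(files)} open descriptors for pid {pid}")
    for suffix, n in summarize_by_suffix(files).most_common(top):
        print(f"{n:8d}  {suffix}")
```

Calling report(<taskmanager pid>) as the user that owns the process (or as root) prints the breakdown; reading another user's /proc/<pid>/fd otherwise fails with a permission error.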

Best,
Congxian


On Tue, Feb 11, 2020 at 5:22 PM, ApoorvK wrote:

> My Flink app is crashing due to a "Too many open files" error. The app
> currently has 300 operators and a state size of 60 GB. It has suddenly
> started opening around 35k files, up from around 20k a few weeks ago, and
> this is causing the crash. I have raised the open-file limit on the machine
> as well as in YARN to 60k, hoping it will not crash again.
> Please suggest whether there is an alternative solution for this.


rocksdb max open file descriptor issue crashed application

2020-02-11 Thread ApoorvK
My Flink app is crashing due to a "Too many open files" error. The app
currently has 300 operators and a state size of 60 GB. It has suddenly
started opening around 35k files, up from around 20k a few weeks ago, and
this is causing the crash. I have raised the open-file limit on the machine
as well as in YARN to 60k, hoping it will not crash again.
Please suggest whether there is an alternative solution for this.
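
A raised limit only helps if the running TaskManager process actually received it, and a process started inside a YARN container does not necessarily inherit the limit seen in a login shell. On Linux, the limits in effect for a process are listed in /proc/<pid>/limits. The sketch below, with illustrative function names, parses the "Max open files" line from that file and computes how much headroom is left for a given descriptor count.

```python
def parse_nofile_limits(limits_text):
    """Extract (soft, hard) 'Max open files' from /proc/<pid>/limits content.

    Each value is an int, or None when the kernel reports 'unlimited'.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: 'Max', 'open', 'files', soft, hard, units
            soft, hard = line.split()[3:5]
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")


def fd_headroom(open_count, soft_limit):
    """Return the unused fraction of the soft limit (0.0 when at or over it)."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return max(0.0, 1.0 - open_count / soft_limit)


def read_nofile_limits(pid):
    """Read and parse the limits of a running process (Linux /proc only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_nofile_limits(f.read())
```

With the numbers from this message, fd_headroom(35000, 60000) is roughly 0.42, so the new limit leaves about 40% spare capacity if the file count stays at 35k; it does not explain why the count grew from 20k in the first place.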



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/