Jackson Chung created CASSANDRA-8874:
----------------------------------------
Summary: running out of FD, and causing clients hang when dropping a keyspace with many CF with many sstables
Key: CASSANDRA-8874
URL: https://issues.apache.org/jira/browse/CASSANDRA-8874
Project: Cassandra
Issue Type: Bug
Reporter: Jackson Chung

We already set the number of file descriptors to 100000 for C* usage, and confirmed that from /proc/$cass_pid/limits.

We have 16 nodes, 2 DCs; each node stores about 600GB to 1TB of data. The nodes are EC2 i2-2xl instances, with the 2 disks in RAID 0.

We use both the Hector and DataStax drivers, and there are many clients connecting to the cluster.

One day we dropped a keyspace (one that our app no longer uses), which has a good number of CFs, some of which use leveled compaction and have a good number of sstables, and our app went down. CPU and load average were high on the nodes and we couldn't even ssh to them. We had to force a reboot and restart 2 of the C* instances, whose logs were filled with (hundreds of thousands of) "too many open files" errors.

C* 2.0.11

{noformat}
$ grep -ic "caused by.*too many open file" system.log.*
system.log.1:0
system.log.10:18659
system.log.11:17539
system.log.12:18941
system.log.13:18936
system.log.14:18601
system.log.15:18933
system.log.16:18937
system.log.17:18954
system.log.18:18892
system.log.19:18942
system.log.2:0
system.log.20:18977
system.log.21:18977
system.log.22:18852
system.log.23:18978
system.log.24:18978
system.log.25:18978
system.log.26:18978
system.log.27:18978
system.log.28:18978
system.log.29:18978
system.log.3:654
system.log.30:18978
system.log.31:18978
system.log.32:18978
system.log.33:18977
system.log.34:18978
system.log.35:18978
system.log.36:17943
system.log.37:18867
system.log.38:15082
system.log.39:17766
system.log.4:17932
system.log.40:18029
system.log.41:18890
system.log.42:18048
system.log.43:18812
system.log.44:18787
system.log.45:18962
system.log.46:18978
system.log.47:18978
system.log.48:18978
system.log.49:18978
system.log.5:15284
system.log.50:18978
system.log.6:17180
system.log.7:17286
system.log.8:18651
system.log.9:17720
{noformat}
All the logs are from that day.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
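
The per-file counts in the grep output above come back in shell glob order (1, 10, 11, ...), which makes them awkward to total or read in rotation order. Below is a minimal sketch of one way to do that in Python, assuming the grep output has been captured as a string. The function name `tally_grep_counts` and the shortened sample input are illustrative only and are not part of the original report.

```python
import re


def tally_grep_counts(grep_output):
    """Parse `grep -c` output made of `system.log.N:count` entries.

    Returns (total, rows), where rows is a list of (N, count) tuples
    sorted by the numeric rotation suffix N.
    """
    rows = []
    for token in grep_output.split():
        match = re.fullmatch(r"system\.log\.(\d+):(\d+)", token)
        if match is None:
            continue  # ignore anything that is not a count entry
        rows.append((int(match.group(1)), int(match.group(2))))
    rows.sort()  # numeric order instead of shell glob order
    total = sum(count for _, count in rows)
    return total, rows


if __name__ == "__main__":
    # Shortened sample; feed the full grep output here instead.
    sample = "system.log.1:0 system.log.10:18659 system.log.2:0 system.log.3:654"
    total, rows = tally_grep_counts(sample)
    for n, count in rows:
        print(f"system.log.{n}: {count}")
    print(f"total: {total}")
```

Entries may be separated by spaces or newlines; both are handled because the input is split on any whitespace.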