[ https://issues.apache.org/jira/browse/CASSANDRA-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Thompson updated CASSANDRA-8874:
---------------------------------------
    Reproduced In: 2.0.11
    Fix Version/s: 2.0.13

> running out of FD, and causing clients hang when dropping a keyspace with
> many CF with many sstables
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8874
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8874
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jackson Chung
>             Fix For: 2.0.13
>
>
> We already set the number of file descriptors to 100000 for C* usage, and
> confirmed that from /proc/$cass_pid/limits.
> We have 16 nodes in 2 DCs; each node stores about 600GB to 1TB of data, on
> EC2 i2-2xl instances with the 2 disks in RAID0.
> We use both the Hector and DataStax drivers, and there are many clients
> connecting to the cluster.
> One day we dropped a keyspace (one that our app no longer uses), which had a
> good number of CFs, some of which use leveled compaction and have a good
> number of sstables... and our app went down. CPU/load avg were high on the
> nodes and we couldn't even ssh to them.
> We had to force a reboot and restart 2 of the C* nodes, whose logs were
> filled with errors (hundreds of thousands) of "too many open files".
> C* 2.0.11
> {noformat}$ grep -ic "caused by.*too many open file" system.log.*
> system.log.1:0
> system.log.10:18659
> system.log.11:17539
> system.log.12:18941
> system.log.13:18936
> system.log.14:18601
> system.log.15:18933
> system.log.16:18937
> system.log.17:18954
> system.log.18:18892
> system.log.19:18942
> system.log.2:0
> system.log.20:18977
> system.log.21:18977
> system.log.22:18852
> system.log.23:18978
> system.log.24:18978
> system.log.25:18978
> system.log.26:18978
> system.log.27:18978
> system.log.28:18978
> system.log.29:18978
> system.log.3:654
> system.log.30:18978
> system.log.31:18978
> system.log.32:18978
> system.log.33:18977
> system.log.34:18978
> system.log.35:18978
> system.log.36:17943
> system.log.37:18867
> system.log.38:15082
> system.log.39:17766
> system.log.4:17932
> system.log.40:18029
> system.log.41:18890
> system.log.42:18048
> system.log.43:18812
> system.log.44:18787
> system.log.45:18962
> system.log.46:18978
> system.log.47:18978
> system.log.48:18978
> system.log.49:18978
> system.log.5:15284
> system.log.50:18978
> system.log.6:17180
> system.log.7:17286
> system.log.8:18651
> system.log.9:17720
> {noformat}
> All the logs are from that day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
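
The per-file counts in the quoted grep output can be totalled to check the "hundreds of thousands" figure. The following is a minimal Python sketch, assuming the `grep -c` output has been saved as text. The function name `tally_grep_counts` and the three-line sample are illustrative and not part of the original report.

```python
import re


def tally_grep_counts(text):
    """Parse `grep -c` output lines of the form `file:count` into a dict.

    Hypothetical helper for illustration; it tolerates the "> " quoting
    used in the email and skips lines that are not `file:count` pairs.
    """
    counts = {}
    for line in text.splitlines():
        # Drop email quote markers and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = re.fullmatch(r"(.+):(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Three lines copied from the report, used here only as a sample.
sample = """\
> system.log.1:0
> system.log.3:654
> system.log.10:18659
"""

counts = tally_grep_counts(sample)
total = sum(counts.values())
print(f"{len(counts)} files, {total} matching lines")
```

Feeding the full listing to the same function gives the overall number of matching log lines across the rotated files.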