Jackson Chung created CASSANDRA-8874:
----------------------------------------

             Summary: running out of FD, and causing clients hang when dropping 
a keyspace with many CF with many sstables
                 Key: CASSANDRA-8874
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8874
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jackson Chung


We already set the file descriptor limit to 100000 for C*, and confirmed 
that from /proc/$cass_pid/limits.
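
For anyone wanting to repeat that check, here is a minimal sketch (not part of the original report) that reads the "Max open files" row from /proc/<pid>/limits and compares it with the number of descriptors the process currently holds. The function names are hypothetical, it assumes Linux, and it assumes you have permission to read the target process's /proc entries.

```python
import os


def parse_open_files_limit(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            soft, hard = fields[0], fields[1]

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd (Linux only, needs permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid):
    """Return (currently open descriptors, soft limit) for the given pid."""
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_open_files_limit(f.read())
    return count_open_fds(pid), soft
```

Calling `fd_usage(cass_pid)` periodically while a large drop is in progress would show how close the process gets to the configured limit.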

We have 16 nodes across 2 DCs; each node stores about 600GB to 1TB of data. They 
are EC2 i2.2xlarge instances, with the 2 disks in RAID0.

We use both the Hector and DataStax drivers, and there are many clients connecting 
to the cluster.

One day we dropped a keyspace (one that our app no longer uses), which had a good 
number of CFs, some of them using leveled compaction with a good number of 
sstables... and our app went down. CPU/load avg on the nodes were high and we 
couldn't even ssh to them. We had to force a reboot and restart 2 of the C* 
nodes, whose logs were filled with (hundreds of thousands of) "too many open 
files" errors.

C* 2.0.11

{noformat}$ grep -ic "caused by.*too many open file" system.log.*
system.log.1:0
system.log.10:18659
system.log.11:17539
system.log.12:18941
system.log.13:18936
system.log.14:18601
system.log.15:18933
system.log.16:18937
system.log.17:18954
system.log.18:18892
system.log.19:18942
system.log.2:0
system.log.20:18977
system.log.21:18977
system.log.22:18852
system.log.23:18978
system.log.24:18978
system.log.25:18978
system.log.26:18978
system.log.27:18978
system.log.28:18978
system.log.29:18978
system.log.3:654
system.log.30:18978
system.log.31:18978
system.log.32:18978
system.log.33:18977
system.log.34:18978
system.log.35:18978
system.log.36:17943
system.log.37:18867
system.log.38:15082
system.log.39:17766
system.log.4:17932
system.log.40:18029
system.log.41:18890
system.log.42:18048
system.log.43:18812
system.log.44:18787
system.log.45:18962
system.log.46:18978
system.log.47:18978
system.log.48:18978
system.log.49:18978
system.log.5:15284
system.log.50:18978
system.log.6:17180
system.log.7:17286
system.log.8:18651
system.log.9:17720
{noformat}

All of these logs are from that day.
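
To put a single number on the per-file counts above, the `grep -c` output can be totalled with a small script. This is only an illustrative sketch: the function names are hypothetical, and it assumes each line has the `filename:count` shape shown in the listing, with a numeric rotation suffix on the file name.

```python
def parse_grep_counts(output):
    """Parse `grep -c` output lines of the form 'filename:count' into a dict."""
    counts = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so a colon in the file name would not break parsing.
        name, _, count = line.rpartition(":")
        counts[name] = int(count)
    return counts


def summarize(counts):
    """Return (total matches, number of files with at least one match)."""
    total = sum(counts.values())
    nonzero = sum(1 for c in counts.values() if c > 0)
    return total, nonzero


def by_rotation(counts):
    """Return (name, count) pairs ordered by the numeric suffix, e.g. system.log.2 before system.log.10."""
    return sorted(counts.items(), key=lambda item: int(item[0].rsplit(".", 1)[1]))
```

Sorting by the numeric suffix rather than alphabetically makes it easier to see where in the rotation sequence the errors start and stop.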


