[ https://issues.apache.org/jira/browse/CASSANDRA-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gianluca Borello updated CASSANDRA-6275: ---------------------------------------- Attachment: leak.log We are experiencing a similar issue in 2.0.2. It started happening after we set a TTL for all our columns in a very limited datastore (just a few GBs). We can easily see the fd count rapidly increase to 100000+, and the majority of fds are (from lsof): {noformat} java 13168 cassandra 267r REG 9,0 273129 671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted) java 13168 cassandra 268r REG 9,0 273129 671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted) java 13168 cassandra 269r REG 9,0 273129 671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted) java 13168 cassandra 270r REG 9,0 273129 671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted) java 13168 cassandra 271r REG 9,0 273129 671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted) java 13168 cassandra 272r REG 9,0 273129 671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted) java 13168 cassandra 273r REG 9,0 273129 671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted) {noformat} I'm attaching the log of the exception. You can see the exceptions, and then Cassandra eventually shuts down. We had to temporarily downgrade to 1.2.11 > 2.0.x leaks file handles > ------------------------ > > Key: CASSANDRA-6275 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6275 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: java version "1.7.0_25" > Java(TM) SE Runtime Environment (build 1.7.0_25-b15) > Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode) > Linux cassandra-test1 2.6.32-279.el6.x86_64 #1 SMP Thu Jun 21 15:00:18 EDT > 2012 x86_64 x86_64 x86_64 GNU/Linux > Reporter: Mikhail Mazursky > Attachments: cassandra_jstack.txt, leak.log, slog.gz > > > Looks like C* is leaking file descriptors when doing lots of CAS operations. > {noformat} > $ sudo cat /proc/15455/limits > Limit Soft Limit Hard Limit Units > Max cpu time unlimited unlimited seconds > Max file size unlimited unlimited bytes > Max data size unlimited unlimited bytes > Max stack size 10485760 unlimited bytes > Max core file size 0 0 bytes > Max resident set unlimited unlimited bytes > Max processes 1024 unlimited processes > Max open files 4096 4096 files > Max locked memory unlimited unlimited bytes > Max address space unlimited unlimited bytes > Max file locks unlimited unlimited locks > Max pending signals 14633 14633 signals > Max msgqueue size 819200 819200 bytes > Max nice priority 0 0 > Max realtime priority 0 0 > Max realtime timeout unlimited unlimited us > {noformat} > Looks like the problem is not in limits. > Before load test: > {noformat} > cassandra-test0 ~]$ lsof -n | grep java | wc -l > 166 > cassandra-test1 ~]$ lsof -n | grep java | wc -l > 164 > cassandra-test2 ~]$ lsof -n | grep java | wc -l > 180 > {noformat} > After load test: > {noformat} > cassandra-test0 ~]$ lsof -n | grep java | wc -l > 967 > cassandra-test1 ~]$ lsof -n | grep java | wc -l > 1766 > cassandra-test2 ~]$ lsof -n | grep java | wc -l > 2578 > {noformat} > Most opened files have names like: > {noformat} > java 16890 cassandra 1636r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1637r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1638r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1639r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1640r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1641r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1642r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1643r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1644r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1645r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1646r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1647r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1648r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1649r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1650r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1651r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1652r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1653r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1654r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > java 16890 cassandra 1655r REG 202,17 161158485 > 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db > java 16890 cassandra 1656r REG 202,17 88724987 > 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db > {noformat} > Also, when that happens it's not always possible to shutdown server process > via SIGTERM. Have to use SIGKILL. > p.s. See mailing thread for more context information > https://www.mail-archive.com/user@cassandra.apache.org/msg33035.html -- This message was sent by Atlassian JIRA (v6.1#6144)