[ 
https://issues.apache.org/jira/browse/CASSANDRA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427488#comment-15427488
 ] 

Stefania commented on CASSANDRA-11594:
--------------------------------------

I've assigned this ticket to myself. My plan is to create a distributed 
test with the same topology and schema that runs repair and monitors the file 
descriptors. Cc [~cassandra-te] in case they have resources to assist with this.
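
As a rough illustration of the "monitors the file descriptors" part, here is a minimal Python sketch, not the actual test. It assumes a Linux node where a process's descriptors appear as symlinks under /proc/<pid>/fd. The names count_open_dirs and summarize are hypothetical helpers introduced only for this example.

```python
import os
from collections import Counter


def count_open_dirs(fd_dir):
    """Count descriptors under fd_dir (e.g. /proc/<pid>/fd) that point at directories.

    Returns a Counter mapping directory path -> number of descriptors open on it.
    """
    per_dir = Counter()
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            # Not a symlink, or the descriptor was closed while we were scanning.
            continue
        # Sockets and pipes resolve to names like "socket:[123]", which are not directories.
        if os.path.isdir(target):
            per_dir[target] += 1
    return per_dir


def summarize(per_dir):
    """Return (total directory descriptors, number of distinct directories)."""
    return sum(per_dir.values()), len(per_dir)
```

Calling count_open_dirs("/proc/%d/fd" % pid) at intervals while repair runs would show whether the number of directory descriptors keeps growing, and on which directories.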

If we cannot reproduce it that way, we could analyze a heap dump taken while 
the problem occurs. We would search for any file input stream instances and 
look at the GC roots. [~n0rad], let us know if you are able to share a heap 
dump, as that might speed things up.


> Too many open files on directories
> ----------------------------------
>
>                 Key: CASSANDRA-11594
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11594
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: n0rad
>            Assignee: Stefania
>            Priority: Critical
>         Attachments: openfiles.zip, screenshot.png
>
>
> I have a 6-node cluster in prod across 3 racks.
> Each node has:
> - 4 GB of commitlogs in 343 files
> - 275 GB of data in 504 files
> On Saturday, 1 node in each rack crashed with "too many open files" (it seems 
> to be the equivalent node in each rack).
> {code}
> lsof -n -p $PID give me 66899 out of 65826 max
> {code}
> The output contains 64527 open directories (2371 unique).
> Part of the list:
> {code}
> java    19076 root 2140r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2141r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2142r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2143r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2144r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2145r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2146r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2147r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2148r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2149r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2150r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2151r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2152r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2153r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2154r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> java    19076 root 2155r      DIR   8,17      143360 4386718705 
> /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
> {code}
> The 3 other nodes crashed 4 hours later.
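
For anyone wanting to reproduce the totals quoted in the description (open directories and unique directories), a tally along these lines may help. It is only a sketch: it assumes the default lsof column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) with each entry on a single line, and tally_lsof_dirs is a hypothetical helper name.

```python
from collections import Counter


def tally_lsof_dirs(lsof_text):
    """Return (total DIR entries, distinct directory paths) from lsof output.

    Assumes the default nine-column layout, where TYPE is the fifth field
    and NAME is the ninth (and may itself contain spaces).
    """
    per_dir = Counter()
    for line in lsof_text.splitlines():
        fields = line.split(None, 8)
        # Skip the header, short lines, and anything that is not a directory.
        if len(fields) < 9 or fields[4] != "DIR":
            continue
        per_dir[fields[8]] += 1
    return sum(per_dir.values()), len(per_dir)
```

Feeding it the captured output of lsof -n -p $PID should give the two figures directly. If lsof is run with options that add columns, the field indexes would need adjusting.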



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
