[jira] [Updated] (CASSANDRA-12114) Cassandra startup takes an hour because of java.io.File.listFiles

Tom van der Woerdt (JIRA) Thu, 30 Jun 2016 07:28:35 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-12114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tom van der Woerdt updated CASSANDRA-12114:
-------------------------------------------
    Description: 
One of our Cassandra clusters has nodes with up to 4TB of data in a single table 
that uses leveled compaction and has 200k files. While upgrading from 2.2.6 to 
3.0.7 we noticed that restarting a node took a while. By "a while" I mean we 
measured it at more than 60 minutes.

jstack shows something interesting:
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f30db0ea400 nid=0xdb22 runnable [0x00007f30de122000]
   java.lang.Thread.State: RUNNABLE
    at java.io.UnixFileSystem.list(Native Method)
    at java.io.File.list(File.java:1122)
    at java.io.File.listFiles(File.java:1248)
    at org.apache.cassandra.io.sstable.Descriptor.getTemporaryFiles(Descriptor.java:172)
    at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:599)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:245)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685)
{code}

Going by the source of File.listFiles, it puts every file in a directory into 
an array and *then* applies the filter.
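
A minimal sketch of the behaviour described above, paraphrasing what 
File.listFiles(FilenameFilter) does rather than copying the JDK source. The 
class and method names (ListFilesSketch, listFilesEagerly) are hypothetical 
and exist only for illustration:

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of the JDK or Cassandra.
class ListFilesSketch {
    // Mirrors the list-everything-then-filter pattern: the full directory
    // listing is materialized as a String[] before the filter sees any entry.
    static File[] listFilesEagerly(File dir, FilenameFilter filter) {
        String[] names = dir.list(); // every entry name in the directory
        if (names == null) {
            return null; // not a directory, or an I/O error occurred
        }
        List<File> matches = new ArrayList<>();
        for (String name : names) {
            if (filter == null || filter.accept(dir, name)) {
                matches.add(new File(dir, name));
            }
        }
        return matches.toArray(new File[0]);
    }
}
```

With 200k entries in one directory, the names array holds all 200k strings 
even if the filter accepts only a handful of them.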

This is actually a known Java issue from 1999: 
http://bugs.java.com/view_bug.do?bug_id=4285834 -- their "solution" was to 
introduce new APIs in JRE7. I guess that makes listFiles deprecated for larger 
directories (like when using LeveledCompactionStrategy).
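
For comparison, a sketch of the Java 7 API in question, 
java.nio.file.Files.newDirectoryStream, which iterates over directory entries 
instead of returning them all as one array. The class name, method name and 
glob pattern used here are hypothetical and do not reflect how Cassandra names 
or finds its temporary files; whether this would actually shorten the startup 
described in this issue has not been measured:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of the JDK or Cassandra.
class DirectoryStreamSketch {
    // Collects only the entries whose file name matches the glob. Entries are
    // visited one at a time, so non-matching names are never accumulated.
    static List<Path> findMatching(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        // try-with-resources closes the underlying directory handle
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

A call such as DirectoryStreamSketch.findMatching(dataDir, "*tmp*") would 
return only the entries containing "tmp" in their file name, where dataDir and 
the pattern are placeholders.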


tl;dr: because Cassandra uses java.io.File.listFiles, service startup can take 
an hour for larger data sets.

  was:
One of our Cassandra clusters has nodes with up to 4TB of data in a single table 
that uses leveled compaction and has 200k files. While upgrading from 2.2.6 to 
3.0.7 we noticed that restarting a node took a while. By "a while" I mean we 
measured it at more than 60 minutes.

jstack shows something interesting:
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f30db0ea400 nid=0xdb22 runnable [0x00007f30de122000]
   java.lang.Thread.State: RUNNABLE
    at java.io.UnixFileSystem.list(Native Method)
    at java.io.File.list(File.java:1122)
    at java.io.File.listFiles(File.java:1248)
    at org.apache.cassandra.io.sstable.Descriptor.getTemporaryFiles(Descriptor.java:172)
    at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:599)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:245)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685)
{code}

Going by the source of File.listFiles, it puts every file in a directory into 
an array and *then* applies the filter.

This is actually a known Java issue from 1999: 
http://bugs.java.com/view_bug.do;jsessionid=db7fcf25bcce13541c4289edeb4?bug_id=4285834
 -- their "solution" was to introduce new APIs in JRE7. I guess that makes 
listFiles deprecated for larger directories (like when using 
LeveledCompactionStrategy).


tl;dr: because Cassandra uses java.io.File.listFiles, service startup can take 
an hour for larger data sets.


> Cassandra startup takes an hour because of java.io.File.listFiles
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-12114
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12114
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tom van der Woerdt
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
