[ https://issues.apache.org/jira/browse/CASSANDRA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Capriolo updated CASSANDRA-1526:
---------------------------------------

    Attachment: snitcherror.txt

Dynamic snitch causes failure with patch.

> Make cassandra sampling and startup faster
> ------------------------------------------
>
>                 Key: CASSANDRA-1526
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1526
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Edward Capriolo
>            Assignee: Stu Hood
>            Priority: Minor
>             Fix For: 0.6.7, 0.7.0
>
>         Attachments: 0.6-0001-Add-AggregateFuture-to-wait-for-a-batch-of-futures-a.patch, 0.6-0002-Parallelize-SSTable-open.patch, 1526.txt, cpu.txt, io.txt, snitcherror.txt
>
>
> http://wiki.apache.org/cassandra/CassandraHardware mentions very large disks; I do not see how that would be possible.
> We have a server-class system with 4x processors, 16GB RAM and a 6-disk RAID5 (yes, RAID0 would be faster, but still).
> {noformat}
> INFO [main] 2010-09-21 12:58:26,348 SSTableReader.java (line 120) Sampling index for /var/lib/cassandra/data/system/LocationInfo-699-Data.db
> ...
> INFO [main] 2010-09-21 13:05:51,333 CassandraDaemon.java (line 124) Binding thrift service to cdbsd07/10.71.71.57:9160
> {noformat}
> This node has 200GB of data in two column families, and sampling all tables and starting up takes 7+ minutes. The logging suggests that sampling happens one SSTable at a time. Additionally, the normal system vitals, mainly disk and CPU, do not look overtaxed.
> * Since SSTables are immutable, is there a way the sampling of the tables could be saved?
> * Could this process be done in parallel for a speedup?
> * Can multiple column families be processed at once?
> Unless someone has an insanely powerful disk pack, mentioning 2TB limitations seems out of place. Unless my calculations are wrong (which they usually are), I have pretty decent hardware, and if I had 2TB of data I would have a 95-minute node startup?
>
> I hope that sampling multiple ColumnFamilies at once might make nodes holding at least a few hundred GB start up reasonably fast.
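On the first question (saving the samples): because an SSTable is never modified after it is written, a sample computed once could in principle be persisted next to the data file and reloaded on the next startup instead of rescanning the index. The sketch below is hypothetical and is not Cassandra's code: the `.sample` file suffix, the `samplePathFor`, `sample` and `loadOrSample` helpers, treating keys as newline-free strings, and "keep every `interval`-th key" as the sampling scheme are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SavedIndexSample {

    /** Hypothetical naming scheme: the saved sample sits beside the data file with a ".sample" suffix. */
    static Path samplePathFor(Path dataFile) {
        return dataFile.resolveSibling(dataFile.getFileName() + ".sample");
    }

    /** Assumed sampling scheme: keep every interval-th key of the sorted index. */
    static List<String> sample(List<String> sortedKeys, int interval) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += interval) {
            kept.add(sortedKeys.get(i));
        }
        return kept;
    }

    /**
     * Return the saved sample if one exists; otherwise sample the keys and save the result.
     * Reusing the saved file is only safe because the SSTable it describes never changes.
     */
    static List<String> loadOrSample(Path dataFile, List<String> sortedKeys, int interval) throws IOException {
        Path samplePath = samplePathFor(dataFile);
        if (Files.exists(samplePath)) {
            return Files.readAllLines(samplePath, StandardCharsets.UTF_8);
        }
        List<String> sampled = sample(sortedKeys, interval);
        Files.write(samplePath, sampled, StandardCharsets.UTF_8); // one key per line
        return sampled;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sstables");
        Path data = dir.resolve("LocationInfo-699-Data.db");
        List<String> keys = new ArrayList<>();
        for (int k = 0; k < 1000; k++) {
            keys.add(String.format("key%05d", k));
        }
        List<String> first = loadOrSample(data, keys, 128);       // samples the keys and writes the file
        List<String> second = loadOrSample(data, List.of(), 128); // reads the saved sample back
        System.out.println(first.equals(second));
    }
}
```

Running `main` prints `true`: the second call ignores its (empty) key list and returns the persisted sample. A production version would also have to cope with a half-written sample file after a crash, for example by writing to a temporary file and renaming it into place.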
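On the second and third questions (parallel sampling, several column families at once): if disk and CPU really are underused during startup, submitting every SSTable's sampling work to a thread pool and waiting for the whole batch is one way to overlap the work. The attachment names suggest the actual patch added an `AggregateFuture` and parallelized SSTable open; the sketch below does not reproduce that patch. It uses the standard `ExecutorService.invokeAll` instead, and `IndexFile`, `IndexSample`, `sample` and `sampleAll` are hypothetical names, with in-memory key lists standing in for on-disk index files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelIndexSampling {

    /** Hypothetical stand-in for one SSTable index: a file name plus its keys in sorted order. */
    record IndexFile(String name, List<String> sortedKeys) {}

    /** Hypothetical in-memory sample of one index file. */
    record IndexSample(String name, List<String> sampledKeys) {}

    /** Assumed sampling scheme: keep every interval-th key. */
    static IndexSample sample(IndexFile file, int interval) {
        List<String> kept = new ArrayList<>();
        List<String> keys = file.sortedKeys();
        for (int i = 0; i < keys.size(); i += interval) {
            kept.add(keys.get(i));
        }
        return new IndexSample(file.name(), kept);
    }

    /** Sample all files (from any column family) on a fixed pool and wait for the whole batch. */
    static List<IndexSample> sampleAll(List<IndexFile> files, int interval, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<IndexSample>> tasks = new ArrayList<>();
            for (IndexFile f : files) {
                tasks.add(() -> sample(f, interval));
            }
            // invokeAll blocks until every task has finished; futures come back in task order.
            List<Future<IndexSample>> futures = pool.invokeAll(tasks);
            List<IndexSample> results = new ArrayList<>();
            for (Future<IndexSample> fut : futures) {
                results.add(fut.get()); // rethrows a failure from any sampling task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<IndexFile> files = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            List<String> keys = new ArrayList<>();
            for (int k = 0; k < 1000; k++) {
                keys.add(String.format("key%05d", k));
            }
            files.add(new IndexFile("sstable-" + t, keys));
        }
        for (IndexSample s : sampleAll(files, 128, 4)) {
            System.out.println(s.name() + ": " + s.sampledKeys().size() + " sampled keys");
        }
    }
}
```

With 1000 keys per file and an interval of 128, each line reads `sstable-N: 8 sampled keys`. Because the pool is shared across all files, SSTables from different column families are sampled concurrently without any extra per-family logic; the speedup is bounded by how much idle disk and CPU capacity the node actually has.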
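For reference, a straight linear extrapolation from the two logged timestamps, assuming startup time scales linearly with on-disk data size and that those lines bracket the whole sampling phase (both are assumptions), works out as follows:

$$
\Delta t = 13{:}05{:}51.333 - 12{:}58{:}26.348 = 7\,\text{min}\;24.985\,\text{s} \approx 445\,\text{s}
$$

$$
t_{2\,\text{TB}} \approx \frac{2000\,\text{GB}}{200\,\text{GB}} \times 445\,\text{s} = 4450\,\text{s} \approx 74\,\text{min}
$$

Taking 2TB as 2048GB gives about 4557 s, roughly 76 minutes. Either way the estimate is over an hour, the same order of magnitude as the 95-minute figure in the report.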