[ https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597381#comment-14597381 ]
Marcus Eriksson edited comment on CASSANDRA-8460 at 6/23/15 9:04 AM:
---------------------------------------------------------------------

bq. 1) If the compaction strategy calls for archive, but no archive disk is available (not defined, or otherwise full), I'm falling back to standard disk. Agree?

Can't we check whether any archive locations are available before starting an archive compaction? If there are none, we shouldn't compact, right?

bq. 2) I originally planned to explicitly prohibit compaction of N files on the archival disk, but I couldn't convince myself that made sense. Instead, I'm allowing it if max_sstable_age_days allows it (if you set the archive threshold lower than the max age, you could conceivably compact on the archival disk tier). Agree?

The way I originally envisioned this was that once an sstable hits max_sstable_age_days, we trigger a compaction that puts it on the slow disk, and then we never need to look at those sstables again (unless they eventually expire due to TTL). The idea behind max_sstable_age_days is that this is the point where we don't expect to do many reads anymore, so it would also be a good point to put the data on slow disks. I guess it could be a problem if users increase max_sstable_age_days and we move the data back to the fast disks, though. Thoughts?

bq. 3) In the case where archived sstables can still be compacted, it's possible in some windows to have them compacted with sstables on the faster standard disk. In those cases, I'm making a judgement call that if any of the source sstables were archived, the resulting sstable will also be archived. Agree?

As in 2), I think we should never compact the sstables on the slow disks.

bq. 4) Finally, I was trying to determine the right way to tell if an sstable was already archived.
bq. The logic I eventually used was simply parsing the path of the sstable and seeing if it was in the array of archive directories ( https://github.com/jeffjirsa/cassandra/commit/079b22136d178937b28b82326f132e33e96f6cad#diff-894e091348f28001de5b7fe88e65733fR1665 ). I'm not convinced this is best, but I didn't know if it was appropriate to extend sstablemetadata or similar to avoid this. Thoughts?

We do something similar in Directories.java: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L242 - you should probably compare absolute paths and use startsWith?

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8460
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Jeff Jirsa
>              Labels: dtcs
>             Fix For: 3.x
>
> It would be nice if we could configure DTCS to have a set of extra data directories where we move the sstables once they are older than max_sstable_age_days.
> This would enable users to have a quick, small SSD for hot, new data, and big spinning disks for data that is rarely read and never compacted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
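The absolute-path check suggested for point 4 could look roughly like the sketch below. The class and method names here are hypothetical illustrations, not the actual patch: the idea is to normalize the configured archive locations to absolute paths once, then test candidate sstable paths with startsWith (including a trailing separator, so /mnt/archive does not accidentally match /mnt/archive2).

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical sketch of the archived-sstable check discussed above.
// Resolve everything to absolute paths and match with startsWith,
// rather than comparing raw path strings, which breaks on relative
// paths configured for the data directories.
public class ArchiveCheck
{
    private final String[] archiveDirs;

    public ArchiveCheck(String[] archiveDirectories)
    {
        // Normalize the configured archive locations once, up front.
        this.archiveDirs = Arrays.stream(archiveDirectories)
                                 .map(d -> new File(d).getAbsolutePath())
                                 .toArray(String[]::new);
    }

    // True if the sstable lives under any configured archive directory.
    public boolean isArchived(File sstable)
    {
        String path = sstable.getAbsolutePath();
        for (String dir : archiveDirs)
            if (path.startsWith(dir + File.separator))
                return true;
        return false;
    }

    public static void main(String[] args)
    {
        ArchiveCheck check = new ArchiveCheck(new String[] { "/mnt/archive" });
        System.out.println(check.isArchived(new File("/mnt/archive/ks/tbl/na-1-big-Data.db"))); // true
        System.out.println(check.isArchived(new File("/mnt/ssd/ks/tbl/na-1-big-Data.db")));     // false
    }
}
```

Appending File.separator before matching is the detail that the Directories.java-style check needs; a bare startsWith("/mnt/archive") would also claim sstables under /mnt/archive2.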
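For reference, max_sstable_age_days discussed above is an existing per-table DTCS subproperty; the archive-location setting proposed in this ticket does not exist yet, so only the stock options are shown in this config fragment:

```
ALTER TABLE ks.events
  WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': '3600',
    'max_sstable_age_days': '30'
  };
```

Once an sstable's newest data is older than max_sstable_age_days (30 days here), DTCS stops compacting it, which is exactly why the ticket treats that threshold as the natural point to migrate the sstable to slow storage.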