[ https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597381#comment-14597381 ]

Marcus Eriksson edited comment on CASSANDRA-8460 at 6/23/15 9:04 AM:
---------------------------------------------------------------------

bq. 1) If the compaction strategy calls for archiving, but no archive disk is 
available (not defined, or otherwise full), I'm falling back to the standard 
disk. Agree?
Can't we check, before starting an archive compaction, whether any archive 
locations are available? If there are none, we shouldn't compact, right?
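
Roughly what I mean, as a minimal sketch (the {{archiveDirectories}} field and the method name are made up for illustration, not taken from the patch):

{code:java}
import java.io.File;

public class ArchiveLocationCheck
{
    // hypothetical array of configured archive data directories
    private final File[] archiveDirectories;

    public ArchiveLocationCheck(File[] archiveDirectories)
    {
        this.archiveDirectories = archiveDirectories;
    }

    // Return true only if at least one archive directory exists and has
    // enough usable space for the compaction we are about to start.
    public boolean hasUsableArchiveLocation(long expectedWriteSize)
    {
        for (File dir : archiveDirectories)
        {
            if (dir.isDirectory() && dir.getUsableSpace() >= expectedWriteSize)
                return true;
        }
        return false;
    }
}
{code}

If this returns false, we would simply not schedule the archive compaction, rather than falling back to the standard disk.
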
bq. 2) I originally planned to explicitly prohibit compaction of N files on the 
archival disk, but I couldn't convince myself that made sense. Instead, I'm 
allowing it if max_sstable_age_days allows it (if you set the archive threshold 
lower than the max age, you could conceivably compact on the archival disk 
tier). Agree?
The way I originally envisioned this was that once an sstable hits 
max_sstable_age_days, we trigger a compaction that puts it on the slow disk, 
and then we never need to look at those sstables again (unless they eventually 
expire due to TTL). The idea behind max_sstable_age_days is that it marks the 
point where we don't expect many reads anymore, so it would also be a good 
point to put the data on slow disks.

I guess it could be a problem if users increase max_sstable_age_days and we 
move the data back to the fast disks, though. Thoughts?
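
To make the intended lifecycle concrete, the eligibility check is just a one-way age threshold, as in this sketch (the names and the abstract time unit are illustrative, not the patch's API):

{code:java}
public final class ArchiveAge
{
    private ArchiveAge() {}

    // An sstable becomes archive-eligible once its newest data is older than
    // the max_sstable_age_days threshold. After the single compaction that
    // moves it to the slow disk we never consider it again (unless it expires
    // via TTL). All three arguments are assumed to use the same time unit.
    public static boolean isArchiveEligible(long maxDataTimestamp, long now, long maxSSTableAge)
    {
        return now - maxDataTimestamp > maxSSTableAge;
    }
}
{code}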

bq. 3) In the case where archived sstables can still be compacted, it's 
possible in some time windows for them to be compacted together with sstables 
on the faster standard disk. In those cases, I'm making a judgment call that if 
any of the source sstables were archived, the resulting sstable will also be 
archived. Agree?
As in 2), I think we should never compact the sstables on the slow disks.

bq. 4) Finally, I was trying to determine the right way to tell if an sstable 
was already archived. The logic I eventually used was simply parsing the path 
of the sstable and checking whether it was in the array of archive directories 
(https://github.com/jeffjirsa/cassandra/commit/079b22136d178937b28b82326f132e33e96f6cad#diff-894e091348f28001de5b7fe88e65733fR1665). 
I'm not convinced this is best, but I didn't know if it was appropriate to 
extend sstablemetadata or similar to avoid it. Thoughts?
We do something similar in Directories.java: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L242
 - you should probably check absolute paths and use startsWith?
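
i.e. something along these lines (a sketch of the startsWith approach; {{isArchived}} and {{archiveDirectories}} are illustrative names, not the actual Directories.java API):

{code:java}
import java.io.File;

public class ArchivePathCheck
{
    // hypothetical array of configured archive data directories
    private final File[] archiveDirectories;

    public ArchivePathCheck(File[] archiveDirectories)
    {
        this.archiveDirectories = archiveDirectories;
    }

    // Compare absolute paths so relative inputs don't slip through;
    // getCanonicalPath would additionally resolve symlinks if needed.
    public boolean isArchived(File sstableFile)
    {
        String path = sstableFile.getAbsolutePath();
        for (File dir : archiveDirectories)
        {
            // append the separator so /data/archive doesn't match /data/archive2
            if (path.startsWith(dir.getAbsolutePath() + File.separator))
                return true;
        }
        return false;
    }
}
{code}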



> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8460
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Jeff Jirsa
>              Labels: dtcs
>             Fix For: 3.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.


