[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470655#comment-13470655 ]

Chris Chiappone commented on CASSANDRA-2749:
---------------------------------------------

This actually causes a bunch of problems for customers that have used keyspace and CF names longer than 48 characters. How will this affect the upgrade path for those customers?

> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.1.0
>
>         Attachments: 0001-2749.patch,
> 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz,
> 0002-fix-unit-tests.patch, 0003-Fixes.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch,
> 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch,
> 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz,
> 2749_proper.tar.gz, 2749.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control
> what sstables are placed where. Particularly for systems with mixed SSDs and
> rotational disks, it would be nice to pin frequently accessed columnfamilies
> to the SSDs.
> Postgresql does this with tablespaces
> (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we
> should probably avoid using that name because of confusing similarity to
> "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244746#comment-13244746 ]

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

Did some more research on path limitations:

NTFS is technically okay with paths up to 32K long [1], but the Windows API is limited to 256 [2].

Common Linux filesystems have a limit of 255 bytes per path *component* (i.e. directory or filename) but no total path limit. However, Linux defines PATH_MAX and FILENAME_MAX, both 4096 [3].

[1] http://en.wikipedia.org/wiki/Comparison_of_file_systems
[2] http://msdn.microsoft.com/en-us/library/aa365247.aspx
[3] http://serverfault.com/questions/9546/filename-length-limits-on-linux

In short: restricting KS and CF names to 32 characters is a good idea for the benefit of Windows portability. However, we may want to exempt Linux systems from the startup length check to allow easier upgrades.
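As an illustration of the proposal above, here is a minimal sketch of a startup name-length check that skips the limit on Linux. The class and method names are hypothetical and do not come from any attached patch; only the 32-character limit and the idea of a Linux exemption are taken from the comment.

```java
import java.util.Locale;

/** Hypothetical startup check; names here are illustrative, not from the patch. */
final class NameLengthCheck {
    // Limit proposed in this thread for keyspace and column family names.
    static final int MAX_NAME_LENGTH = 32;

    /** True when the OS name looks like Linux, where the check could be skipped. */
    static boolean isLinux(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).contains("linux");
    }

    /** Returns null if the name is acceptable, otherwise a human-readable reason. */
    static String validate(String name, String osName) {
        if (name == null || name.isEmpty())
            return "name must not be empty";
        if (name.length() > MAX_NAME_LENGTH && !isLinux(osName))
            return "name '" + name + "' is longer than " + MAX_NAME_LENGTH + " characters";
        return null;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String[] names = { "Keyspace1", "a_column_family_name_well_over_thirty_two_chars" };
        for (String name : names) {
            String problem = validate(name, os);
            System.out.println(name + ": " + (problem == null ? "ok" : problem));
        }
    }
}
```

Passing the OS name as a parameter rather than reading it inside validate() keeps the exemption easy to exercise on any platform.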
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178822#comment-13178822 ]

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

+1 with one last nit: the Directories.SSTableLister methods skip{Compacted,Temporary}(boolean) and includeBackups(boolean) ignore their argument and always set the corresponding option to "true".
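To make the nit concrete, here is a small sketch of option setters that honor their boolean argument. ListerOptions is a hypothetical stand-in written for this note; it only borrows the three method names mentioned above and is not the Directories.SSTableLister class from the patch.

```java
/** Hypothetical stand-in for an option-carrying lister; not the class from the patch. */
final class ListerOptions {
    private boolean skipCompacted;
    private boolean skipTemporary;
    private boolean includeBackups;

    // Each setter stores the caller's argument. Assigning the literal "true"
    // here instead would reproduce the behaviour described in the nit.
    ListerOptions skipCompacted(boolean b)  { this.skipCompacted = b;  return this; }
    ListerOptions skipTemporary(boolean b)  { this.skipTemporary = b;  return this; }
    ListerOptions includeBackups(boolean b) { this.includeBackups = b; return this; }

    boolean skipsCompacted()  { return skipCompacted; }
    boolean skipsTemporary()  { return skipTemporary; }
    boolean includesBackups() { return includeBackups; }

    public static void main(String[] args) {
        ListerOptions o = new ListerOptions().skipCompacted(false).includeBackups(true);
        System.out.println("skipCompacted=" + o.skipsCompacted()
                + " includeBackups=" + o.includesBackups());
    }
}
```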
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175460#comment-13175460 ]

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

That may be the case :) I will re-test as part of the review anyway.
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175447#comment-13175447 ]

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

Weird. I just tried the same scenario and everything worked correctly. I should mention that when moving the snapshots/backups, the migration process renames them to the new filename convention, so they will be called Keyspace1-Standard1.Idx-*.

Or maybe I fixed it with the last version of the patch without realizing it.
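The rename described above can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the file-name tail ("-1-Data.db") is made up for the example, and placing index sstables in the base column family's directory is only the option discussed elsewhere in this thread, not confirmed behaviour of the patch.

```java
/** Hypothetical helpers illustrating the new filename convention. */
final class NewStyleName {
    /** Prefixes an old-style file name with its keyspace, e.g. "Keyspace1-Standard1.Idx-...". */
    static String rename(String keyspace, String oldFileName) {
        return keyspace + "-" + oldFileName;
    }

    /**
     * Directory name for a column family. Assumption: a secondary index such as
     * "Standard1.Idx" shares the directory of its base column family "Standard1".
     */
    static String cfDirectory(String cfName) {
        int dot = cfName.indexOf('.');
        return dot < 0 ? cfName : cfName.substring(0, dot);
    }

    public static void main(String[] args) {
        String old = "Standard1.Idx-1-Data.db"; // made-up tail after the cf name
        System.out.println(cfDirectory("Standard1.Idx") + "/" + rename("Keyspace1", old));
    }
}
```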
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175390#comment-13175390 ]

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

bq. I'm not sure I see what this one is. Are we talking of the migration process?

I was testing it like this:
# run 1.1 *without* modifications
# ./tools/stress/bin/stress -n 5 -S 512 -x KEYS
# ./bin/nodetool -h localhost flush Keyspace1 Standard1
# ./bin/nodetool -h localhost snapshot Keyspace1
# made sure that Standard1.Idx-* SSTables are in the snapshots/ directory
# ran 1.1 *with* your patch applied
# checked whether the snapshots directory was moved and which files it included: it was lacking the Standard1.Idx-* files
# cleaned the data directory
# repeated steps 1 - 5, but this time *with* your patch applied, and the snapshot didn't include Standard1.Idx-*

bq. Maybe it is more "natural" to have secondary indexes sstables be in the same directory than the base cfs?

+1
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175070#comment-13175070 ]

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

Thanks, Sylvain! We are getting really close :) Here are the problems I found:

- the o.a.c.db.Directories comment should be updated because it still uses an SSTable file name without the keyspace.
- o.a.c.io.sstable.SSTableReaderTest won't compile
- if you start with an empty data directory, you get the following exception and the process exits

{code}
 INFO 23:51:11,987 Upgrade from pre-1.1 version detected: migrating sstables to new directory layout
ERROR 23:51:11,988 Exception encountered during startup
java.lang.NullPointerException
    at org.apache.cassandra.db.Directories.migrateSSTables(Directories.java:416)
    at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:164)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:360)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
java.lang.NullPointerException
    at org.apache.cassandra.db.Directories.migrateSSTables(Directories.java:416)
    at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:164)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:360)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Exception encountered during startup: null
{code}

- snapshot doesn't create or move (from the older schema) the index SSTables related to a CF
- shouldn't the old "snapshots" directory be removed after the move?
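The trace alone does not show which reference was null at Directories.java:416. One plausible cause, assumed here purely for illustration, is that java.io.File.listFiles() returns null (not an empty array) when the path is not an existing directory. A defensive helper, with a hypothetical name, might look like this:

```java
import java.io.File;

/** Hypothetical helper; not taken from the patch under review. */
final class SafeListing {
    /** Lists a directory, treating a missing or unreadable directory as empty. */
    static File[] listOrEmpty(File dir) {
        File[] files = dir.listFiles(); // null if dir is not a directory or on I/O error
        return files == null ? new File[0] : files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "data");
        // Safe to iterate even when the directory does not exist yet.
        for (File f : listOrEmpty(dir))
            System.out.println(f.getName());
        System.out.println(listOrEmpty(dir).length + " entries in " + dir);
    }
}
```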
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170487#comment-13170487 ]

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

It might be worth adding an "are my filenames going to be too large" check against all KS + CF combinations before starting to migrate data files around, though. It would suck to end up with a partially converted database if some short CF names complete early on, before erroring out on a long one.
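A minimal sketch of such a pre-flight check: collect every offending KS + CF pair first, and only start moving files if the list is empty. The class name, the suffix used to approximate the longest file name, and the 255-byte limit (the per-component limit mentioned elsewhere in this thread for common Linux filesystems) are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-migration check; nothing here is taken from the patch. */
final class MigrationPreflight {
    // Per-component limit on common Linux filesystems, in bytes.
    static final int MAX_COMPONENT_BYTES = 255;
    // Assumed worst-case tail appended after "<ks>-<cf>"; purely illustrative.
    static final String SUFFIX = "-tmp-xx-2147483647-SomeComponent.db";

    /** Returns "ks.cf" for every pair whose new-style file name would be too long. */
    static List<String> findTooLong(Map<String, List<String>> cfsByKeyspace) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : cfsByKeyspace.entrySet()) {
            for (String cf : e.getValue()) {
                String fileName = e.getKey() + "-" + cf + SUFFIX;
                if (fileName.getBytes(StandardCharsets.UTF_8).length > MAX_COMPONENT_BYTES)
                    offenders.add(e.getKey() + "." + cf);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        Map<String, List<String>> schema = new LinkedHashMap<>();
        schema.put("Keyspace1", Arrays.asList("Standard1", "Super1"));
        List<String> offenders = findTooLong(schema);
        // Abort before touching any file if a single name is too long.
        System.out.println(offenders.isEmpty() ? "safe to migrate" : "too long: " + offenders);
    }
}
```

Checking everything up front is what avoids the partially converted state: no file is moved until the whole schema has passed.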
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170480#comment-13170480 ]

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

bq. guess we need to supply a tool to rename sstables files if anyone is on longer names?

We probably don't need to do anything. I don't think anyone is really using names long enough for them to hit the file system limit. The goal of limiting the names is just to prevent this from happening; the code will make no other assumption that the names are short.
I also don't think anything will prevent rolling upgrades. Did you have something in mind?

Note: I have a long flight ahead of me, so I plan to update my last patch with both of those changes, as I still like moving all the directory handling into a dedicated class, even if we don't support both layouts.
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170475#comment-13170475 ]

Pavel Yaskevich commented on CASSANDRA-2749:
--------------------------------------------

I think if we only support the new-style layout, we can convert on startup. +1 on both ideas, though.
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170467#comment-13170467 ]

Marcus Eriksson commented on CASSANDRA-2749:
--------------------------------------------

sounds great (both just supporting the new-style layout and limiting names to 32 chars)

guess we need to supply a tool to rename sstables files if anyone is on longer names?

and rolling upgrades are out of the question then, right? (maybe they already are?)
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170394#comment-13170394 ]

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

+1 limiting to 32
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170385#comment-13170385 ]

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

{quote}
On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all support max filename length of ~256. I'm starting to think we should move these into the metadata component instead of the filename.
{quote}

The thing with the metadata component is that, from a code perspective, there are lots of places where we want to create a Descriptor, which involves extracting the keyspace/cf names based only on the filename. Adding the need to locate and read the metadata in those places likely won't be much fun.

So I'd be in favor of just limiting the keyspace and column family names. It's one thing for which there is no real point in having very long names. Limiting each one to 32 characters shouldn't be a strong limitation.
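To show why having the names in the filename is convenient, here is a rough sketch of recovering keyspace and column family from a file name alone, with no metadata read. It assumes a "<ks>-<cf>-..." layout in which neither name contains '-'; the class is hypothetical and is not the real Descriptor, and the tail of the example file name is made up.

```java
/** Hypothetical parser; a stand-in for the idea, not the real Descriptor class. */
final class NameFromFile {
    /** Returns {keyspace, columnFamily}; throws if the name is not "<ks>-<cf>-...". */
    static String[] parse(String fileName) {
        int first = fileName.indexOf('-');
        int second = first < 0 ? -1 : fileName.indexOf('-', first + 1);
        // Reject a missing separator as well as an empty keyspace or cf part.
        if (first <= 0 || second <= first + 1)
            throw new IllegalArgumentException("not of the form <ks>-<cf>-...: " + fileName);
        return new String[] { fileName.substring(0, first), fileName.substring(first + 1, second) };
    }

    public static void main(String[] args) {
        String[] parts = parse("Keyspace1-Standard1.Idx-1-Data.db"); // made-up tail
        System.out.println("keyspace=" + parts[0] + " cf=" + parts[1]);
    }
}
```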
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170364#comment-13170364 ]

Jonathan Ellis commented on CASSANDRA-2749:
-------------------------------------------

bq. we cannot stream between two nodes, one using separate cf directory

I don't see any reason to continue to support the old-style directory layout. That adds complexity (operationally as well as in the code) for no benefit that I can think of. I think we should migrate from old layout to new on the first startup under 1.1.

bq. regarding keyspaces in file names, sure, why not, guess having a header with this info in the file is out of the question, then the only meta data we have is the file name, right? A problem could be if we want to do CASSANDRA-1983 later, that would increase the file name length even more

I'm on the fence here -- on the one hand having ks + cf in the filename simplifies some things. On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all support max filename length of ~256. I'm starting to think we should move these into the metadata component instead of the filename.
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163571#comment-13163571 ]

Sylvain Lebresne commented on CASSANDRA-2749:
---------------------------------------------

bq. i had an algorithm for detecting what kind of file it is in the old backwards compatible patch, it iterates over all data directories and figures out which data directory the file is in, then it knows the keyspace is the next part of the filename, and can check if there are files or directories in that directory.

I tried that too, but this doesn't work when, say, you're opening an sstable with the sstableloader, because the files are not in a data directory then.
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163570#comment-13163570 ]

Marcus Eriksson commented on CASSANDRA-2749:

Agreed that the file structure is all over the place; that's why this patch was so incredibly painful to do.

It seems I only did the disk space checking in subdirectories in the backwards-compatible patch.

I had an algorithm for detecting what kind of file it is in the old backwards-compatible patch: it iterates over all data directories and figures out which data directory the file is in; then it knows the keyspace is the next part of the file path, and can check whether there are files or directories in that directory.

But I agree, it is incredibly ugly. Your modifications make it a lot better, and yes, this would be a good time to do this properly, unlike me just trying to do a minimal patch that "works".

Regarding keyspaces in file names: sure, why not. I guess having a header with this info in the file is out of the question, so the only metadata we have is the file name, right? A problem could arise if we want to do CASSANDRA-1983 later, since that would increase the file name length even more.
> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 0001-add-new-directory-layout.patch,
> 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz,
> 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch,
> 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch,
> 2749_backwards_compatible_v4.patch,
> 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz,
> 2749_proper.tar.gz
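The detection approach described in this comment can be sketched roughly as follows. This is only an illustration and not code from any of the attached patches: the names `KeyspaceFromPath` and `keyspaceOf` and the example paths are hypothetical, and the sketch uses `java.nio.file`, which requires Java 7. It also shows the limitation raised in the reply: a file that is not under any configured data directory (for example one handed to sstableloader) yields no answer.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Cassandra.
final class KeyspaceFromPath {
    /**
     * Finds the configured data directory that contains the given file and
     * returns the path component right after it, which is assumed to be the
     * keyspace directory. Returns empty when the file is outside every data
     * directory, or sits directly in a data directory with no keyspace level.
     */
    static Optional<String> keyspaceOf(List<String> dataDirectories, String sstableFile) {
        Path file = Paths.get(sstableFile).toAbsolutePath().normalize();
        for (String dir : dataDirectories) {
            Path root = Paths.get(dir).toAbsolutePath().normalize();
            // Require at least <root>/<keyspace>/<file> below the data directory.
            if (file.startsWith(root) && file.getNameCount() > root.getNameCount() + 1) {
                return Optional.of(file.getName(root.getNameCount()).toString());
            }
        }
        return Optional.empty();
    }
}
```

With data directories `/data1` and `/data2`, a file at `/data2/Keyspace1/Standard1-hc-1-Data.db` resolves to `Keyspace1`, while a file under `/tmp/load/` resolves to nothing, so a caller would need another source for the keyspace name in that case.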
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155218#comment-13155218 ]

Pavel Yaskevich commented on CASSANDRA-2749:

+1, with a few additions:
- You forgot to set `chmod +x ./bin/sstablemover`.
- It would be a good idea to use `mv` instead of `cp` when the -d option is specified.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz,
> 2749.tar.gz, 2749_backwards_compatible_v1.patch,
> 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch,
> 2749_backwards_compatible_v4.patch,
> 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz
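To illustrate the `mv` versus `cp` suggestion, here is a minimal Java sketch. It assumes that `-d` means "delete the old files after relocating them", which is a guess based on this thread; the class `SSTableRelocator` and its `relocate` method are hypothetical and are not taken from the sstablemover tool. It relies on `java.nio.file.Files` (Java 7 or later).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the sstablemover tool.
final class SSTableRelocator {
    /**
     * Places source inside targetDir and returns the new path.
     * When deleteOld is true the file is moved, so no second copy ever
     * exists on disk; otherwise it is copied and the original is kept.
     */
    static Path relocate(Path source, Path targetDir, boolean deleteOld) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(source.getFileName());
        if (deleteOld) {
            // A move within one filesystem is typically a cheap rename.
            return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The practical difference is disk usage: copying and then deleting temporarily needs room for two copies of every sstable, while a move within the same filesystem does not.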
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153620#comment-13153620 ]

Pavel Yaskevich commented on CASSANDRA-2749:

Thanks a lot for taking care of this, Marcus! I have tested your latest patch using stress and sstablemover, and then stress (+ flush/scrub/snapshot) again, for Standard (+ secondary indexes) and Super ColumnFamilies, and everything worked just fine for me.

Here are my last comments; I think once these are done we are ready to push your changes:

1). a). The tool should be made more user-friendly by adding at least --help, and by adding an option to move sstables back to the keyspace directory (as we have two states in the config);
b). I think it should delete old sstables after the copy, either automatically or when the user asks for it with a --delete-old option?
c). Move the sstablemover tool to ./bin instead of ./tools/sstablemover.
2). LegacySSTableTest fails on my machine; can you please check that?
3). I think we should extend the comment in the config to describe what should be done before it is safe to change the option, e.g. "Note: before setting this option to 'true' you should move existing SSTable files to the separate directories manually by using ./bin/sstablemover tool, the same applies for 'false' state - ./bin/sstablemover could be used to restore original directory structure (Please use ./bin/sstablemover --help for help and information)."...

Also, please remove/implement the remaining // TODO: comments.
> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz,
> 2749.tar.gz, 2749_backwards_compatible_v1.patch,
> 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch,
> 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146556#comment-13146556 ]

Pavel Yaskevich commented on CASSANDRA-2749:

I don't really like the idea of keeping a map of all generations for each ks/cf and updating it all the time (this also implies that we need to resize it whenever a cf is created/dropped). After rethinking this once again, it feels like we really should just make a tool to transfer SSTables to separate directories and back, and add a flag to the config that indicates the old/new style of file structure, because I don't see a way to handle mixed old/new style SSTables that isn't error-prone...

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Marcus Eriksson
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 2749.tar.gz,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch,
> 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch,
> 2749_backwards_compatible_v4_rebase1.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146250#comment-13146250 ]

Pavel Yaskevich commented on CASSANDRA-2749:

Tested v4 on trunk and I see a few test failures: LegacySSTableTest and ColumnFamilyStoreTest. They are related to how you determine which placement style is used for a given SSTable; it feels like the !DatabaseDescriptor.useSeparateCFDirectories() condition is insufficient.

Also, I think it would be better to rename the "one_directory_per_column_family" option to "use_separate_column_family_directories", with a comment like "Useful when you have disks with different speeds (HDD/SSD) and you want to explicitly distribute Column Families between them".

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Marcus Eriksson
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch,
> 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch,
> 2749_backwards_compatible_v4_rebase1.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143863#comment-13143863 ]

Pavel Yaskevich commented on CASSANDRA-2749:

v3 looks good (needs rebase). Here are my comments:
- DatabaseDescriptor.getPerCFDirectory() should be renamed to something like "useSeparateCFDirectories".
- In the ColumnFamilyStore.accept method, the file name should be checked case-sensitively, because CF names are case-sensitive.
- We don't have the ColumnFamilyStore.rename method anymore, which is a good thing for this patch; you can just rebase and remove the related code.
- In Table.snapshotExists, I think we should be more careful in determining whether a snapshot actually exists.
- TODOs should be cleaned up.

Wish: if possible, I think we can remove the ColumnFamily name from the SSTable files if those are in the CF directory already.

I think we are almost done here.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Marcus Eriksson
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch,
> 2749_backwards_compatible_v3.patch
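A minimal sketch of the case-sensitive file name check suggested for the `accept` method in the review above. The class name `CfFileFilter` is hypothetical, and the assumption that an sstable file name starts with the column family name followed by `-` is made only for illustration; the actual filter in the patch may look different.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical filter, not the one from the patch.
final class CfFileFilter implements FilenameFilter {
    private final String prefix;

    CfFileFilter(String columnFamily) {
        // The trailing '-' keeps "Users" from matching "UsersByEmail-...".
        this.prefix = columnFamily + "-";
    }

    @Override
    public boolean accept(File dir, String name) {
        // String.startsWith is case-sensitive, so "users-..." does not match "Users".
        return name.startsWith(prefix);
    }
}
```

A comparison that lower-cases both sides would treat the column families `Users` and `users` as the same one and could pick up the wrong files.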
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127329#comment-13127329 ]

Marcus Eriksson commented on CASSANDRA-2749:

The biggest part left is figuring out how to estimate disk space to know where to write an sstable; I'll work on that this weekend.

Other than that: small cleanups (code style), unit tests, and adding the config parameter.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126980#comment-13126980 ]

Pavel Yaskevich commented on CASSANDRA-2749:

I forgot to mention: it would be great to see a test for RecursiveGlob, and please put a license banner on top of that file like we do everywhere else.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126954#comment-13126954 ]

Pavel Yaskevich commented on CASSANDRA-2749:

Good, can you briefly tell me what is left? I see that ColumnFamilyStoreTest has a commented-out assertion and the code still has some TODOs. Also, please make sure you follow the CodeStyle and put a space after if/for/while/... statements...

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126910#comment-13126910 ]

Marcus Eriksson commented on CASSANDRA-2749:

Ah, that is a leftover from my first non-backwards-compatible patch; it works even without the cf in the new File(...).

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126899#comment-13126899 ]

Pavel Yaskevich commented on CASSANDRA-2749:

Looks better now, but I don't like how you changed DescriptorTest, especially testLegacy(): "userActionUtilsKey" there is meant to be a CF name, so it seems like the current version does not work correctly?

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124330#comment-13124330 ]

Pavel Yaskevich commented on CASSANDRA-2749:

Sorry Marcus, I have had no time to look at this, but I will by the end of this week.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124288#comment-13124288 ]

Marcus Eriksson commented on CASSANDRA-2749:

Does anyone have a suggestion on how to do the available disk space checking? The reason is that we can no longer simply check which dataFileLocation has the most available disk space; we need to check the subdirs where files are actually written.

My naive first approach would be to simply change DatabaseDescriptor.getDataFileLocationForTable(..) to take a column family name, which we ought to always know when calling that method (a quick check suggests that only compactions might be a problem for this approach).

Comments?

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
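The "naive first approach" from this comment might look roughly like the sketch below: for each configured data directory, look at the free space of the `<keyspace>/<column family>` subdirectory instead of the data directory itself, and pick the largest. Everything here is an assumption for illustration: `DataLocationPicker` and `pick` are hypothetical names, not the real `DatabaseDescriptor` API, and the space lookup is passed in as a function so the selection logic can be exercised without touching real disks. The sketch uses Java 8 syntax.

```java
import java.io.File;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical helper, not the DatabaseDescriptor method discussed above.
final class DataLocationPicker {
    /**
     * Returns the <dataDir>/<keyspace>/<cf> directory with the most usable
     * space according to usableSpace, or null if dataDirs is empty.
     * On a tie the earlier data directory wins.
     */
    static File pick(List<File> dataDirs, String keyspace, String cf,
                     ToLongFunction<File> usableSpace) {
        File best = null;
        long bestSpace = -1;
        for (File dataDir : dataDirs) {
            // Measure the subdirectory: it may be a mount point or a
            // symlink to a different disk than the data directory itself.
            File candidate = new File(new File(dataDir, keyspace), cf);
            long space = usableSpace.applyAsLong(candidate);
            if (space > bestSpace) {
                best = candidate;
                bestSpace = space;
            }
        }
        return best;
    }

    /** Convenience overload that asks the filesystem for usable space. */
    static File pick(List<File> dataDirs, String keyspace, String cf) {
        // File.getUsableSpace() returns 0 for a path that does not exist
        // yet, so the subdirectories would need to be created beforehand.
        return pick(dataDirs, keyspace, cf, File::getUsableSpace);
    }
}
```

One thing the sketch does not address is the compaction case mentioned above, where the caller may not have a single column family name at hand.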
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121019#comment-13121019 ]

Jonathan Ellis commented on CASSANDRA-2749:

Yes, trunk is the right place for this.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch,
> 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118867#comment-13118867 ]

Marcus Eriksson commented on CASSANDRA-2749:

Agreed; I'm going to look at how much pain it actually is to build, though.

It won't need a lot of downtime if it is done offline: we can rsync files into the subdir right before stopping the node.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118868#comment-13118868 ]

Pavel Yaskevich commented on CASSANDRA-2749:

In my opinion it's more user-friendly if we support the old directory structure for existing files and let compaction do the replacement for us.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118864#comment-13118864 ]

Jonathan Ellis commented on CASSANDRA-2749:

I don't think taking compression as a model makes sense. With compression, it's fine to set the option globally and each node can start using compression when it gets the schema update. But this affects the storage engine "below" schema. I think it's reasonable to make the change offline, especially if an upgrade tool is provided to make it simple.

> fine-grained control over data directories
>
> Key: CASSANDRA-2749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 1.1
> Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118859#comment-13118859 ] Marcus Eriksson commented on CASSANDRA-2749: OK, I'll update the patch. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor > Fix For: 1.1 > > Attachments: > 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118720#comment-13118720 ] Pavel Yaskevich commented on CASSANDRA-2749: This should be done like compression, which compresses only newly created SSTables; we don't want to make users stop their nodes for an indefinite time to change the config and move files. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor > Fix For: 1.1 > > Attachments: > 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118696#comment-13118696 ] Marcus Eriksson commented on CASSANDRA-2749:
It is not backwards compatible; I figured it was not worth the extra complexity. I imagine the "upgrade" path to be:
1. shut down node
2. edit config file
3. symlink in SSDs and move the files into the subdirectories
4. start node
Of course, this could be done on startup by Cassandra itself, but providing a shell script that does it keeps it simple. What do you think? I really don't want the complexity of sstable files in two places at the same time.
My solution simply changed the Descriptor class to return paths to sstable files in subdirs, and then fixed everything that was affected. Many of the classes were modified because of assumptions in the unit tests. I'm currently refactoring the Descriptor class to make it cleaner and a central place to glob for files etc. Expect a patch tonight or tomorrow. I will of course change DatabaseDescriptor.getPerCFDirectory() to read the config file; I kept it hard-coded to make testing simpler.
> fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor > Fix For: 1.1 > > Attachments: > 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
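Step 3 of the offline upgrade path described above (moving existing files into per-column-family subdirectories while the node is down) could be scripted along these lines. This is a hedged sketch in Python rather than the shell script the comment mentions, and it is not taken from the attached patch: the function name move_into_cf_subdirs, the dry_run flag, and the assumption that every file of a column family starts with "<cf>-" are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def move_into_cf_subdirs(
    keyspace_dir: Path, cf_names: Iterable[str], dry_run: bool = True
) -> List[Tuple[Path, Path]]:
    """Move flat files named '<cf>-...' into '<keyspace_dir>/<cf>/'.

    Returns the (source, destination) pairs. With dry_run=True nothing
    is touched, so the plan can be reviewed before the real run.
    """
    planned = []
    for cf in cf_names:
        # Materialize the listing first so moving files cannot affect it.
        for src in sorted(keyspace_dir.glob(f"{cf}-*")):
            if not src.is_file():
                continue
            dst = keyspace_dir / cf / src.name
            planned.append((src, dst))
            if not dry_run:
                # The subdirectory may already exist as a symlink to an SSD;
                # a dangling symlink makes mkdir raise instead of continuing.
                dst.parent.mkdir(exist_ok=True)
                shutil.move(str(src), str(dst))
    return planned
```

A run would call it once with dry_run=True to inspect the plan, then again with dry_run=False while the node is stopped. Because shutil.move falls back to copy-and-delete across filesystems, the same call works when the subdirectory is a symlink onto a different disk.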
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118454#comment-13118454 ] Pavel Yaskevich commented on CASSANDRA-2749: Thanks for your work, Marcus! First of all, can you please write your algorithm down so people can participate without reading the actual code? I don't think this is the right way to go, because I can't see how it keeps backward compatibility with the current directory structure (that the test SSTables had to be moved is yet another confirmation). Keeping compatibility would imply a major change in the Descriptor/Component classes. All the current changes rely on DatabaseDescriptor.getPerCFDirectory(), which statically returns "true", which makes it redundant. As I see it, major changes should be made only in the Descriptor, Component, ColumnFamilyStore and SSTable classes, to change the way they create and look up file locations and do backups and snapshotting. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor > Fix For: 1.1 > > Attachments: > 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118017#comment-13118017 ] Marcus Eriksson commented on CASSANDRA-2749: BTW, yes, I did include the data files in the patch on purpose; they need to be in subdirs for some of the unit tests to work. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor > Attachments: > 0001-Make-it-possible-to-put-column-families-in-subdirect.patch
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085812#comment-13085812 ] Pavel Yaskevich commented on CASSANDRA-2749:
After banging my head against this for a few days, I can say that this is more complicated than it sounds, for a few reasons:
- We will need to support both the old and new directory structures, which requires major changes in how the Descriptor class works and how the CFS and SSTable classes do file lookup and path generation.
- It adds complexity to the way we do backups, snapshots and recovery, which could potentially lead to some nasty bugs.
- As Peter already mentioned, "Cassandra won't be able to distinguish between an actual empty CF and a directory that wasn't mounted (or a symlink pointing to a non-mounted directory)".
There are more, but those mentioned are the major ones. Let's skip this for now.
> fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich >Priority: Minor
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081679#comment-13081679 ] Chris Burroughs commented on CASSANDRA-2749: It would also be cool (but this is obviously speculative) to have the ability to keep Index files on an SSD, and the larger data files on rotating disks. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich >Priority: Minor > Fix For: 1.0
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046102#comment-13046102 ] Peter Schuller commented on CASSANDRA-2749: +1 on that. We have been discussing the same thing, for the same purpose. The only kink is that you don't want to do something like having a per-CF setting that is tied to local node details like paths. But simply placing CFs in a named subdirectory (similar to the pg tablespace), which can, on a per-node basis, be a symlink or a mountpoint, avoids that. This means there's no problem doing a rolling reconfiguration of a cluster, and there is no need to realize beforehand that you might want to move some particular CF and do something like assign it to a tablespace (to get the level of indirection). It all just works by default, and you can move CFs at any time on any node without coordination other than the node being down for a bit. I can foresee it being easier to accidentally start a node which seems to work but has some CFs completely empty, because Cassandra won't be able to distinguish between an actual empty CF and a directory that wasn't mounted (or a symlink pointing to a non-mounted directory). Something simple like creating a marker of some kind on CF creation might help with that; on start-up, CFs that are missing the marker could be rejected. But I suppose this is overkill, at least initially. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor
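The marker idea floated in the comment above can be illustrated with a few lines of Python. This is purely a sketch of the concept, under assumptions that do not come from the ticket: the marker file name ".cf_marker" and the helper names create_cf_dir and missing_markers are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

MARKER = ".cf_marker"  # hypothetical marker file name, chosen for this sketch


def create_cf_dir(cf_dir: Path) -> None:
    """Create a column family directory and drop the marker file into it."""
    cf_dir.mkdir(parents=True, exist_ok=True)
    (cf_dir / MARKER).touch()


def missing_markers(keyspace_dir: Path, cf_names: Iterable[str]) -> List[str]:
    """Return the column families whose directory lacks the marker.

    An unmounted mountpoint or a dangling symlink shows up here, because
    the marker lives on the volume that is supposed to be mounted.
    """
    return [
        cf for cf in cf_names
        if not (keyspace_dir / cf / MARKER).is_file()
    ]
```

A start-up check would refuse to continue if missing_markers returns anything, which separates "this column family really has no data yet" (marker present, no sstables) from "the disk is not there" (no marker).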
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046093#comment-13046093 ] Jonathan Ellis commented on CASSANDRA-2749: Ryan's idea sounds like the simplest way to get something "good enough" to me. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046069#comment-13046069 ] Ryan King commented on CASSANDRA-2749: Since each keyspace is stored in a different sub-directory of the DataDirectories, you can already split the storage of different keyspaces with some clever mount options. Maybe we could give column families the same treatment? > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046045#comment-13046045 ] Héctor Izquierdo commented on CASSANDRA-2749: What about making this configurable in a separate file, like the network topology? Could that work as a first approximation? > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045956#comment-13045956 ] Jonathan Ellis commented on CASSANDRA-2749: There's some tension between managing this cluster-wide and the actual data directory definitions being per-machine. Not sure what the best solution there is. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045953#comment-13045953 ] Jonathan Ellis commented on CASSANDRA-2749: We could also have a "memory" location that would be useful for temporary data. > fine-grained control over data directories > -- > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Priority: Minor