[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-2749: ---------------------------------------- Attachment: 0002-fix-unit-tests.patch 0001-add-new-directory-layout.patch I've only look at the last posted patchset (2749_proper.tar.gz) but I've found the following few problems: * The disk space free space checks doesn't take the new layout into account. In other words, it always checks the keyspace directory for free space. But in case we use CF directories, the keyspace directory could likely be on a disk with almost no freespace (and in any case, this won't always be the right disk to check). * The check that a snapshot is existing and the clearing of snapshot is broken for the CF directories layout (in Table.java) * The leveled manifest is not put into it's CF directory (it follows that its snapshot is not in the right place either). I'm also not a fan of having the directory argument given to the Descriptor constructors not be the actual directory of the sstable it describes. I think it's error prone, making Descriptor harder to use (the fact that Descriptor.getDirectory() doesn't necessarily returns the directory you've fed to the constructor is confusing). More generally, the current code dealing with the paths to the data directories is currently very badly encapsulated (not this patch fault at all). Which means that adding a new "parallel" data directory layout leaks everywhere and make the code hard to read/maintain. I strongly believe we should use the opportunity to improve that code encapsulation. So attaching a patch that refactor a bit more the code to improve encapsulation. It essentially consists in adding a new Directories class that abstracts the actual layout of the data directory (including the fact that we can use multiple locations, which imho greatly clean the code). Aside from that, the patch reuse the sstablemover of the preceding patch (with an added verbose option). The patch also includes fairly extensive unit tests for the new class. Now there is one thing that really bother me with this patch (the preceding patches have the same problem). It's the fact that the Descriptor.fromFilename method needs to query the global useSeparateCFDirectories option to work correctly. This has a number of practical consequences: # it is the reason for a number of super annoying gymnastic in the unit tests, like the need to duplicate the sstables used by LegacySSTableTest for instance. I've put all of this gymnastic in a separate patch (0002-fix-unit-tests.patch). # this means we cannot stream between two nodes, one using separate cf directory, one that don't (that limitation could be solve independently for streaming only, but it's worth considering a more general fix imo, in particular because a specific fix would be a tad ugly). # tools like the sstableloader will require that you put the sstable to load into a ksname/cfname/ directory when you use the separate directory option. It's not a big deal, but it feels arbitrary and annoying. At first, I had hoped that we could make Descriptor.fromFilename 'detect' whether or not we were using the separate directory layout by using the fact that the file name contains the column family name already, but that doesn't work when the ksname == cfname (which is allowed afaik). One option would be to stop relying on the directory structure the sstable is in to detect the keyspace name. We could do that if we included the keyspace name in the filename for instance. The pros would be to fix all the problem listed above, and in particular we would be able to allow stuffs like {{sstableloader <filename>}}, instead of requiring that the files are put inside a specifically name directory, which would be nice. I don't see much cons except making the file names longer. Any opinion on that idea? > fine-grained control over data directories > ------------------------------------------ > > Key: CASSANDRA-2749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 > Project: Cassandra > Issue Type: New Feature > Components: Core > Reporter: Jonathan Ellis > Priority: Minor > Fix For: 1.1 > > Attachments: > 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, > 0001-add-new-directory-layout.patch, > 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, > 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, > 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, > 2749_backwards_compatible_v4.patch, > 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, > 2749_proper.tar.gz > > > Currently Cassandra supports multiple data directories but no way to control > what sstables are placed where. Particularly for systems with mixed SSDs and > rotational disks, it would be nice to pin frequently accessed columnfamilies > to the SSDs. > Postgresql does this with tablespaces > (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we > should probably avoid using that name because of confusing similarity to > "keyspaces." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira