[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2749:
----------------------------------------

    Attachment: 0002-fix-unit-tests.patch
                0001-add-new-directory-layout.patch

I've only look at the last posted patchset (2749_proper.tar.gz) but I've found 
the following few problems:
* The disk space free space checks doesn't take the new layout into account.  
In other words, it always checks the keyspace directory for free space. But in 
case we use CF directories, the keyspace directory could likely be on a disk 
with almost no freespace (and in any case, this won't always be the right disk 
to check).
* The check that a snapshot is existing and the clearing of snapshot is broken 
for the CF directories layout (in Table.java)
* The leveled manifest is not put into it's CF directory (it follows that its 
snapshot is not in the right place either).

I'm also not a fan of having the directory argument given to the Descriptor 
constructors not be the actual directory of the sstable it describes. I think 
it's error prone, making Descriptor harder to use (the fact that 
Descriptor.getDirectory() doesn't necessarily returns the directory you've fed 
to the constructor is confusing).

More generally, the current code dealing with the paths to the data directories 
is currently very badly encapsulated (not this patch fault at all). Which means 
that adding a new "parallel" data directory layout leaks everywhere and make 
the code hard to read/maintain. I strongly believe we should use the 
opportunity to improve that code encapsulation.

So attaching a patch that refactor a bit more the code to improve 
encapsulation. It essentially consists in adding a new Directories class that 
abstracts the actual layout of the data directory (including the fact that we 
can use multiple locations, which imho greatly clean the code). Aside from 
that, the patch reuse the sstablemover of the preceding patch (with an added 
verbose option). The patch also includes fairly extensive unit tests for the 
new class.

Now there is one thing that really bother me with this patch (the preceding 
patches have the same problem). It's the fact that the Descriptor.fromFilename 
method needs to query the global useSeparateCFDirectories option to work 
correctly. This has a number of practical consequences:
# it is the reason for a number of super annoying gymnastic in the unit tests, 
like the need to duplicate the sstables used by LegacySSTableTest for instance. 
I've put all of this gymnastic in a separate patch (0002-fix-unit-tests.patch).
# this means we cannot stream between two nodes, one using separate cf 
directory, one that don't (that limitation could be solve independently for 
streaming only, but it's worth considering a more general fix imo, in 
particular because a specific fix would be a tad ugly).
# tools like the sstableloader will require that you put the sstable to load 
into a ksname/cfname/ directory when you use the separate directory option.  
It's not a big deal, but it feels arbitrary and annoying.

At first, I had hoped that we could make Descriptor.fromFilename 'detect' 
whether or not we were using the separate directory layout by using the fact 
that the file name contains the column family name already, but that doesn't 
work when the ksname == cfname (which is allowed afaik).

One option would be to stop relying on the directory structure the sstable is 
in to detect the keyspace name. We could do that if we included the keyspace 
name in the filename for instance. The pros would be to fix all the problem 
listed above, and in particular we would be able to allow stuffs like 
{{sstableloader <filename>}}, instead of requiring that the files are put 
inside a specifically name directory, which would be nice. I don't see much 
cons except making the file names longer. Any opinion on that idea?

                
> fine-grained control over data directories
> ------------------------------------------
>
>                 Key: CASSANDRA-2749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 
> 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
> 0001-add-new-directory-layout.patch, 
> 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
> 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
> 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
> 2749_backwards_compatible_v4.patch, 
> 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
> 2749_proper.tar.gz
>
>
> Currently Cassandra supports multiple data directories but no way to control 
> what sstables are placed where. Particularly for systems with mixed SSDs and 
> rotational disks, it would be nice to pin frequently accessed columnfamilies 
> to the SSDs.
> Postgresql does this with tablespaces 
> (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
> should probably avoid using that name because of confusing similarity to 
> "keyspaces."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to