[jira] [Comment Edited] (CASSANDRA-16772) User Defined nodetool cleanup only processes one SSTable per table

Scott Carey (Jira) Tue, 29 Jun 2021 11:37:04 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371574#comment-17371574 ]


Scott Carey edited comment on CASSANDRA-16772 at 6/29/21, 6:36 PM:
-------------------------------------------------------------------

Looking at the code more closely -- 'forceUserDefinedCleanup' is not used 
outside of tests, nor wired up to nodetool.

So perhaps this is dead code.  Does it make sense to have user defined cleanup? 
 Would an operator know which subset of files needs to be cleaned?  I suppose 
one might purposely do one data directory at a time, or something like that.

Either way, the existing code can be fixed to use the right tool to group files 
by table.


> User Defined nodetool cleanup only processes one SSTable per table
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-16772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16772
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Normal
>             Fix For: 3.11.x
>
>
> User defined nodetool cleanup uses a HashMap instead of a Multimap to group 
> the user-provided SSTables by table.  Each put for a table replaces the 
> previous entry for that table, so only one file per source table is kept.
> It also means the unit test for this component is not sufficient, since it 
> does not catch this.
> As part of https://issues.apache.org/jira/browse/CASSANDRA-16767  I 
> introduced a helper method on Descriptor:
> {code:java}
> public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames)
> {code}
> That should be used instead of the custom logic in 
> CompactionManager.forceUserDefinedCleanup.
>  
> Broken existing code:
> {code:java}
>         HashMap<ColumnFamilyStore, Descriptor> descriptors = Maps.newHashMap();
>         for (String filename : filenames)
>         {
>             // extract keyspace and columnfamily name from filename
>             Descriptor desc = Descriptor.fromFilename(filename.trim());
>             if (Schema.instance.getCFMetaData(desc) == null)
>             {
>                 logger.warn("Schema does not exist for file {}. Skipping.", filename);
>                 continue;
>             }
>             // group by keyspace/columnfamily
>             ColumnFamilyStore cfs = Keyspace.open(desc.ksname).getColumnFamilyStore(desc.cfname);
>             desc = cfs.getDirectories().find(new File(filename.trim()).getName());
>             if (desc != null)
>                 descriptors.put(cfs, desc);
>         }
> {code}
>  
> Contents of helper method introduced in other ticket:
> {code:java}
>     public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames) {
>         Multimap<ColumnFamilyStore, Descriptor> descriptors = ArrayListMultimap.create();
>         for (String filename : filenames)
>         {
>             // extract keyspace and columnfamily name from filename
>             Descriptor desc = Descriptor.fromFilename(filename.trim());
>             if (Schema.instance.getCFMetaData(desc) == null)
>             {
>                 logger.warn("Schema does not exist for file {}. Skipping.", filename);
>                 continue;
>             }
>             // group by keyspace/columnfamily
>             ColumnFamilyStore cfs = Keyspace.open(desc.ksname).getColumnFamilyStore(desc.cfname);
>             desc = cfs.getDirectories().find(new File(filename.trim()).getName());
>             if (desc != null)
>                 descriptors.put(cfs, desc);
>         }
>         return descriptors;
>     }
> {code}
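
For illustration only, here is a minimal, self-contained sketch of why grouping with a plain HashMap drops files while a multimap-style grouping keeps them. It is not the Cassandra code: the class GroupingSketch, the helper tableOf, and the simplified "keyspace-table-generation-Data.db" file names are all hypothetical, and a Map of Lists from java.util stands in for Guava's Multimap so the example runs without extra dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupingSketch {
    // Hypothetical helper: derive a "keyspace.table" key from a simplified
    // file name such as "ks1-t1-1-Data.db". Real SSTable names differ.
    static String tableOf(String filename) {
        String[] parts = filename.trim().split("-");
        return parts[0] + "." + parts[1];
    }

    // Mirrors the broken pattern: one value per key, so each put()
    // for the same table replaces the file stored before it.
    static Map<String, String> groupWithHashMap(List<String> filenames) {
        Map<String, String> byTable = new HashMap<>();
        for (String filename : filenames)
            byTable.put(tableOf(filename), filename.trim());
        return byTable;
    }

    // Multimap-style grouping (stand-in for Guava's ArrayListMultimap):
    // every file is appended to the list for its table.
    static Map<String, List<String>> groupWithLists(List<String> filenames) {
        Map<String, List<String>> byTable = new HashMap<>();
        for (String filename : filenames)
            byTable.computeIfAbsent(tableOf(filename), k -> new ArrayList<>())
                   .add(filename.trim());
        return byTable;
    }

    // Total number of files retained across all tables.
    static int totalFiles(Map<String, List<String>> grouped) {
        int total = 0;
        for (List<String> files : grouped.values())
            total += files.size();
        return total;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "ks1-t1-1-Data.db", "ks1-t1-2-Data.db", "ks1-t2-1-Data.db");

        System.out.println("HashMap keeps " + groupWithHashMap(files).size()
                           + " of " + files.size() + " files");
        System.out.println("Grouped lists keep " + totalFiles(groupWithLists(files))
                           + " of " + files.size() + " files");
    }
}
```

A unit test that passes only one file per table cannot tell these two behaviors apart; passing at least two files for the same table, as in main above, exposes the difference.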



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

