[ https://issues.apache.org/jira/browse/CASSANDRA-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Carey updated CASSANDRA-16772:
------------------------------------
    Description:
User-defined nodetool cleanup uses a HashMap instead of a Multimap to group the user-provided SSTables by table. Because HashMap.put replaces any value already stored for a key, only one file per source table is kept. It also means the unit test for this component is not sufficient.

As part of https://issues.apache.org/jira/browse/CASSANDRA-16767 I introduced a helper method on Descriptor:
{code:java}
public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames)
{code}
That should be used instead of the custom logic in CompactionManager.forceUserDefinedCleanup.

Broken existing code:
{code:java}
HashMap<ColumnFamilyStore, Descriptor> descriptors = Maps.newHashMap();
for (String filename : filenames)
{
    // extract keyspace and columnfamily name from filename
    Descriptor desc = Descriptor.fromFilename(filename.trim());
    if (Schema.instance.getCFMetaData(desc) == null)
    {
        logger.warn("Schema does not exist for file {}. Skipping.", filename);
        continue;
    }
    // group by keyspace/columnfamily
    ColumnFamilyStore cfs = Keyspace.open(desc.ksname).getColumnFamilyStore(desc.cfname);
    desc = cfs.getDirectories().find(new File(filename.trim()).getName());
    if (desc != null)
        descriptors.put(cfs, desc); // BUG: replaces any Descriptor already stored for this cfs
}
{code}

Contents of helper method introduced in other ticket:
{code:java}
public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames)
{
    Multimap<ColumnFamilyStore, Descriptor> descriptors = ArrayListMultimap.create();
    for (String filename : filenames)
    {
        // extract keyspace and columnfamily name from filename
        Descriptor desc = Descriptor.fromFilename(filename.trim());
        if (Schema.instance.getCFMetaData(desc) == null)
        {
            logger.warn("Schema does not exist for file {}. Skipping.", filename);
            continue;
        }
        // group by keyspace/columnfamily
        ColumnFamilyStore cfs = Keyspace.open(desc.ksname).getColumnFamilyStore(desc.cfname);
        desc = cfs.getDirectories().find(new File(filename.trim()).getName());
        if (desc != null)
            descriptors.put(cfs, desc); // Multimap.put adds to the values for cfs instead of replacing
    }
    return descriptors;
}
{code}

  was:
User defined nodetool cleanup uses a HashMap instead of a MultiMap to group the user provided SSTables by table. This means it only keeps one file per source table. It also means the unit test for this component is not sufficient.

As part of https://issues.apache.org/jira/browse/CASSANDRA-16767 I introduced a helper method on Descriptor:
{code:java}
public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames)
{code}
That should be used instead of the custom logic in CompactionManager.forceUserDefinedCleanup

> User Defined nodetool cleanup only processes one SSTable per table
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-16772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16772
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Normal
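To make the failure mode concrete, here is a minimal standalone sketch of the two grouping strategies described in this issue. It is not Cassandra code: the class GroupingDemo, its method names, and the use of plain strings in place of ColumnFamilyStore and Descriptor are all assumptions for illustration, and a JDK Map of Lists stands in for the Guava Multimap so the example needs no dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; strings stand in for ColumnFamilyStore and Descriptor.
public class GroupingDemo
{
    // Mirrors the broken grouping: one value per key, so put() replaces earlier files.
    static Map<String, String> groupWithHashMap(List<String[]> tableAndFile)
    {
        Map<String, String> grouped = new LinkedHashMap<>();
        for (String[] pair : tableAndFile)
            grouped.put(pair[0], pair[1]);
        return grouped;
    }

    // Mirrors multimap-style grouping: every file is appended to its table's list.
    static Map<String, List<String>> groupIntoLists(List<String[]> tableAndFile)
    {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] pair : tableAndFile)
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        return grouped;
    }

    public static void main(String[] args)
    {
        // Each entry is { table, sstable file }; the names are made up.
        List<String[]> input = Arrays.asList(
            new String[]{ "ks.t1", "nb-1-big-Data.db" },
            new String[]{ "ks.t1", "nb-2-big-Data.db" },
            new String[]{ "ks.t1", "nb-3-big-Data.db" },
            new String[]{ "ks.t2", "nb-1-big-Data.db" });

        System.out.println(groupWithHashMap(input)); // one file per table survives
        System.out.println(groupIntoLists(input));   // all four files are kept
    }
}
```

A test along these lines, passing more than one file for the same table and asserting that all of them come back, is the kind of check that would have caught the bug.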