I think that's right, using a closed range makes sense to consume the data provided by "sstablemetadata", which also provides closed ranges. Especially because with half-open ranges we couldn't compact a sstable with a single big partition, of which we might only know the token but no the partition key.
So probably we should just add documentation about both -st and -et being inclusive, and live with a different meaning of -st in repair and compact. Also, the reason why this is so confusing in the test that started the discussion is that those closed token ranges are internally represented as "Range<Token>" objects, which are half-open by definition. So we should document those methods, and maybe do some minor changes to avoid the use of "Range<Token>" to silently represent closed token ranges. On Tue, 26 Jul 2022 at 16:27, Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote: > Reading the responses here and taking a step back, I think the current > behavior of nodetool compact is probably the correct behavior. The main > use case I can see for using nodetool compact is someone wants to take some > sstable and compact it with all the overlapping sstables. So you run > “sstablemetadata” on the sstable and get the min and max tokens, and then > you pass those in to nodetool compact. In that case you do want the closed > range. > > This is different from running repair where you get the tokens from the > nodes/nodetool ring and node those level token ranges ownership is half > open when going from “token owned by node a to token owned by node b”. > > So my initial thought/gut reaction that it should work like repair is > misleading, because you don’t get the tokens from the same place you get > them when running repair. > > Making the command line options more explicit and documented does seem > like it could be useful. > > -Jeremiah Jordan > > On Jul 26, 2022, at 9:16 AM, Derek Chen-Becker <de...@chen-becker.org> > wrote: > > +1 to new flags. A released, albeit undocumented, behavior is still a > contract with the end user. Flags (and documentation) seem like the right > path to address the situation. > > Cheers, > > Derek > > On Tue, Jul 26, 2022 at 7:28 AM Benedict Elliott Smith < > bened...@apache.org> wrote: > >> >> I think a change like this could be dangerous for a lot of existing >> automation built atop nodetool. >> >> I’m not sure this change is worthwhile. I think it would be better to >> introduce e.g. -ste and -ete for “start token exclusive” and “end token >> exclusive” so that users can opt-in to whichever scheme they prefer for >> their tooling, without breaking existing users. >> >> > On 26 Jul 2022, at 14:22, Brandon Williams <dri...@gmail.com> wrote: >> > >> > +1, I think that makes the most sense. >> > >> > Kind Regards, >> > Brandon >> > >> > On Tue, Jul 26, 2022 at 8:19 AM J. D. Jordan <jeremiah.jor...@gmail.com> >> wrote: >> >> >> >> I like the third option, especially if it makes it consistent with >> repair, which has supported ranges longer and I would guess most people >> would think the compact ranges work the same as the repair ranges. >> >> >> >> -Jeremiah Jordan >> >> >> >>> On Jul 26, 2022, at 6:49 AM, Andrés de la Peña <adelap...@apache.org> >> wrote: >> >>> >> >>> >> >>> Hi all, >> >>> >> >>> CASSANDRA-17575 has detected that token ranges in nodetool compact >> are interpreted as closed on both sides. For example, the command "nodetool >> compact -st 10 -et 50" will compact the tokens in [10, 50]. This way of >> interpreting token ranges is unusual since token ranges are usually >> half-open, and I think that in the previous example one would expect that >> the compacted tokens would be in (10, 50]. That's for example the way >> nodetool repair works, and indeed the class org.apache.cassandra.dht.Range >> is always half-open. >> >>> >> >>> It's worth mentioning that, differently from nodetool repair, the >> help and doc for nodetool compact doesn't specify whether the supplied >> start/end tokens are inclusive or exclusive. >> >>> >> >>> I think that ideally nodetool compact should interpret the provided >> token ranges as half-open, to be consistent with how token ranges are >> usually interpreted. However, this would change the way the tool has worked >> until now. This change might be problematic for existing users relying on >> the old behaviour. That would be especially severe for the case where the >> begin and end token are the same, because interpreting [x, x] we would >> compact a single token, whereas I think that interpreting (x, x] would >> compact all the tokens. As for compacting ranges including multiple tokens, >> I think the change wouldn't be so bad, since probably the supplied token >> ranges come from tools that are already presenting the ranges as half-open. >> Also, if we are splitting the full ring into smaller ranges, half-open >> intervals would still work and would save us some repetitions. >> >>> >> >>> So my question is: Should we change the behaviour of nodetool compact >> to interpret the token ranges as half-opened, aligning it with the usual >> interpretation of ranges? Or should we just document the current odd >> behaviour to prevent compatibility issues? >> >>> >> >>> A third option would be changing to half-opened ranges and also >> forbidding ranges where the begin and end token are the same, to prevent >> the accidental compaction of the entire ring. Note that nodetool repair >> also forbids this type of token ranges. >> >>> >> >>> What do you think? >> >> >> > > -- > +---------------------------------------------------------------+ > | Derek Chen-Becker | > | GPG Key available at https://keybase.io/dchenbecker and | > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org | > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC | > +---------------------------------------------------------------+ > > >