Re: Cassandra project biweekly status update 2022-06-14
> I don’t think it has to be all that complicated?

Definitely not. We've just never documented it afaict.

On Tue, Jun 28, 2022, at 2:58 PM, Benedict wrote:
> I don’t think it has to be all that complicated?
>
> If it’s a part of our UX it’s probably something we should maintain backwards
> compatibility for.
>
> If it’s part of our internal codebase, probably not. The only two “public”
> APIs we have inside the codebase that I’m aware of are triggers and secondary
> indexes, and these are provided with limited warranty and an expectation of
> technical sophistication from their users. I think there has always been an
> expectation that users of these features will bear the cost of migration to
> any new API versions we might introduce between majors.
>
>> On 28 Jun 2022, at 19:39, Josh McKenzie wrote:
>>
>>> I think it is good to document further things and to keep doing so over
>>> time as discussions happen. I can see this being a benefit both for users
>>> and Cassandra developers.
>>
>> Strong +1 from me here. Having guidance for people working on the codebase
>> to understand what is and isn't considered a public API will help inform how
>> we shape these things and keep things stable for our userbase.
>>
>> On Sun, Jun 26, 2022, at 12:58 PM, Ekaterina Dimitrova wrote:
>>> “+1 to always, by default, maintaining compatibility.”
>>> +1
>>>
>>> “We also have the discussion wrt what are breaking changes. Are we
>>> intending to evolve what interfaces and behaviour we provide, and to what
>>> degree compatibility applies, via these discussions/votes?”
>>>
>>> While I do agree we cannot really have a fully exhaustive list, I think it
>>> is good to document further things and to keep doing so over time as
>>> discussions happen. I can see this being a benefit both for users and
>>> Cassandra developers.
>>> On Thu, 16 Jun 2022 at 18:25, Mick Semb Wever wrote:

> We've agreed in the past that we want to maintain compatibility and that
> all changes will be done with compatibility in mind (see
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle),
> but we haven't clarified how we make the call on when to bump to the next
> major.

+1 to always, by default, maintaining compatibility.

Note, a major bump isn't only about compatibility breakages per se, but also
a) time to clean up deprecated code, and b) delineating upgrade paths.

> The Release Lifecycle states "Introducing a backward incompatibility
> change needs dev community approval via voting [voting open for 48
> hours]." But this is under a section called "Outstanding questions beyond
> the scope of this document", maybe it's about time we finalize this and
> update this document?

IIRC, though I can easily be wrong, this was meant for breaking changes
within a major, e.g. after a beta release. Not that the same formality cannot
also be applied to trunk dev, as it ensures a desired visibility, though I
wonder whether in practice we will solve it most of the time with the
preceding [DISCUSS] thread.

We also have the discussion wrt what are breaking changes. Are we intending
to evolve what interfaces and behaviour we provide, and to what degree
compatibility applies, via these discussions/votes?
Re: help for a side project
Hi Norman,

In short, any time you add a node to a cluster there will be a redistribution
of data, and it will be proportional to the total number of nodes you have in
the cluster. Vnodes just create smaller chunks and distribute them around the
cluster more.

If you have a 3-node cluster with RF=1 (for simplicity's sake) and add 1
node, every existing node has to reduce its responsibility from 1/3 of the
cluster data to 1/4. The new node will need to accept 1/4 of the total
cluster data as part of joining. That's the basics, but you can extrapolate
from there.

I would be happy to get on Zoom and talk it over. Here's my scheduling link:
https://calendly.com/patrick-mcfadin/30min_zoom

Patrick

On Wed, Jun 29, 2022 at 5:13 AM Norman Menfel wrote:
> Hi all,
>
> apologies for writing to this mailing list, but I tried the user mailing
> list, 2 Slack channels, Reddit, and 3 Discord channels and got horrible
> and confusing answers.
>
> I'm working on a school project trying to reproduce the token distribution
> algorithm described in the Dynamo paper. All I want to build is a cluster
> where nodes can join/leave, managing vnode distribution just like in
> Cassandra (I don't care about r/w or replication).
>
> I believe I understand how everything works without vnodes. But everything
> stops making sense when introducing vnodes. For example, when a new node
> joins a cluster, new vnodes need to be created. Why does adding vnodes not
> create a massive redistribution of data in the cluster? After all, adding
> vnodes means that every vnode in the cluster has to "give up" some data to
> other vnodes in order to keep a balanced load across the cluster.
>
> From the documentation it seems like only the portion of the ring
> associated with the node should suffer this redistribution, but why does a
> node have a portion of the partition ring associated with it when the
> vnodes stored on the node may be from any portion of the ring?
>
> As you can see, I'm quite confused!
> I understand that to give me a full answer may take you too much time, but
> if you could just point me in the right direction, tell me where I should
> look in the source code, or share some links (I've already read everything
> on the Apache website/DataStax; I've even read the Riak documentation
> trying to find clues), that would be amazing!
>
> Thanks a lot for your time, and keep up the great work. I love Cassandra!
> Norman
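Patrick's arithmetic, and the vnode behaviour Norman asks about, can be
sketched in a few lines of Python. This is an illustrative toy model, not
Cassandra code: it assumes RF=1, a normalized [0, 1) ring, and purely random
token placement (real Cassandra can also allocate tokens non-randomly):

```python
# Toy model of token-ring ownership (not Cassandra code).
from fractions import Fraction
import random

def even_share(num_nodes):
    """With RF=1 and a perfectly balanced ring, each node owns 1/N of the data."""
    return Fraction(1, num_nodes)

before = even_share(3)   # each of 3 nodes owns 1/3
after = even_share(4)    # after a 4th node joins, each owns 1/4
moved = before - after   # each existing node streams away 1/3 - 1/4 = 1/12

def vnode_ownership(num_nodes, vnodes_per_node, seed=1):
    """Place random vnode tokens on a [0, 1) ring and sum each node's arcs.

    Simplified convention: each token owns the arc up to the next token.
    """
    rng = random.Random(seed)
    tokens = sorted((rng.random(), node)
                    for node in range(num_nodes)
                    for _ in range(vnodes_per_node))
    owned = [0.0] * num_nodes
    for i, (tok, node) in enumerate(tokens):
        next_tok = tokens[(i + 1) % len(tokens)][0]
        owned[node] += (next_tok - tok) % 1.0  # arc length, wrapping past 1.0
    return owned

# With many vnodes per node, ownership stays close to even, and a joining
# node's new tokens each split some existing arc somewhere on the ring --
# so the ~1/4 of the data it receives arrives as many small slices taken
# from every node, rather than one contiguous range from a single neighbour.
ownership = vnode_ownership(num_nodes=4, vnodes_per_node=256)
```

This is also why existing vnodes don't all "give up" data when a node joins:
only the arcs that the new node's tokens actually split change owners; every
other token-to-token range is untouched.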
help for a side project
Hi all,

apologies for writing to this mailing list, but I tried the user mailing
list, 2 Slack channels, Reddit, and 3 Discord channels and got horrible and
confusing answers.

I'm working on a school project trying to reproduce the token distribution
algorithm described in the Dynamo paper. All I want to build is a cluster
where nodes can join/leave, managing vnode distribution just like in
Cassandra (I don't care about r/w or replication).

I believe I understand how everything works without vnodes. But everything
stops making sense when introducing vnodes. For example, when a new node
joins a cluster, new vnodes need to be created. Why does adding vnodes not
create a massive redistribution of data in the cluster? After all, adding
vnodes means that every vnode in the cluster has to "give up" some data to
other vnodes in order to keep a balanced load across the cluster.

From the documentation it seems like only the portion of the ring associated
with the node should suffer this redistribution, but why does a node have a
portion of the partition ring associated with it when the vnodes stored on
the node may be from any portion of the ring?

As you can see, I'm quite confused! I understand that to give me a full
answer may take you too much time, but if you could just point me in the
right direction, tell me where I should look in the source code, or share
some links (I've already read everything on the Apache website/DataStax; I've
even read the Riak documentation trying to find clues), that would be
amazing!

Thanks a lot for your time, and keep up the great work. I love Cassandra!

Norman