Re: Cassandra project biweekly status update 2022-06-14

2022-06-29 Thread Josh McKenzie
> I don’t think it has to be all that complicated?
Definitely not. We've just never documented it afaict.

On Tue, Jun 28, 2022, at 2:58 PM, Benedict wrote:
> 
> I don’t think it has to be all that complicated?
> 
> If it’s a part of our UX it’s probably something we should maintain backwards 
> compatibility for.
> 
> If it’s part of our internal codebase, probably not. The only two “public” 
> APIs we have inside the codebase that I’m aware of are triggers and secondary 
> indexes, and these are provided with limited warranty and an expectation of 
> technical sophistication for their users. I think there has always been an 
> expectation that users of these features will bear the cost of migration to 
> any new API versions we might introduce between majors.
> 
> 
>> On 28 Jun 2022, at 19:39, Josh McKenzie  wrote:
>> 
>>> I think it is good to document further things and keep on doing it in time 
>>> when discussions happen. I can see this being a benefit both for users and 
>>> Cassandra developers.
>> Strong +1 from me here. Having guidance for people working on the codebase 
>> to understand what is and isn't considered a public API will help inform how 
>> we shape these things and keep things stable for our userbase.
>> 
>> On Sun, Jun 26, 2022, at 12:58 PM, Ekaterina Dimitrova wrote:
>>> “+1 to always, by default, maintaining compatibility.”
>>>  +1
>>> 
>>> “We also have the discussion wrt what are breaking changes: are we 
>>> intending to evolve what interfaces and behaviour we provide, and to what 
>>> degree of compatibility, via these discussions/votes?”
>>> 
>>> While I do agree we cannot really have a fully exhaustive list, I think it 
>>> is good to document further things and keep on doing it in time when 
>>> discussions happen. I can see this being a benefit both for users and 
>>> Cassandra developers.
>>> 
>>> 
>>> On Thu, 16 Jun 2022 at 18:25, Mick Semb Wever  wrote:
>>>> 
>>>>> We've agreed in the past that we want to maintain compatibility and that
>>>>> all changes will be done with compatibility in mind (see
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle),
>>>>> but we haven't clarified how we make the call on when to bump to the next
>>>>> major.
>>>> 
>>>> +1 to always, by default, maintaining compatibility.
>>>> 
>>>> Note, a major bump isn't only about compatibility breakages per se, but a)
>>>> time to clean up deprecated code, and b) delineating upgrade paths.
>>>> 
>>>>> The Release Lifecycle states "Introducing a backward incompatibility
>>>>> change needs dev community approval via voting [voting open for 48
>>>>> hours]." But this is under a section called "Outstanding questions beyond
>>>>> the scope of this document"; maybe it's about time we finalize this and
>>>>> update the document?
>>>> 
>>>> IIRC, though I can easily be wrong, this was meant for breaking changes
>>>> within a major, e.g. after a beta release. Not that the same formality
>>>> cannot also be applied to trunk dev, as it ensures a desired visibility,
>>>> though I wonder if we will solve it in practice most of the time with the
>>>> preceding [DISCUSS] thread.
>>>> 
>>>> We also have the discussion wrt what are breaking changes: are we
>>>> intending to evolve what interfaces and behaviour we provide, and to what
>>>> degree of compatibility, via these discussions/votes?


Re: help for a side project

2022-06-29 Thread Patrick McFadin
Hi Norman,

In short, any time you add a node to a cluster there will be a
redistribution of data and it will be proportional to the total number of
nodes you have in the cluster. VNodes just create smaller chunks and
distribute them around the cluster more. If you have a 3-node cluster with
an RF=1 (for simplicity's sake) and add 1 node, every existing node has to
reduce its responsibility from 1/3 of the cluster data to 1/4. The new node
will need to accept 1/4 of the total cluster data as a part of joining.
That's the basics but you can extrapolate from there.
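
As a rough illustration of that arithmetic, here's a tiny Python sketch of a
vnode token ring. The hashing scheme, node names, and vnode count are
illustrative assumptions, not Cassandra's actual Murmur3 partitioner or token
allocation algorithm:

```python
import hashlib

RING_SIZE = 2**32  # illustrative ring size, not Cassandra's actual token space

def tokens_for(node, num_vnodes):
    """Derive pseudo-random tokens for a node (hypothetical scheme)."""
    return [int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16) % RING_SIZE
            for i in range(num_vnodes)]

def ownership(nodes, num_vnodes):
    """Fraction of the ring each node owns with RF=1, using the simplified
    convention that a token owns the range from itself up to the next token."""
    ring = sorted((t, n) for n in nodes for t in tokens_for(n, num_vnodes))
    owned = dict.fromkeys(nodes, 0)
    wrap = (ring[0][0] + RING_SIZE, None)  # close the circle back to the first token
    for (t, n), (t_next, _) in zip(ring, ring[1:] + [wrap]):
        owned[n] += t_next - t
    return {n: owned[n] / RING_SIZE for n in nodes}

before = ownership(["n1", "n2", "n3"], 256)        # each node owns ~1/3
after = ownership(["n1", "n2", "n3", "n4"], 256)   # each node owns ~1/4

# Existing nodes' shares only shrink: n4's tokens carve slices out of
# existing ranges, while none of the old tokens move.
print({n: round(f, 3) for n, f in before.items()})
print({n: round(f, 3) for n, f in after.items()})
```

Running it shows each of the 3 nodes owning roughly a third, then roughly a
quarter after the fourth node joins, with the new node's ~1/4 share coming in
many small slices spread around the ring rather than one contiguous chunk.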

I would be happy to get on zoom and talk it over. Here's my scheduling
link: https://calendly.com/patrick-mcfadin/30min_zoom

Patrick

On Wed, Jun 29, 2022 at 5:13 AM Norman Menfel  wrote:

> Hi all,
>
> apologies for writing to this mailing list but I tried the user mailing
> list, 2 slack channels, reddit and 3 discord channels and got horrible and
> confused answers.
>
> I'm working on a school project trying to reproduce the token
> distribution algorithm described in the Dynamo paper. All I want to
> build is a cluster where nodes can join/leave, managing vnode distribution
> just like in Cassandra (I don't care about reads/writes or replication).
>
> I believe I understand how everything works without vnodes. But everything
> stops making sense when introducing vnodes. For example, when a new node
> joins a cluster, new vnodes need to be created. Why does adding vnodes not
> create a massive redistribution of data in the cluster? After all, adding
> vnodes means that every vnode in the cluster has to "give up" some data to
> other vnodes in order to keep a balanced load across the cluster.
>
> From the documentation it seems like only the portion of the ring
> associated with the node should suffer this redistribution, but why does a
> node have a portion of the partition ring associated with it when the
> vnodes stored on the node may be from any portion of the ring?
>
> As you can see, I'm quite confused! I understand that to give me a full
> answer may take you too much time but if you could just point me in the
> right direction, tell me where I should look in the source code, or share
> some links (I've already read everything on the Apache website and DataStax
> docs; I've even read the Riak documentation trying to find clues),
> that would be amazing!
>
> Thanks a lot for your time and keep up the great work, I love Cassandra!
> Norman
>


help for a side project

2022-06-29 Thread Norman Menfel
Hi all,

apologies for writing to this mailing list but I tried the user mailing
list, 2 slack channels, reddit and 3 discord channels and got horrible and
confused answers.

I'm working on a school project trying to reproduce the token distribution
algorithm described in the Dynamo paper. All I want to build is a
cluster where nodes can join/leave, managing vnode distribution just like
in Cassandra (I don't care about reads/writes or replication).

I believe I understand how everything works without vnodes. But everything
stops making sense when introducing vnodes. For example, when a new node
joins a cluster, new vnodes need to be created. Why does adding vnodes not
create a massive redistribution of data in the cluster? After all, adding
vnodes means that every vnode in the cluster has to "give up" some data to
other vnodes in order to keep a balanced load across the cluster.

From the documentation it seems like only the portion of the ring
associated with the node should suffer this redistribution, but why does a
node have a portion of the partition ring associated with it when the
vnodes stored on the node may be from any portion of the ring?

As you can see, I'm quite confused! I understand that to give me a full
answer may take you too much time but if you could just point me in the
right direction, tell me where I should look in the source code, or share
some links (I've already read everything on the Apache website and DataStax
docs; I've even read the Riak documentation trying to find clues),
that would be amazing!

Thanks a lot for your time and keep up the great work, I love Cassandra!
Norman