Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

David Capwell Wed, 24 May 2023 10:15:42 -0700

> The time spent on getting that running has been a fair few hours, where we 
> could have cut many manual module releases in that time.


We spent a few hours getting submodules working, and we no longer need to 
release for every single commit…

$ git log b9025e59395f47535e4ed1fec20b1186cdb07db8..HEAD | grep 'commit ' | wc -l
      12

So looking at accord trunk, we would have needed 12 release votes, and each 
vote is a minimum of 3 days, so 36 days of overhead vs 5 hours of work?
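
To make that arithmetic explicit, here is a rough sketch. It assumes one 
release vote per commit and the 3-day minimum vote window mentioned above. 
The helper names `vote_overhead_days` and `commits_since` are hypothetical, 
and `git rev-list --count` is used as a shorter way to count commits than the 
`git log | grep | wc -l` pipeline.

```shell
#!/bin/sh
# Hypothetical helper: days of vote overhead if every commit needs its own
# release vote. $1 = number of commits, $2 = minimum days per vote.
vote_overhead_days() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: count commits between a base revision and HEAD.
# Only defined here, not called; it must be run inside a git checkout.
commits_since() {
    git rev-list --count "$1..HEAD"
}

# 12 commits, 3 days per vote (the numbers from the message above)
vote_overhead_days 12 3   # prints 36
```

Inside an accord checkout, `vote_overhead_days "$(commits_since <base-sha>)" 3` 
would give the same figure for whatever base revision is chosen.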

There are some hiccups, but these are mostly in the “never did this before, 
how do I set it up” case, so something that can probably be improved… once you 
do your first patch the issues mostly go away.

One thing that can be annoying is for people who don’t use worktrees and 
switch between trunk and cassandra-4.x in the same directory… I am not sure if 
the issues here are my scripts or git getting confused… If you use worktrees 
(which I strongly recommend, with or without submodules) you don’t have these 
issues (my disk layout is below [1]).


> I'd like to discuss bringing cassandra-harry in-tree as a submodule

For accord, the main reason to keep it out of tree was to allow other projects 
to use the library (similar to the RAFT libraries that exist for projects to 
use), but my mental model for Harry is that most of the code is Cassandra 
specific (models, converting timestamps to Cassandra data, etc.), so I am 
wondering whether it makes sense in its own repo rather than directly in 
trunk. Submodules do have their own overhead and edge cases, so I am mostly in 
favor of using them only where the code must live out of tree (such as 
jvm-dtest, which lives out of tree because all branches need the same 
interfaces).



[1] I have a single git repo and use git worktrees to keep each branch in an 
isolated directory (this avoids the .git overhead in every directory)… my 
layout is

$ ls
3.0  3.11  4.0  4.1  cep-15-accord  prs  trunk
$ ls prs
prs:
4.1             cep-15-accord   trunk
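
A layout like this can be sketched with `git worktree add`. The snippet below 
only prints the commands rather than running them. It assumes it is run from 
inside an existing clone, that release branches are named `cassandra-X.Y`, 
and that `trunk` and `cep-15-accord` are branch names as-is; `worktree_cmd` 
is a hypothetical helper, and the `../` target path is just one possible 
placement.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the command that would create a
# worktree directory for one branch.
# Assumption: release branches are named cassandra-X.Y; trunk and
# cep-15-accord are used unchanged.
worktree_cmd() {
    dir=$1
    case "$dir" in
        trunk|cep-15-accord) branch=$dir ;;
        *)                   branch="cassandra-$dir" ;;
    esac
    echo "git worktree add ../$dir $branch"
}

# Directory names taken from the listing above
for d in 3.0 3.11 4.0 4.1 cep-15-accord trunk; do
    worktree_cmd "$d"
done
```

Each worktree shares the one underlying repository, so switching branches 
becomes a `cd` instead of a checkout in the same directory.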


> On May 24, 2023, at 7:53 AM, Josh McKenzie <jmcken...@apache.org> wrote:
> 
>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry workshop,
> I'm about to need to harry test for the paging across tombstone work for 
> https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my own 
> overlapping fuzzing came in). In the process, I'll see if I can't distill 
> something really simple along the lines of how React approaches it 
> (https://react.dev/learn).
> 
> Ideally we'd be able to get something together that's a high level "In the 
> next 15 minutes, you will know and understand A-G and have access to N% of 
> the power of harry" kind of offer.
> 
> Honestly, there's a lot in our ecosystem where we could benefit from taking a 
> page from their book in terms of onboarding and getting started IMO.
> 
> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>> > I wonder if a mini-onboarding session would be good as a community session 
>> > - go over Harry, how to run it, how to add a test?  Would that be the 
>> > right venue?  I just would like to see how we can not only plug it in to 
>> > regular CI but get everyone that wants to add a test be able to know how 
>> > to get started with it.
>> 
>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry workshop, 
>> but unfortunately it got declined. Goes without saying, we can still do it 
>> online, time and resources permitting. But again, I do not think it should 
>> be barring us from making Harry a part of the codebase, as it already is. In 
>> fact, we can be iterating on the development quicker having it in-tree. 
>> 
>> We could go over some interesting examples such as testing 2i (SAI), 
>> modelling Group By tests, or testing repair. If there is enough appetite and 
>> collaboration in the community, I will see if we can pull something like 
>> that together. Input on _what_ you would like to see / hear / tested is also 
>> appreciated. Harry was developed out of a strong need for large-scale 
>> testing, which also has informed many of its APIs, but we can make it easier 
>> to access for interactive testing / unit tests. We have been doing a lot of 
>> that with Transactional Metadata, too. 
>> 
>> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got any 
>> > thoughts here?
>> 
>> Yes, sorry for not responding on this thread earlier. I cannot overstate 
>> how excited I am about this, and how important I think this is. Time 
>> constraints are somehow hard to overcome, but I hope the results brought by 
>> TCM will make it all worth it.
>> 
>> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>>> I think pulling Harry into the tree will make adoption easier for the 
>>> folks. I have been a bit swamped with Transactional Metadata work, but I 
>>> wanted to make some of the things we were using for testing TCM available 
>>> outside of TCM branch. This includes a bunch of helper methods to perform 
>>> operations on the clusters, data generation, and more useful stuff. Of 
>>> course, the question always remains about how much time I want to spend 
>>> porting it all to Gossip, but I think we can find a reasonable compromise. 
>>> 
>>> I would not set this improvement as a prerequisite to pulling Harry into 
>>> the main branch, but rather interpret it as a commitment from myself to 
>>> take community input and make it more approachable by the day. 
>>> 
>>> On Wed, May 24, 2023, at 2:44 PM, Josh McKenzie wrote:
>>>>> importantly it’s a million times better than the dtest-api process - 
>>>>> which stymies development due to the friction.
>>>> This is my major concern.
>>>> 
>>>> What prompted this thread was harry being external to the core codebase 
>>>> and the lack of adoption and usage of it having led to atrophy of certain 
>>>> aspects of it, which then led to redundant implementation of some fuzz 
>>>> testing and lost time.
>>>> 
>>>> We'd all be better served to have this closer to the main codebase as a 
>>>> forcing function to smooth out the rough edges, integrate it, and make it 
>>>> a collective artifact and first class citizen IMO.
>>>> 
>>>> I have similar opinions about the dtest-api.
>>>> 
>>>> 
>>>> On Wed, May 24, 2023, at 4:05 AM, Benedict wrote:
>>>>> 
>>>>> It’s not without hiccups, and I’m sure we have more to learn. But it 
>>>>> mostly just works, and importantly it’s a million times better than the 
>>>>> dtest-api process - which stymies development due to the friction.
>>>>> 
>>>>>> On 24 May 2023, at 08:39, Mick Semb Wever <m...@apache.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> WRT git submodules and CASSANDRA-18204, are we happy with how it is 
>>>>>> working for accord ? 
>>>>>> 
>>>>>> The time spent on getting that running has been a fair few hours, where 
>>>>>> we could have cut many manual module releases in that time. 
>>>>>> 
>>>>>> David and folks working on accord ? 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, 23 May 2023 at 20:09, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>> 
>>>>>> I'll hold off on this until Alex Petrov chimes in. @Alex -> got any 
>>>>>> thoughts here?
>>>>>> 
>>>>>> On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
>>>>>>> I think it would be great to onboard Harry more officially into the 
>>>>>>> project.  However it would be nice to perhaps do some sanity checking 
>>>>>>> outside of Apple folks to see how approachable it is.  That is, can 
>>>>>>> someone take it and just run it with the current readme without any 
>>>>>>> additional context?
>>>>>>> 
>>>>>>> I wonder if a mini-onboarding session would be good as a community 
>>>>>>> session - go over Harry, how to run it, how to add a test?  Would that 
>>>>>>> be the right venue?  I just would like to see how we can not only plug 
>>>>>>> it in to regular CI but get everyone that wants to add a test be able 
>>>>>>> to know how to get started with it.
>>>>>>> 
>>>>>>> Jeremy
>>>>>>> 
>>>>>>>> On May 16, 2023, at 1:34 PM, Abe Ratnofsky <a...@aber.io> wrote:
>>>>>>>> 
>>>>>>>> Just to make sure I'm understanding the details, this would mean 
>>>>>>>> apache/cassandra-harry maintains its status as a separate repository, 
>>>>>>>> apache/cassandra references it as a submodule, and clones and builds 
>>>>>>>> Harry locally, rather than pulling a released JAR. We can then 
>>>>>>>> reference Harry as a library without maintaining public artifacts for 
>>>>>>>> it. Is that in line with what you're thinking?
>>>>>>>> 
>>>>>>>> > I'd also like to see us get a Harry run integrated as part of our 
>>>>>>>> > pre-commit CI
>>>>>>>> 
>>>>>>>> I'm a strong supporter of this, of course.
>>>>>>>> 
>>>>>>>>> On May 16, 2023, at 11:03 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> Similar to what we've done with accord in 
>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd like to 
>>>>>>>>> discuss bringing cassandra-harry in-tree as a submodule. repo link: 
>>>>>>>>> https://github.com/apache/cassandra-harry
>>>>>>>>> 
>>>>>>>>> Given the value it's brought to the project's stabilization efforts 
>>>>>>>>> and the movement of other things in the ecosystem to being more 
>>>>>>>>> integrated (accord, build-scripts 
>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18133), I think 
>>>>>>>>> having the testing framework better localized and integrated would be 
>>>>>>>>> a net benefit for adoption, awareness, maintenance, and tighter 
>>>>>>>>> workflows as we troubleshoot future failures it surfaces.
>>>>>>>>> 
>>>>>>>>> I'd also like to see us get a Harry run integrated as part of our 
>>>>>>>>> pre-commit CI (a 5 minute simple soak test for instance) and having 
>>>>>>>>> that local in this fashion should make that a cleaner integration as 
>>>>>>>>> well.
>>>>>>>>> 
>>>>>>>>> Thoughts?
