Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

Josh McKenzie Wed, 31 May 2023 12:37:35 -0700

Bumping into worktree + submodule pain on some Harry-related work; it looks 
like submodule support in "git worktree" is not currently fully implemented:


https://git-scm.com/docs/git-worktree#_bugs
BUGS

Multiple checkout in general is still experimental, and the support for 
submodules is incomplete. It is NOT recommended to make multiple checkouts of a 
superproject.

I rely pretty heavily on worktrees and I know a lot of other folks who do too. 
This is a dealbreaker for me in terms of adding anything else as a submodule 
and I'd like to know if the accord folks have been running into any worktree 
related woes w/the accord integration.


On Sun, May 28, 2023, at 10:14 AM, Alex Petrov wrote:
> Regarding approachability, one of the things I thought is worth adding is a 
> DSL. I feel like there's enough functionality in Harry and there's enough 
> information for anyone who needs to write even an involved test out there, 
> but adoption doesn't usually start with complex use-cases, so making it 
> extremely simple to generate data and validate that the written data is 
> where it's supposed to be should help adoption a lot. 
> Unfortunately, more complex use-cases such as group-by support or SAI 
> testing will require a bit more knowledge and writing an involved model, so I 
> do not see any shortcuts we can take here.
> 
> > I do think that moving Harry in-tree would improve approachability
> 
> I think it's similar to the in-jvm dtest API. I feel like we would 
> evolve it more actively if we didn't have to cut a release before every 
> commit. In other words, I think that changing Harry code and extending 
> functionality will be easier, which I think will eventually lead to quicker 
> adoption. But of course the act of moving itself does not increase adoption; 
> that comes from better ergonomics.
> 
> 
> On Thu, May 25, 2023, at 8:03 PM, Abe Ratnofsky wrote:
>> I'm seeing a few distinct topics here:
>> 
>> 1. Harry's adoption and approachability
>> 
>> I agree that approachability is one of Harry's main improvement areas right 
>> now. If our goal is to produce a fuzz testing framework for the Cassandra 
>> project, then adoption by contributors and usage for new feature development 
>> are reasonable indicators for whether we're achieving that goal. If Harry is 
>> not getting adopted by contributors outside of Apple, and is not getting 
>> used for new feature development, then we should make an effort to 
>> understand why. I don't think that a several-hour seminar is the best point 
>> of leverage to achieve those goals.
>> 
>> Here's what I think we do need:
>> 
>> - The README should be understandable by anyone interested in writing a fuzz 
>> test
>> - Example tests should be runnable from a fresh clone of Cassandra, in an 
>> IDE or on the command line
>> - Examples of how we would test new features (like CEP-7, CEP-29, etc) with 
>> the fuzz testing framework
>> 
>> I find the JVM dtest framework accomplishes similar goals, and one reason is 
>> because there are plenty of examples, and it's relatively easy to copy and 
>> paste one example and have it do what you'd like. I believe the same 
>> approach would work for a fuzz testing framework.
>> 
>> Some of these tasks above are already done for Harry, such as better IDE 
>> support for samples. This will be available in OSS Harry shortly.
>> 
>> 2. Moving Harry in-tree vs. in submodule
>> 
>> As I understand it, making Harry a submodule of Cassandra would make it 
>> easier to deal with versioning, since we wouldn't have to do the entire 
>> release dance we need to do for dtest-api, but I don't see this as a big 
>> improvement to approachability.
>> 
>> I do think that moving Harry in-tree would improve approachability, for the 
>> same reason as the JVM dtests. It's nice to write a feature or fix, find a 
>> similar JVM dtest, copy, paste, and edit, and have something useful.
>> 
>> 3. General subdivision of Cassandra projects
>> 
>> This topic has come up quite a few times recently - around shared utilities 
>> (CEP-10 concurrency primitives, etc), dtest-api, query parser, etc. The 
>> project has tried out a few different approaches on composition of separate 
>> projects. Hopefully in the near future we find the one that works best and 
>> can start this process of splitting out libraries.
>> 
>> --
>> Abe
>> 
>>> On May 25, 2023, at 6:36 AM, Josh McKenzie <[email protected]> wrote:
>>> 
>>>> I would really like us to split out utilities into a common project
>>> +1 to the sentiment.
>>> 
>>> Would also advocate strongly for it being more tightly integrated with the 
>>> base project than what we've been doing with our ecosystem (i.e. completely 
>>> separate projects, not submodules), mostly from a discoverability and 
>>> workflow standpoint.
>>> 
>>> I'm definitely salty about having to have 4 IDEs / projects open just to 
>>> work on the entire stack.
>>> 
>>> On Thu, May 25, 2023, at 5:05 AM, Alex Petrov wrote:
>>>> This was not a talk but rather an interactive workshop, which 
>>>> unfortunately will not work as a recording, but I am trying to work out 
>>>> ways to preserve this.
>>>> 
>>>> On Thu, May 25, 2023, at 10:26 AM, Claude Warren, Jr via dev wrote:
>>>>> Since the talk was not accepted for Cassandra Summit, would it be 
>>>>> possible to record it as a simple YouTube video and publish it so that 
>>>>> the detailed information about how to use Harry is not lost?
>>>>> 
>>>>> On Thu, May 25, 2023 at 7:36 AM Alex Petrov <[email protected]> wrote:
>>>>>> While we are at it, we may also want to pull the in-jvm dtest API as a 
>>>>>> submodule, and actually move some tests that are common between the 
>>>>>> branches there.
>>>>>> 
>>>>>> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>>>>>>> Isn’t the other reason Accord works well as a submodule that it has no 
>>>>>>> dependencies on C* proper? Harry does at the moment, right? (Not that 
>>>>>>> we couldn’t address that…just trying to think this through…)
>>>>>>> 
>>>>>>>> On May 24, 2023, at 6:54 PM, Benedict <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In this case Harry is a testing module - it’s not something we will 
>>>>>>>> develop in tandem with C* releases, and we will want improvements to 
>>>>>>>> be applied across all branches.
>>>>>>>> 
>>>>>>>> So it seems a natural fit for submodules to me.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 24 May 2023, at 21:09, Caleb Rackliffe <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> > Submodules do have their own overhead and edge cases, so I am 
>>>>>>>>> > mostly in favor of using them for cases where the code must live 
>>>>>>>>> > outside of tree (such as jvm-dtest, which lives out of tree because 
>>>>>>>>> > all branches need the same interfaces)
>>>>>>>>> 
>>>>>>>>> Agreed. Basically where I've ended up on this topic.
>>>>>>>>> 
>>>>>>>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>>>>>>> 
>>>>>>>>> +100
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>>> > I'm about to need to harry test for the paging across tombstone 
>>>>>>>>>> > work for https://issues.apache.org/jira/browse/CASSANDRA-18424 
>>>>>>>>>> > (that's where my own overlapping fuzzing came in). In the process, 
>>>>>>>>>> > I'll see if I can't distill something really simple along the 
>>>>>>>>>> > lines of how React approaches it (https://react.dev/learn).
>>>>>>>>>> 
>>>>>>>>>> We can pick that up as an example, sure. 
>>>>>>>>>> 
>>>>>>>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>>>>> workshop,
>>>>>>>>>>> I'm about to need to harry test for the paging across tombstone 
>>>>>>>>>>> work for https://issues.apache.org/jira/browse/CASSANDRA-18424 
>>>>>>>>>>> (that's where my own overlapping fuzzing came in). In the process, 
>>>>>>>>>>> I'll see if I can't distill something really simple along the lines 
>>>>>>>>>>> of how React approaches it (https://react.dev/learn).
>>>>>>>>>>> 
>>>>>>>>>>> Ideally we'd be able to get something together that's a high level 
>>>>>>>>>>> "In the next 15 minutes, you will know and understand A-G and have 
>>>>>>>>>>> access to N% of the power of harry" kind of offer.
>>>>>>>>>>> 
>>>>>>>>>>> Honestly, there's a *lot* in our ecosystem where we could benefit 
>>>>>>>>>>> from taking a page from their book in terms of onboarding and 
>>>>>>>>>>> getting started IMO.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>>>>>>>>>>>> > I wonder if a mini-onboarding session would be good as a 
>>>>>>>>>>>> > community session: go over Harry, how to run it, how to add a 
>>>>>>>>>>>> > test? Would that be the right venue? I would just like to see 
>>>>>>>>>>>> > how we can not only plug it into regular CI but also get everyone 
>>>>>>>>>>>> > who wants to add a test to know how to get started with it.
>>>>>>>>>>>> 
>>>>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>>>>> workshop, but unfortunately it got declined. Goes without saying, 
>>>>>>>>>>>> we can still do it online, time and resources permitting. But 
>>>>>>>>>>>> again, I do not think it should block us from making Harry a 
>>>>>>>>>>>> part of the codebase, as it already is. In fact, we can iterate 
>>>>>>>>>>>> on development more quickly with it in-tree. 
>>>>>>>>>>>> 
>>>>>>>>>>>> We could go over some interesting examples such as testing 2i 
>>>>>>>>>>>> (SAI), modelling Group By tests, or testing repair. If there is 
>>>>>>>>>>>> enough appetite and collaboration in the community, I will see if 
>>>>>>>>>>>> we can pull something like that together. Input on _what_ you 
>>>>>>>>>>>> would like to see / hear / have tested is also appreciated. Harry was 
>>>>>>>>>>>> developed out of a strong need for large-scale testing, which also 
>>>>>>>>>>>> has informed many of its APIs, but we can make it easier to access 
>>>>>>>>>>>> for interactive testing / unit tests. We have been doing a lot of 
>>>>>>>>>>>> that with Transactional Metadata, too. 
>>>>>>>>>>>> 
>>>>>>>>>>>> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got 
>>>>>>>>>>>> > any thoughts here?
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes, sorry for not responding on this thread earlier. I cannot 
>>>>>>>>>>>> overstate how excited I am about this, and how important I think 
>>>>>>>>>>>> this is. Time constraints are hard to overcome, but I hope 
>>>>>>>>>>>> the results brought by TCM will make it all worth it.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>>>>>>>>>>>>> I think pulling Harry into the tree will make adoption easier for 
>>>>>>>>>>>>> folks. I have been a bit swamped with Transactional Metadata 
>>>>>>>>>>>>> work, but I wanted to make some of the things we were using for 
>>>>>>>>>>>>> testing TCM available outside of the TCM branch. This includes a 
>>>>>>>>>>>>> bunch of helper methods to perform operations on the clusters, 
>>>>>>>>>>>>> data generation, and more useful stuff. Of course, the question 
>>>>>>>>>>>>> always remains about how much time I want to spend porting it all 
>>>>>>>>>>>>> to Gossip, but I think we can find a reasonable compromise. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would not set this improvement as a prerequisite to pulling 
>>>>>>>>>>>>> Harry into the main branch, but rather interpret it as a 
>>>>>>>>>>>>> commitment from myself to take community input and make it more 
>>>>>>>>>>>>> approachable by the day. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 24, 2023, at 2:44 PM, Josh McKenzie wrote:
>>>>>>>>>>>>>>> importantly it’s a million times better than the dtest-api 
>>>>>>>>>>>>>>> process - which stymies development due to the friction.
>>>>>>>>>>>>>> This is my major concern.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What prompted this thread was harry being external to the core 
>>>>>>>>>>>>>> codebase and the lack of adoption and usage of it having led to 
>>>>>>>>>>>>>> atrophy of certain aspects of it, which then led to redundant 
>>>>>>>>>>>>>> implementation of some fuzz testing and lost time.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We'd all be better served to have this closer to the main 
>>>>>>>>>>>>>> codebase as a forcing function to smooth out the rough edges, 
>>>>>>>>>>>>>> integrate it, and make it a collective artifact and first class 
>>>>>>>>>>>>>> citizen IMO.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I have similar opinions about the dtest-api.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, May 24, 2023, at 4:05 AM, Benedict wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> It’s not without hiccups, and I’m sure we have more to learn. 
>>>>>>>>>>>>>>> But it mostly just works, and importantly it’s a million times 
>>>>>>>>>>>>>>> better than the dtest-api process - which stymies development 
>>>>>>>>>>>>>>> due to the friction.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 24 May 2023, at 08:39, Mick Semb Wever <[email protected]> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> WRT git submodules and CASSANDRA-18204, are we happy with how 
>>>>>>>>>>>>>>>> it is working for accord?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The time spent on getting that running has been a fair few 
>>>>>>>>>>>>>>>> hours, where we could have cut many manual module releases in 
>>>>>>>>>>>>>>>> that time. 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> David and folks working on accord?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, 23 May 2023 at 20:09, Josh McKenzie 
>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>> I'll hold off on this until Alex Petrov chimes in. @Alex -> 
>>>>>>>>>>>>>>>>> got any thoughts here?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
>>>>>>>>>>>>>>>>>> I think it would be great to onboard Harry more officially 
>>>>>>>>>>>>>>>>>> into the project.  However it would be nice to perhaps do 
>>>>>>>>>>>>>>>>>> some sanity checking outside of Apple folks to see how 
>>>>>>>>>>>>>>>>>> approachable it is.  That is, can someone take it and just 
>>>>>>>>>>>>>>>>>> run it with the current readme without any additional 
>>>>>>>>>>>>>>>>>> context?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I wonder if a mini-onboarding session would be good as a 
>>>>>>>>>>>>>>>>>> community session: go over Harry, how to run it, how to add 
>>>>>>>>>>>>>>>>>> a test? Would that be the right venue? I would just like 
>>>>>>>>>>>>>>>>>> to see how we can not only plug it into regular CI but also 
>>>>>>>>>>>>>>>>>> get everyone who wants to add a test to know how to get 
>>>>>>>>>>>>>>>>>> started with it.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Jeremy
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 16, 2023, at 1:34 PM, Abe Ratnofsky <[email protected]> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Just to make sure I'm understanding the details, this would 
>>>>>>>>>>>>>>>>>>> mean apache/cassandra-harry maintains its status as a 
>>>>>>>>>>>>>>>>>>> separate repository, apache/cassandra references it as a 
>>>>>>>>>>>>>>>>>>> submodule, and clones and builds Harry locally, rather than 
>>>>>>>>>>>>>>>>>>> pulling a released JAR. We can then reference Harry as a 
>>>>>>>>>>>>>>>>>>> library without maintaining public artifacts for it. Is 
>>>>>>>>>>>>>>>>>>> that in line with what you're thinking?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> > I'd also like to see us get a Harry run integrated as 
>>>>>>>>>>>>>>>>>>> > part of our pre-commit CI
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I'm a strong supporter of this, of course.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 16, 2023, at 11:03 AM, Josh McKenzie 
>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Similar to what we've done with accord in 
>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd 
>>>>>>>>>>>>>>>>>>>> like to discuss bringing cassandra-harry in-tree as a 
>>>>>>>>>>>>>>>>>>>> submodule. repo link: 
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/cassandra-harry
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Given the value it's brought to the project's 
>>>>>>>>>>>>>>>>>>>> stabilization efforts and the movement of other things in 
>>>>>>>>>>>>>>>>>>>> the ecosystem to being more integrated (accord, 
>>>>>>>>>>>>>>>>>>>> build-scripts 
>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18133), I 
>>>>>>>>>>>>>>>>>>>> think having the testing framework better localized and 
>>>>>>>>>>>>>>>>>>>> integrated would be a net benefit for adoption, awareness, 
>>>>>>>>>>>>>>>>>>>> maintenance, and tighter workflows as we troubleshoot 
>>>>>>>>>>>>>>>>>>>> future failures it surfaces.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'd also like to see us get a Harry run integrated as part 
>>>>>>>>>>>>>>>>>>>> of our pre-commit CI (a simple 5-minute soak test, for 
>>>>>>>>>>>>>>>>>>>> instance), and having that local in this fashion should 
>>>>>>>>>>>>>>>>>>>> make that a cleaner integration as well.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thoughts?