Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

David Capwell Thu, 01 Jun 2023 10:15:43 -0700

Most of the edge cases we have seen with Accord come from working with other
authors' feature branches. We use a relative path for the submodule so that
git@ vs https:// doesn't become a problem: the submodule has to point to
https:// to work in CI, but if you do that during feature development it gets
annoying to push to GitHub, so we use ../cassandra-accord.git and git respects
whatever protocol you are using. In one or two people's environments, when they
checked out another author's branch, the C* remote was correct but the Accord
one was still pointing to Apache (which doesn't have the feature branch). This
is trivial to fix, and might be a bug with our git hooks, but I'm still calling
it out as it has been an issue.
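For readers unfamiliar with relative submodule URLs, here is a minimal Python
sketch of the resolution rule, under simplifying assumptions. Git resolves a
submodule URL that starts with ./ or ../ against the superproject's default
remote, treating that remote URL like a directory, which is why the sibling
repository is written ../cassandra-accord.git rather than
./cassandra-accord.git. The helper name and the "example-user" fork are
hypothetical, and only the common https:// and scp-style git@host:path forms
are handled; git itself covers more URL shapes.

```python
# Hypothetical helper for illustration only; this is not a git API.
# Sketch of how git resolves a relative submodule URL ("./x" or "../x")
# against the superproject's remote URL, treating that URL like a directory.


def resolve_submodule_url(superproject_remote: str, relative_url: str) -> str:
    base = superproject_remote.rstrip("/")
    rel = relative_url
    while rel.startswith("./") or rel.startswith("../"):
        if rel.startswith("./"):
            # "./" refers to the remote URL itself (treated as a directory).
            rel = rel[len("./"):]
            continue
        # Each "../" drops the last path component of the remote URL.
        rel = rel[len("../"):]
        cut = max(base.rfind("/"), base.rfind(":"))
        if cut < 0:
            raise ValueError(f"cannot go above {base!r}")
        # Keep a trailing ':' so scp-style URLs (git@host:repo.git) stay valid.
        base = base[: cut + 1] if base[cut] == ":" else base[:cut]
    if not rel:
        return base
    return base + rel if base.endswith(":") else f"{base}/{rel}"


# The same .gitmodules entry follows whichever protocol the superproject uses.
for remote in (
    "https://github.com/apache/cassandra.git",    # what CI clones
    "git@github.com:example-user/cassandra.git",  # hypothetical fork over SSH
):
    print(resolve_submodule_url(remote, "../cassandra-accord.git"))
```

Running it prints https://github.com/apache/cassandra-accord.git followed by
git@github.com:example-user/cassandra-accord.git, so one .gitmodules entry
yields an https:// URL in CI and an SSH URL for someone who cloned over SSH.
Git stores the resolved URL in .git/config when the submodule is initialized,
and "git submodule sync" re-resolves it; a stale stored URL is one plausible
explanation for the Accord remote still pointing to Apache, though that is an
assumption rather than something established in this thread.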


Josh, have you seen any reports on what isn't working? I think most people
don't use even 1% of what git can do, so could it be that 10% is broken but no
one in our domain actually touches those paths?

> On May 31, 2023, at 12:36 PM, Josh McKenzie <jmcken...@apache.org> wrote:
> 
> Bumping into worktree + submodule pain on some Harry-related work; it looks
> like submodule support in "git worktree" is not fully implemented yet:
> 
> https://git-scm.com/docs/git-worktree#_bugs
> BUGS
> Multiple checkout in general is still experimental, and the support for 
> submodules is incomplete. It is NOT recommended to make multiple checkouts of 
> a superproject.
> 
> I rely pretty heavily on worktrees and I know a lot of other folks who do
> too. This is a dealbreaker for me in terms of adding anything else as a
> submodule, and I'd like to know if the Accord folks have been running into
> any worktree-related woes with the Accord integration.
> 
> 
> On Sun, May 28, 2023, at 10:14 AM, Alex Petrov wrote:
>> Regarding approachability, one of the things I think is worth adding is a
>> DSL. I feel like there's enough functionality in Harry, and enough
>> information out there, for anyone who needs to write even an involved test,
>> but adoption doesn't usually start with complex use cases, so making it
>> extremely simple to generate data and to validate that the written data is
>> where it's supposed to be should help adoption a lot. Unfortunately, more
>> complex use cases such as group-by support or SAI testing will require a bit
>> more knowledge and writing an involved model, so I do not see any shortcuts
>> we can take here.
>> 
>> > I do think that moving Harry in-tree would improve approachability
>> 
>> I think it's similar to the in-jvm dtest API. I feel like we would evolve
>> it more actively if we didn't have to cut a release before every commit. In
>> other words, I think that changing Harry code and extending its
>> functionality will be easier, which I think will eventually lead to quicker
>> adoption. But of course the move itself does not increase adoption; that
>> comes from better ergonomics.
>> 
>> 
>> On Thu, May 25, 2023, at 8:03 PM, Abe Ratnofsky wrote:
>>> I'm seeing a few distinct topics here:
>>> 
>>> 1. Harry's adoption and approachability
>>> 
>>> I agree that approachability is one of Harry's main improvement areas right 
>>> now. If our goal is to produce a fuzz testing framework for the Cassandra 
>>> project, then adoption by contributors and usage for new feature 
>>> development are reasonable indicators for whether we're achieving that 
>>> goal. If Harry is not getting adopted by contributors outside of Apple, and 
>>> is not getting used for new feature development, then we should make an 
>>> effort to understand why. I don't think that a several-hour seminar is the 
>>> best point of leverage to achieve those goals.
>>> 
>>> Here's what I think we do need:
>>> 
>>> - The README should be understandable by anyone interested in writing a 
>>> fuzz test
>>> - Example tests should be runnable from a fresh clone of Cassandra, in an 
>>> IDE or on the command line
>>> - Examples of how we would test new features (like CEP-7, CEP-29, etc.) with
>>> the fuzz testing framework
>>> 
>>> I find the JVM dtest framework accomplishes similar goals, and one reason
>>> is that there are plenty of examples, and it's relatively easy to copy and
>>> paste one example and have it do what you'd like. I believe the same
>>> approach would work for a fuzz testing framework.
>>> 
>>> Some of these tasks above are already done for Harry, such as better IDE 
>>> support for samples. This will be available in OSS Harry shortly.
>>> 
>>> 2. Moving Harry in-tree vs. in submodule
>>> 
>>> As I understand it, making Harry a submodule of Cassandra would make it 
>>> easier to deal with versioning, since we wouldn't have to do the entire 
>>> release dance we need to do for dtest-api, but I don't see this as a big 
>>> improvement to approachability.
>>> 
>>> I do think that moving Harry in-tree would improve approachability, for the 
>>> same reason as the JVM dtests. It's nice to write a feature or fix, find a 
>>> similar JVM dtest, copy, paste, and edit, and have something useful.
>>> 
>>> 3. General subdivision of Cassandra projects
>>> 
>>> This topic has come up quite a few times recently - around shared utilities 
>>> (CEP-10 concurrency primitives, etc), dtest-api, query parser, etc. The 
>>> project has tried out a few different approaches to composing separate
>>> projects. Hopefully in the near future we find the one that works best and 
>>> can start this process of splitting out libraries.
>>> 
>>> --
>>> Abe
>>> 
>>>> On May 25, 2023, at 6:36 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>> 
>>>>> I would really like us to split out utilities into a common project
>>>> +1 to the sentiment.
>>>> 
>>>> Would also advocate strongly for it being more tightly integrated with the 
>>>> base project than what we've been doing with our ecosystem (i.e. 
>>>> completely separate projects, not submodules), mostly from a 
>>>> discoverability and workflow standpoint.
>>>> 
>>>> I'm definitely salty about having to have 4 IDEs / projects open just to
>>>> work on the entire stack.
>>>> 
>>>> On Thu, May 25, 2023, at 5:05 AM, Alex Petrov wrote:
>>>>> This was not a talk but rather an interactive workshop, so unfortunately
>>>>> it will not work as a recording, but I am trying to work out ways to
>>>>> preserve it.
>>>>> 
>>>>> On Thu, May 25, 2023, at 10:26 AM, Claude Warren, Jr via dev wrote:
>>>>>> Since the talk was not accepted for Cassandra Summit, would it be
>>>>>> possible to record it as a simple YouTube video and publish it so that
>>>>>> the detailed information about how to use Harry is not lost?
>>>>>> 
>>>>>> On Thu, May 25, 2023 at 7:36 AM Alex Petrov <al...@coffeenco.de 
>>>>>> <mailto:al...@coffeenco.de>> wrote:
>>>>>> 
>>>>>> While we are at it, we may also want to pull in the in-jvm dtest API as
>>>>>> a submodule, and actually move some tests that are common between the
>>>>>> branches there.
>>>>>> 
>>>>>> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>>>>>>> Isn’t the other reason Accord works well as a submodule that it has no 
>>>>>>> dependencies on C* proper? Harry does at the moment, right? (Not that 
>>>>>>> we couldn’t address that…just trying to think this through…)
>>>>>>> 
>>>>>>>> On May 24, 2023, at 6:54 PM, Benedict <bened...@apache.org 
>>>>>>>> <mailto:bened...@apache.org>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In this case Harry is a testing module - it’s not something we will 
>>>>>>>> develop in tandem with C* releases, and we will want improvements to 
>>>>>>>> be applied across all branches.
>>>>>>>> 
>>>>>>>> So it seems a natural fit for submodules to me.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 24 May 2023, at 21:09, Caleb Rackliffe <calebrackli...@gmail.com 
>>>>>>>>> <mailto:calebrackli...@gmail.com>> wrote:
>>>>>>>>> 
>>>>>>>>> > Submodules do have their own overhead and edge cases, so I am
>>>>>>>>> > mostly in favor of using them for cases where the code must live
>>>>>>>>> > out of tree (such as jvm-dtest, which lives out of tree as all
>>>>>>>>> > branches need the same interfaces)
>>>>>>>>> 
>>>>>>>>> Agreed. Basically where I've ended up on this topic.
>>>>>>>>> 
>>>>>>>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>>>>>>> 
>>>>>>>>> +100
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov <al...@coffeenco.de 
>>>>>>>>> <mailto:al...@coffeenco.de>> wrote:
>>>>>>>>> 
>>>>>>>>> > I'm about to need a Harry test for the paging across tombstones
>>>>>>>>> > work for https://issues.apache.org/jira/browse/CASSANDRA-18424
>>>>>>>>> > (that's where my own overlapping fuzzing came in). In the process,
>>>>>>>>> > I'll see if I can't distill something really simple along the lines
>>>>>>>>> > of how React approaches it (https://react.dev/learn).
>>>>>>>>> 
>>>>>>>>> We can pick that up as an example, sure. 
>>>>>>>>> 
>>>>>>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>>>> workshop,
>>>>>>>>>> I'm about to need a Harry test for the paging across tombstones work
>>>>>>>>>> for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's
>>>>>>>>>> where my own overlapping fuzzing came in). In the process, I'll see
>>>>>>>>>> if I can't distill something really simple along the lines of how
>>>>>>>>>> React approaches it (https://react.dev/learn).
>>>>>>>>>> 
>>>>>>>>>> Ideally we'd be able to get something together that's a high level 
>>>>>>>>>> "In the next 15 minutes, you will know and understand A-G and have 
>>>>>>>>>> access to N% of the power of harry" kind of offer.
>>>>>>>>>> 
>>>>>>>>>> Honestly, there's a lot in our ecosystem where we could benefit from 
>>>>>>>>>> taking a page from their book in terms of onboarding and getting 
>>>>>>>>>> started IMO.
>>>>>>>>>> 
>>>>>>>>>> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>>>>>>>>>>> > I wonder if a mini-onboarding session would be good as a
>>>>>>>>>>> > community session - go over Harry, how to run it, how to add a
>>>>>>>>>>> > test? Would that be the right venue? I just would like to see
>>>>>>>>>>> > how we can not only plug it into regular CI but also make sure
>>>>>>>>>>> > everyone who wants to add a test knows how to get started with
>>>>>>>>>>> > it.
>>>>>>>>>>> 
>>>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry
>>>>>>>>>>> workshop, but unfortunately it got declined. It goes without saying
>>>>>>>>>>> that we can still do it online, time and resources permitting. But
>>>>>>>>>>> again, I do not think it should bar us from making Harry a part of
>>>>>>>>>>> the codebase, as it already is. In fact, we can iterate on
>>>>>>>>>>> development more quickly with it in-tree.
>>>>>>>>>>> 
>>>>>>>>>>> We could go over some interesting examples such as testing 2i 
>>>>>>>>>>> (SAI), modelling Group By tests, or testing repair. If there is 
>>>>>>>>>>> enough appetite and collaboration in the community, I will see if 
>>>>>>>>>>> we can pull something like that together. Input on _what_ you would 
>>>>>>>>>>> like to see / hear / test is also appreciated. Harry was
>>>>>>>>>>> developed out of a strong need for large-scale testing, which also 
>>>>>>>>>>> has informed many of its APIs, but we can make it easier to access 
>>>>>>>>>>> for interactive testing / unit tests. We have been doing a lot of 
>>>>>>>>>>> that with Transactional Metadata, too. 
>>>>>>>>>>> 
>>>>>>>>>>> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got 
>>>>>>>>>>> > any thoughts here?
>>>>>>>>>>> 
>>>>>>>>>>> Yes, sorry for not responding on this thread earlier. I cannot
>>>>>>>>>>> overstate how excited I am about this, and how important I think
>>>>>>>>>>> this is. Time constraints are hard to overcome, but I hope the
>>>>>>>>>>> results brought by TCM will make it all worth it.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>>>>>>>>>>>> I think pulling Harry into the tree will make adoption easier for
>>>>>>>>>>>> folks. I have been a bit swamped with Transactional Metadata
>>>>>>>>>>>> work, but I wanted to make some of the things we were using for
>>>>>>>>>>>> testing TCM available outside of the TCM branch. This includes a bunch
>>>>>>>>>>>> of helper methods to perform operations on the clusters, data 
>>>>>>>>>>>> generation, and more useful stuff. Of course, the question always 
>>>>>>>>>>>> remains about how much time I want to spend porting it all to 
>>>>>>>>>>>> Gossip, but I think we can find a reasonable compromise. 
>>>>>>>>>>>> 
>>>>>>>>>>>> I would not set this improvement as a prerequisite to pulling 
>>>>>>>>>>>> Harry into the main branch, but rather interpret it as a 
>>>>>>>>>>>> commitment from myself to take community input and make it more 
>>>>>>>>>>>> approachable by the day. 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, May 24, 2023, at 2:44 PM, Josh McKenzie wrote:
>>>>>>>>>>>>>> importantly it’s a million times better than the dtest-api 
>>>>>>>>>>>>>> process - which stymies development due to the friction.
>>>>>>>>>>>>> This is my major concern.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What prompted this thread was Harry being external to the core
>>>>>>>>>>>>> codebase: the lack of adoption and usage led to atrophy of
>>>>>>>>>>>>> certain aspects of it, which then led to a redundant
>>>>>>>>>>>>> implementation of some fuzz testing and lost time.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We'd all be better served to have this closer to the main 
>>>>>>>>>>>>> codebase as a forcing function to smooth out the rough edges, 
>>>>>>>>>>>>> integrate it, and make it a collective artifact and first class 
>>>>>>>>>>>>> citizen IMO.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have similar opinions about the dtest-api.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 24, 2023, at 4:05 AM, Benedict wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It’s not without hiccups, and I’m sure we have more to learn. 
>>>>>>>>>>>>>> But it mostly just works, and importantly it’s a million times 
>>>>>>>>>>>>>> better than the dtest-api process - which stymies development 
>>>>>>>>>>>>>> due to the friction.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 24 May 2023, at 08:39, Mick Semb Wever <m...@apache.org 
>>>>>>>>>>>>>>> <mailto:m...@apache.org>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> WRT git submodules and CASSANDRA-18204, are we happy with how
>>>>>>>>>>>>>>> it is working for Accord?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The time spent on getting that running has been a fair few
>>>>>>>>>>>>>>> hours, in which we could have cut many manual module releases.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> David and folks working on Accord?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, 23 May 2023 at 20:09, Josh McKenzie 
>>>>>>>>>>>>>>> <jmcken...@apache.org <mailto:jmcken...@apache.org>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'll hold off on this until Alex Petrov chimes in. @Alex -> got 
>>>>>>>>>>>>>>> any thoughts here?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
>>>>>>>>>>>>>>>> I think it would be great to onboard Harry more officially
>>>>>>>>>>>>>>>> into the project. However, it would be nice to do some
>>>>>>>>>>>>>>>> sanity checking with folks outside of Apple to see how
>>>>>>>>>>>>>>>> approachable it is. That is, can someone take it and just run
>>>>>>>>>>>>>>>> it with the current README without any additional context?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I wonder if a mini-onboarding session would be good as a
>>>>>>>>>>>>>>>> community session - go over Harry, how to run it, how to add a
>>>>>>>>>>>>>>>> test? Would that be the right venue? I just would like to see
>>>>>>>>>>>>>>>> how we can not only plug it into regular CI but also make sure
>>>>>>>>>>>>>>>> everyone who wants to add a test knows how to get started with
>>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jeremy
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 16, 2023, at 1:34 PM, Abe Ratnofsky <a...@aber.io 
>>>>>>>>>>>>>>>>> <mailto:a...@aber.io>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Just to make sure I'm understanding the details, this would 
>>>>>>>>>>>>>>>>> mean apache/cassandra-harry maintains its status as a 
>>>>>>>>>>>>>>>>> separate repository, apache/cassandra references it as a 
>>>>>>>>>>>>>>>>> submodule, and clones and builds Harry locally, rather than 
>>>>>>>>>>>>>>>>> pulling a released JAR. We can then reference Harry as a 
>>>>>>>>>>>>>>>>> library without maintaining public artifacts for it. Is that 
>>>>>>>>>>>>>>>>> in line with what you're thinking?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> > I'd also like to see us get a Harry run integrated as part 
>>>>>>>>>>>>>>>>> > of our pre-commit CI
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I'm a strong supporter of this, of course.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 16, 2023, at 11:03 AM, Josh McKenzie 
>>>>>>>>>>>>>>>>>> <jmcken...@apache.org <mailto:jmcken...@apache.org>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Similar to what we've done with accord in 
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd 
>>>>>>>>>>>>>>>>>> like to discuss bringing cassandra-harry in-tree as a 
>>>>>>>>>>>>>>>>>> submodule. repo link: 
>>>>>>>>>>>>>>>>>> https://github.com/apache/cassandra-harry
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Given the value it's brought to the project's stabilization 
>>>>>>>>>>>>>>>>>> efforts and the movement of other things in the ecosystem to 
>>>>>>>>>>>>>>>>>> being more integrated (accord, build-scripts 
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18133), I 
>>>>>>>>>>>>>>>>>> think having the testing framework better localized and 
>>>>>>>>>>>>>>>>>> integrated would be a net benefit for adoption, awareness, 
>>>>>>>>>>>>>>>>>> maintenance, and tighter workflows as we troubleshoot future 
>>>>>>>>>>>>>>>>>> failures it surfaces.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'd also like to see us get a Harry run integrated as part 
>>>>>>>>>>>>>>>>>> of our pre-commit CI (a 5 minute simple soak test for 
>>>>>>>>>>>>>>>>>> instance) and having that local in this fashion should make 
>>>>>>>>>>>>>>>>>> that a cleaner integration as well.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thoughts?
