If we make sure all branches are using the latest “stable” accord then this is 6 commits (4 for C*, 1 for the Accord stable branch, then 1 to merge into trunk)

If we’re modifying stable, we only need one commit per C* branch per release. We don’t need to immediately point C* to it. So there could plausibly be far fewer total commits this way, though the reality is hard to predict and will vary.

On 18 Jan 2023, at 20:45, David Capwell <dcapw...@apple.com> wrote:

Been out, sorry for just catching up now…

I feel this thread got pigeonholed on the word Accord and ignored the fact that we are dealing with this pain today with python/jvm dtest, and that trying to improve that would help the project…. We also have other related projects that we are developing in parallel to Cassandra, such as Harry, and there is interest in exporting our utils + simulator for other projects to use…. We also depend on related projects such as JAMM, which block us from bumping JDK versions...

Accord is just 1 example of a Cassandra dependency needed for a release… by only focusing on Accord and “should it be external” this thread is ignoring the pain we face today and how we could improve.

We tried in-tree for in-jvm dtest and found that this broke every other commit… maintaining the APIs across all our supported branches was too hard to do, and moving it outside of the tree helped make the upgrade tests more stable (there was still breakage, but less frequently)…. We currently have to release this for every patch, which has actually caused us to rely on class path ordering to have some branches fork the classes so they can avoid this….  We tried to do snapshot builds where the version contained the SHA, but this has the issue that snapshot builds “may” go away over time, leaving older SHAs unable to build… Jvm-dtest is in bad shape and really could benefit from us looking to improve this process…

We break python-dtest when cross-cutting changes are added as CI is hard to do correctly or not supported (testing downstream users (our 4 supported branches) is rarely done).  

We want to start using Harry as part of our test suite, so if a patch needs to change harry then what “should” we do? Do we block merging into Cassandra until we vote on a Harry release?

Maybe we should be asking what capabilities we need and how to address each?  I believe Mick has focused on this capabilities conversation and I feel it’s 100% the best route to take; we should be listing out what we need to do our work and if/how the different solutions address this.

For me I need the following:

* be able to make cross-cutting changes in 1 ticket
** in my PR override CI to use my PRs for sub-projects
* commits to Cassandra should be reproducible and buildable
* downstream testing support… if we make a change to python-dtest or Harry we should know if this breaks Cassandra before merging and which supported branches
* [nice to have] be able to work with all subprojects in one IDE and not have to switch between windows while making cross-cutting changes
* [nice to have] commit tooling understands dependencies and commits things in the correct order
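To make the last item concrete, here is a minimal sketch of dependency-aware commit ordering using Python's standard `graphlib`. The dependency map and project names are illustrative assumptions, not the project's real build graph.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each project lists the projects it depends on.
DEPENDS_ON = {
    "cassandra": {"accord", "dtest-api", "harry"},
    "accord": set(),
    "dtest-api": set(),
    "harry": set(),
}


def commit_order(depends_on, touched):
    """Return the touched projects with dependencies before dependents."""
    # static_order() yields every node after all of its dependencies;
    # it raises graphlib.CycleError if the map contains a cycle.
    order = TopologicalSorter(depends_on).static_order()
    return [project for project in order if project in touched]
```

For a patch touching both Accord and Cassandra, `commit_order(DEPENDS_ON, {"cassandra", "accord"})` puts `accord` first, which is the "correct order" a commit script would need to follow.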

Now, for the “how”, I am open, but I see the two leading options as git submodules and a script that mimics git submodules…. I have used other tools that boil down to fetching a list of repo/sha into specific directories and find them more annoying than git submodules…
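For reference, the "script that mimics git submodules" option could look roughly like the sketch below: a pin file lists directory, repo and SHA, and the script turns it into git commands. The file format, the `Pin` type and the function names are assumptions made for illustration; nothing like this exists in the tree today.

```python
from typing import List, NamedTuple


class Pin(NamedTuple):
    path: str  # directory to check the dependency out into
    repo: str  # clone URL
    sha: str   # exact commit to build against


def parse_pins(text: str) -> List[Pin]:
    """Parse lines of the (hypothetical) form '<path> <repo-url> <sha>'."""
    pins = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        path, repo, sha = line.split()  # ValueError on a malformed line
        pins.append(Pin(path, repo, sha))
    return pins


def fetch_commands(pins: List[Pin]) -> List[List[str]]:
    """Build the git commands that materialise each pin; nothing is executed."""
    commands = []
    for pin in pins:
        commands.append(["git", "clone", "--no-checkout", pin.repo, pin.path])
        commands.append(["git", "-C", pin.path, "checkout", "--detach", pin.sha])
    return commands
```

Returning the commands instead of running them keeps the sketch testable; a real script would pass each one to `subprocess.run` and would also need to handle a directory that has already been cloned.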

For me, both ways address my needs above; I can make cross-cutting changes with ease and could change CI to build my changes rather than the HEAD of a specific branch.

To address Mick’s capabilities I think I saw the following (correct me if missing any):

 - you can no longer just `git clone …`  (and we clone automatically in a number of places)

With both submodules and a script, that no longer works, but we can make this less painful by enhancing build.xml to make sure it builds out of the gate; we can’t see all the code on a fresh commit but we would still be buildable

 - same with `git pull …` (easy to be left with out-of-sync submodules)

Correct, if you use submodules/script you have a text file saying what we “should” use, but this does not enforce actually using them… again we could make sure build.xml does the right thing, but this can be confusing for people who mainly build in IDE and don’t depend on build.xml until later in development… this is something we should think about…

A project I am familiar with has their build auto-inject git hooks to make sure things “just work”, we may be able to solve this in a similar way?
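A rough sketch of what such a hook could check, assuming the submodule route: parse the output of `git submodule status` and report any submodule whose checkout does not match what the superproject has recorded. The function names are made up for this example, and paths containing spaces are not handled.

```python
import subprocess
from typing import List


def out_of_sync(status_output: str) -> List[str]:
    """Return paths of submodules that are not at the recorded commit.

    Each line of `git submodule status` starts with a one-character flag:
    ' ' in sync, '-' not initialised, '+' checked-out commit differs from
    the recorded one, 'U' merge conflicts.
    """
    stale = []
    for line in status_output.splitlines():
        if not line:
            continue
        flag, fields = line[0], line[1:].split()
        if flag != " " and len(fields) >= 2:
            stale.append(fields[1])  # fields are: sha, path, optional describe
    return stale


def check_working_tree() -> List[str]:
    """Run git in the current directory and return out-of-sync submodules."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        capture_output=True, text=True, check=True,
    )
    return out_of_sync(result.stdout)
```

A `post-checkout` or `post-merge` hook could call `check_working_tree()` and either print a warning or run `git submodule update --init` when the list is non-empty.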

 - permanence from a git SHA no longer exists

Why is this?  The SHA points to other SHAs, so it is still immutable.  If we claim that pointing to other SHAs doesn’t count then why do library versions?  Both are immutable snapshots of code at a specific point in time?

 - our releases get more complicated (our source tarballs are the asf releases)

We don’t include our dependencies do we?  If so, then does it really?  If Accord is a library we use, why would we include its source in the build?  Isn’t it just another library from this point of view?

 - handling patches cover submodules

I don’t know what you mean by this; do you mean how do we submit cross-cutting patches?  How I do this in the cep-15-accord branch is by updating the pointer to point to my dependency PR; that way the build “does the right thing”, and I just have to fix this up before merging into Cassandra (I have to commit in the “correct” order)

 - switching branches, and using git worktrees, during dev

What is the concern here?  I am using worktrees for PR review and cep-15-accord development and have zero issues with this; can you expand more on this?

And who would be fixing our build/test/release scripts to accommodate?

100% valid question to ask.  I personally am in favor of the proposer doing the work and not depending on specific CI people to do the work for them….  But cool with others helping out… I do feel it’s not good to depend on a single CI person to do all this, whatever it is we define.

I'm thinking about reproducible builds,

Is the concern that checking out a submodule’s SHA may not compile, breaking C*?  Is there another concern here?  Want to fully understand

switching between branches,

This is a pain point that I feel should be handled by git hooks.  We have this issue when you try to reuse the same directory for different release branches, and it’s super annoying when you go back in time to when jars were in-tree, as you need to clean up after switching back…. I do agree that we should flesh this out as it’s going to be the common case, so how we “do the right thing” should be handled

and git bisecting

Isn’t this just another example of switching branches?  If we solve that case then doesn’t git bisect come for free?


To include forward-merging

What is the concern here?

Rather that you need to know in advance when the SHA is not HEAD.

Do you?  Or do you really need to know which “branch” it is following?  For example, let’s say we release 5.0 then 5.1 then 5.2, and there are accord versions for each: 1.0, 1.2, 2.0… do we not really need to know which branch it is following, and only when you are trying to do a cross-cutting change?

For example, if I want to fix a bug in 5.1 that modifies accord, I need to know to use the accord-1.2 branch?  I think this is a solvable problem with submodules and script, but agree we should think about this case as it’s going to come up
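One way to picture the "which branch is it following" lookup is a small table keyed by Cassandra release. The sketch below only encodes the made-up 5.0/5.1/5.2 example above; the table, the function name and the `accord-<version>` naming are assumptions, not an existing convention.

```python
# Hypothetical mapping from Cassandra release line to the Accord version it
# follows, taken from the illustrative example in this thread.
ACCORD_VERSION_FOR = {
    "5.0": "1.0",
    "5.1": "1.2",
    "5.2": "2.0",
}


def accord_branch(cassandra_release: str) -> str:
    """Return the Accord branch a cross-cutting fix should target."""
    try:
        return "accord-" + ACCORD_VERSION_FOR[cassandra_release]
    except KeyError:
        raise KeyError(
            f"no Accord branch recorded for Cassandra {cassandra_release}"
        ) from None
```

With this table, `accord_branch("5.1")` gives `accord-1.2`. Whether the mapping lives in a script, in `.gitmodules`, or in build.xml is an open question; the point is only that it has to be written down somewhere tooling can read it.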

Correct. Submodules do not solve/remove the need to commit to multiple branches and forward merge. Furthermore, submodules mean at least one additional commit, and possibly twice as many commits.

We have 4 maintained branches atm, so if there is a bug in accord that needs to be fixed in all 4 we need:
* 4 commits for C*
* 1 to 4 commits for Accord, depending on release history.

If we make sure all branches are using the latest “stable” accord then this is 6 commits (4 for C*, 1 for the Accord stable branch, then 1 to merge into trunk)
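The arithmetic behind those numbers, as a small sketch (the function and parameter names are just for illustration):

```python
def commits_for_fix(cassandra_branches: int,
                    accord_branches: int,
                    merge_stable_to_trunk: bool = False) -> int:
    """Count the commits needed to land one Accord fix everywhere.

    One commit per maintained C* branch, one per Accord branch that has to
    be patched, plus an optional merge of Accord's stable branch into trunk.
    """
    return cassandra_branches + accord_branches + (1 if merge_stable_to_trunk else 0)
```

With 4 maintained C* branches this ranges from `commits_for_fix(4, 1)` = 5 up to `commits_for_fix(4, 4)` = 8 depending on release history, and the single-stable-branch case is `commits_for_fix(4, 1, merge_stable_to_trunk=True)` = 6.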

Our current commit process is human controlled, so every commit is a chance for human error.  Maybe we should look to improve this?  I know I have my own script to avoid human error (which supports jvm/python dtest), maybe it would be best if the project had automation to make sure everyone “does the right thing”?

On Jan 18, 2023, at 3:06 AM, Benedict <bened...@apache.org> wrote:

Linking or merging while it is still also being a separate library and repo.

I am still unclear why you think this is “a significant thing”?

On 18 Jan 2023, at 10:41, Mick Semb Wever <m...@apache.org> wrote:




You would reference the snapshot dependency by the timestamped snapshot. This makes it a reproducible build.

How confident are we that the repository will not alter or delete them?


They cannot be altered.

I see artefacts there that are more than a decade old. But we cannot rely on their permanence. 

Putting the SHA into the jar's manifest is easy.  And this blog post shows how you can also expose this info on the command line: https://medium.com/liveramp-engineering/identifying-maven-snapshot-artifacts-by-git-revision-15b860d6228b 

Given there's no guaranteed permanence to the snapshots, we would need to have the git sha in the version, so if much older versions can't be downloaded it can still be rebuilt.

This is done like: <revision>1.0.0_${sha1}-SNAPSHOT</revision>
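As a sketch of why the SHA in the version helps: if the snapshot artifact has disappeared, tooling can recover the commit from the version string and rebuild from source. The pattern below assumes exactly the `<base>_<sha>-SNAPSHOT` shape shown above with a lowercase hex SHA; the function name is hypothetical.

```python
import re

# Matches versions of the assumed form '1.0.0_<hex sha>-SNAPSHOT'.
_SNAPSHOT_VERSION = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)_(?P<sha>[0-9a-f]{7,40})-SNAPSHOT$"
)


def split_snapshot_version(version: str):
    """Return (base_version, git_sha) for a SHA-stamped snapshot version."""
    match = _SNAPSHOT_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a SHA-stamped snapshot version: {version!r}")
    return match["base"], match["sha"]
```

For example, `split_snapshot_version("1.0.0_0123abc-SNAPSHOT")` yields the base version `1.0.0` and the SHA `0123abc`, which is enough to check out the dependency and rebuild it locally.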

 
linking in the source code into in-tree is a significant thing to do

Could you explain why? I thought your preferred alternative was merging the source trees permanently


Linking or merging while it is still also being a separate library and repo.
If we are really not that interested in it as a separate library, and dev change is high, or the code is somewhere less accessible, then in tree makes sense IMHO.

