Thanks Mick and David. I've been following this silently for a few days because we already exhausted my knowledge on the topic. But it seems your collective knowledge is uncovering a nice solution.
If I summarize, I like all of this: - link to SHA, not library version - use git submodules because that's what they are meant to be used for/ it's standard - use git hooks to automate the otherwise annoying ux of submodules - use gradle to automate the installation of the hooks (note: imo must ask user for explicit permission) - whether or not user installed the hooks, build system by default should check and fail to work with wrong sha in any submodule. But allow overrides. - The build system, source tarball etc should consider the submodules as just being a directory in the source tree. Things should work the same whether you are in a git checkout or source tarball. Henrik On Fri, 20 Jan 2023, 02:54 David Capwell, <dcapw...@apple.com> wrote: > Thanks for the reply, my replies are inline to your inline replies =D > > On Jan 19, 2023, at 2:39 PM, Mick Semb Wever <m...@apache.org> wrote: > > > Thanks David for the detailed write up. Replies inline… > > > >> We tried in-tree for in-jvm dtest and found that this broke every other >> commit… maintaining the APIs across all our supported branches was too hard >> to do and moving it outside of the tree helped make the upgrade tests more >> stable (there were breakage but less frequent)…. >> > > > The in-jvm dtest-api library is unique in this way. I would not use it as > reasoning that other libraries should not be in-tree. > > > Fair, its the only API we have that is required to be byte code compatible > cross versions/builds; this unique property may argue for different > solutions than others > > > > > >> We tried to do snapshot builds where the version contained the SHA, but >> this has the issue that snapshot builds “may” go away over time and made >> older SHAs no longer building… >> > > > Only keeping the last snapshot in repository.a.o is INFRA's policy (i've > found out). > We can ask INFRA to set up a separate snapshots repository just for us, > with a longer expiry policy. I'd rather not create extra work for infra if > there's other ways we can do this, and this approach would always require > some fallback approach to rebuilding the dedepency's SHA from scratch. > > > If they will allow this and allow the snapshots to never be purged, then I > am ok with this as a solution. > > > > > >> We break python-dtest when cross-cutting changes are added as CI is hard >> to do correctly or not supported (testing downstream users (our 4 supported >> branches) is rarely done). >> > > > python dtests' is also in a different category, (the context and > consumption in a different direction, i.e. it's not a library used within > the in-tree). > > > I disagree. The point I was making is we have several dependencies and we > should think about how we maintain them. My point is still valid that > python dtests are involved with cross cutting changes to Cassandra, and the > lack of downstream testing has broken us several times. The solution to > this problem may be different than Accord (as C* doesn’t depend on python > dtest as you point out), but that does not mean we shouldn’t think about it > in this conversation…. > > One thing that comes to mind is that dependencies may benefit from running > a limited C* CI as part of their merge process. At the moment people are > expected to create a tmp CI branch for all 4 supported C* versions, point > it to the python dtest change, then submit to the JIRA as proof that CI was > ran… normally when I find python dtest broke in branch X I find this had > not happened… > > This holds true I believe for JVM dtest as well as we should be validating > that the 4 target C* branches still work if you are touching jvm dtest… > > Now, with all that, Accord being external will have similar issues, a > change there may break Cassandra so we should include a subset of Cassandra > tests in Accord’s CI. > > > > >> * [nice to have] be able to work with all subprojects in one IDE and not >> have to switch between windows while making cross-cutting changes >> > > > Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate > it, just asking…) > > > I have not used Eclipse or NetBeans for around 10 years so no clue! > > > > >> - same with `git pull …` (easy to be left with out-of-sync submodules) >> >> >> Correct, if you use submodules/script you have a text file saying what we >> “should” use, but this does not enforce actually using them… again we could >> make sure build.xml does the right thing, >> > > > If we try this approach out, I'm definitely in favour of any build.xml > command immediately failing if `git submodule status` != `git submodule > status --cached` > > > +1 > > > > >> but this can be confusing for people who mainly build in IDE and don’t >> depend on build.xml until later in development… this is something we should >> think about… >> > > > Again, isn't this only IntelliJ? > > > Not sure, the only other IDE we support is NetBeans and not sure what we > do there. > > > > >> A project I am familiar with has their build auto-inject git hooks to >> make sure things “just work”, we may be able to solve this in a similar way? >> > > > I'd like to hear/see more! > > > The project wants to make sure commit messages are structured “correctly” > so enforces this via git hooks. Gradle (the build they use) makes tasks > depend on “installGitHooks” which copies 2 hooks to .git/hooks (commit-msg, > and pre-push) > > > We could always do the same in build.xml, that copies hooks we define into > .git/hooks to make sure the behaviors we expect are enforced. See > https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks > <https://urldefense.com/v3/__https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks__;!!PbtH5S7Ebw!b1oU-fnsM0LGnf3l7vKGwIkz2N0BiHRQp56nBVDfFDkBDnXSMOMSmYtP3WORmbBFVIaWCm-qX5Z2xrMxsiM$> > > > There is a “post-checkout” hook we could leverage to detect that the > dependences SHA are no longer the same and recursively checks out the > “correct” dependencies > > > - permanence from a git SHA no longer exists >> >> >> Why is this? The SHA points to other SHAs, so it is still immutable. If >> we claim that pointing to other SHAs doesn’t count then why do library >> versions? Both are immutable snapshots of code at a specific point in time? >> > > > This, and a number of the other points, is already resolved (that > submodule's are on fixed SHAs, not floating HEAD). > > - our releases get more complicated (our source tarballs are the asf >> releases) >> >> >> We don’t include our dependencies do we? If so, then does it really? If >> Accord is a library we use, why would we include it’s source in the build? >> Isn’t it just another library from this point of view? >> > > > The build of the source tarball must work. If the source tarball release > switches how it does things, from building the submodule to including a > dependency then we're back to having to make releases (and introducing a > risk, and we don't ourselves work frequently with the source tarballs). > > > That’s fair. We wouldn’t have a .git dir anymore so couldn’t rely on git > to “do the right thing” in this context, so bundling the dependencies may > be the simplest in the git submodules case. > > Good point that we must think about before implementing anything! > > > > > >> switching between branches, >> >> >> This is a pain point that I feel should be handled by git hooks. We have >> this issue when you try to reuse the same directory for different release >> branches, and its super annoying when you go back in time when jars were >> in-tree as you need to cleanup after switching back…. I do agree that we >> should flesh this out as its going to be the common case, so how do we “do >> the right thing” should be handled >> > > > +1 > > > >> Rather that you need to know in advance when the SHA is not HEAD. >> >> >> Do you? Or do you really need to know which “branch” it is following? >> For example, lets say we release 5.0 then 5.1 then 5.2, and there are >> accord versions for each: 1.0, 1.2, 2.0… do we not need to really know >> which branch it is following, and only when you are trying to do a >> cross-cutting change? >> > > I'm still a little confused here. If a submodule is following a branch, is > that floating? Then a parent SHA isn't fixed to a submodule SHA? > > Say trunk is using accord:a12 where a12 is a SHA on its trunk. Other > non-cassandra people using accord make commits, but our in-tree trunk isn't > moved forward. Then someone in-tree does some dev that touches accord, they > work away but late in the dev cycle find out that in-tree trunk isn't on > the latest accord trunk and there's a conflict rebasing their work onto the > latest accord. Is this an accurate description? > > Hope that all makes sense. > > > > Totally makes sense. I think there are a few things to decompose > > 1) if you need to change a dependency, what branch do you modify? > 2) what if the branch has changes we don’t want to take in? Such as for > hot fixes? > > Its been a long time since I used git submodules so before commenting I > created a project to test this out and get a better feel for it. > > > $ cat .gitmodules > [submodule "apache-cassandra"] > path = apache-cassandra > url = https://github.com/apache/cassandra.git > $ cat .git/modules/apache-cassandra/HEAD > b07c3127cce7e1a8b9d4c34e1359539e12f1b4a7 > > The git repo contains the SHA, which is stable until someone commits a > change to it. If the submodule has external updates, that doesn’t get > pulled in > > If you roll back to an older commit, need to update the submodules > > git submodule update > > This updates the HEAD and puts you back at the SHA of the submodule for > that specific SHA > > So which branch can be defined in the git config, but which SHA is done > internal to the git db so you just checkout in the subproject then commit > in the main project to fix the SHA > > > Now, if you find that the branch (trunk in this case) has drifted and you > can’t make your change, that is true for all dependencies (got to love > guava!), so you wouldn’t be able to update until you resolve > > Now, in the hot fix case, we would need to make sure sub projects follow > same branching rules (we own them so can do) so we don’t pull in big > changes we don’t want. > > > > > >