Thanks Mick and David. I've been following this silently for a few days
because we already exhausted my knowledge on the topic. But it seems your
collective knowledge is uncovering a nice solution.

If I summarize, I like all of this:

- link to SHA, not library version
- use git submodules because that's what they are meant to be used for/
it's standard
- use git hooks to automate the otherwise annoying ux of submodules
- use gradle to automate the installation of the hooks (note: imo must ask
user for explicit permission)
- whether or not user installed the hooks, build system by default should
check and fail to work with wrong sha in any submodule. But allow overrides.
- The build system, source tarball etc should consider the submodules as
just being a directory in the source tree. Things should work the same
whether you are in a git checkout or source tarball.

Henrik

On Fri, 20 Jan 2023, 02:54 David Capwell, <dcapw...@apple.com> wrote:

> Thanks for the reply, my replies are inline to your inline replies =D
>
> On Jan 19, 2023, at 2:39 PM, Mick Semb Wever <m...@apache.org> wrote:
>
>
> Thanks David for the detailed write up. Replies inline…
>
>
>
>> We tried in-tree for in-jvm dtest and found that this broke every other
>> commit… maintaining the APIs across all our supported branches was too hard
>> to do and moving it outside of the tree helped make the upgrade tests more
>> stable (there were breakage but less frequent)….
>>
>
>
> The in-jvm dtest-api library is unique in this way. I would not use it as
> reasoning that other libraries should not be in-tree.
>
>
> Fair, its the only API we have that is required to be byte code compatible
> cross versions/builds; this unique property may argue for different
> solutions than others
>
>
>
>
>
>> We tried to do snapshot builds where the version contained the SHA, but
>> this has the issue that snapshot builds “may” go away over time and made
>> older SHAs no longer building…
>>
>
>
> Only keeping the last snapshot in repository.a.o is INFRA's policy (i've
> found out).
> We can ask INFRA to set up a separate snapshots repository just for us,
> with a longer expiry policy. I'd rather not create extra work for infra if
> there's other ways we can do this, and this approach would always require
> some fallback approach to rebuilding the dedepency's SHA from scratch.
>
>
> If they will allow this and allow the snapshots to never be purged, then I
> am ok with this as a solution.
>
>
>
>
>
>> We break python-dtest when cross-cutting changes are added as CI is hard
>> to do correctly or not supported (testing downstream users (our 4 supported
>> branches) is rarely done).
>>
>
>
> python dtests' is also in a different category, (the context and
> consumption in a different direction, i.e. it's not a library used within
> the in-tree).
>
>
> I disagree.  The point I was making is we have several dependencies and we
> should think about how we maintain them.  My point is still valid that
> python dtests are involved with cross cutting changes to Cassandra, and the
> lack of downstream testing has broken us several times.  The solution to
> this problem may be different than Accord (as C* doesn’t depend on python
> dtest as you point out), but that does not mean we shouldn’t think about it
> in this conversation….
>
> One thing that comes to mind is that dependencies may benefit from running
> a limited C* CI as part of their merge process.  At the moment people are
> expected to create a tmp CI branch for all 4 supported C* versions, point
> it to the python dtest change, then submit to the JIRA as proof that CI was
> ran… normally when I find python dtest broke in branch X I find this had
> not happened…
>
> This holds true I believe for JVM dtest as well as we should be validating
> that the 4 target C* branches still work if you are touching jvm dtest…
>
> Now, with all that, Accord being external will have similar issues, a
> change there may break Cassandra so we should include a subset of Cassandra
> tests in Accord’s CI.
>
>
>
>
>> * [nice to have] be able to work with all subprojects in one IDE and not
>> have to switch between windows while making cross-cutting changes
>>
>
>
> Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate
> it, just asking…)
>
>
> I have not used Eclipse or NetBeans for around 10 years so no clue!
>
>
>
>
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>>
>>
>> Correct, if you use submodules/script you have a text file saying what we
>> “should” use, but this does not enforce actually using them… again we could
>> make sure build.xml does the right thing,
>>
>
>
> If we try this approach out, I'm definitely in favour of any build.xml
> command immediately failing if `git submodule status` != `git submodule
> status --cached`
>
>
> +1
>
>
>
>
>> but this can be confusing for people who mainly build in IDE and don’t
>> depend on build.xml until later in development… this is something we should
>> think about…
>>
>
>
> Again, isn't this only IntelliJ?
>
>
> Not sure, the only other IDE we support is NetBeans and not sure what we
> do there.
>
>
>
>
>> A project I am familiar with has their build auto-inject git hooks to
>> make sure things “just work”, we may be able to solve this in a similar way?
>>
>
>
> I'd like to hear/see more!
>
>
> The project wants to make sure commit messages are structured “correctly”
> so enforces this via git hooks.  Gradle (the build they use) makes tasks
> depend on “installGitHooks” which copies 2 hooks to .git/hooks (commit-msg,
> and pre-push)
>
>
> We could always do the same in build.xml, that copies hooks we define into
> .git/hooks to make sure the behaviors we expect are enforced.  See
> https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
> <https://urldefense.com/v3/__https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks__;!!PbtH5S7Ebw!b1oU-fnsM0LGnf3l7vKGwIkz2N0BiHRQp56nBVDfFDkBDnXSMOMSmYtP3WORmbBFVIaWCm-qX5Z2xrMxsiM$>
>
>
> There is a “post-checkout” hook we could leverage to detect that the
> dependences SHA are no longer the same and recursively checks out the
> “correct” dependencies
>
>
>  - permanence from a git SHA no longer exists
>>
>>
>> Why is this?  The SHA points to other SHAs, so it is still immutable.  If
>> we claim that pointing to other SHAs doesn’t count then why do library
>> versions?  Both are immutable snapshots of code at a specific point in time?
>>
>
>
> This, and a number of the other points, is already resolved (that
> submodule's are on fixed SHAs, not floating HEAD).
>
>  - our releases get more complicated (our source tarballs are the asf
>> releases)
>>
>>
>> We don’t include our dependencies do we?  If so, then does it really?  If
>> Accord is a library we use, why would we include it’s source in the build?
>> Isn’t it just another library from this point of view?
>>
>
>
> The build of the source tarball must work. If the source tarball release
> switches how it does things, from building the submodule to including a
> dependency then we're back to having to make releases (and introducing a
> risk, and we don't ourselves work frequently with the source tarballs).
>
>
> That’s fair.  We wouldn’t have a .git dir anymore so couldn’t rely on git
> to “do the right thing” in this context, so bundling the dependencies may
> be the simplest in the git submodules case.
>
> Good point that we must think about before implementing anything!
>
>
>
>
>
>> switching between branches,
>>
>>
>> This is a pain point that I feel should be handled by git hooks.  We have
>> this issue when you try to reuse the same directory for different release
>> branches, and its super annoying when you go back in time when jars were
>> in-tree as you need to cleanup after switching back…. I do agree that we
>> should flesh this out as its going to be the common case, so how do we “do
>> the right thing” should be handled
>>
>
>
> +1
>
>
>
>> Rather that you need to know in advance when the SHA is not HEAD.
>>
>>
>> Do you?  Or do you really need to know which “branch” it is following?
>> For example, lets say we release 5.0 then 5.1 then 5.2, and there are
>> accord versions for each: 1.0, 1.2, 2.0… do we not need to really know
>> which branch it is following, and only when you are trying to do a
>> cross-cutting change?
>>
>
> I'm still a little confused here. If a submodule is following a branch, is
> that floating? Then a parent SHA isn't fixed to a submodule SHA?
>
> Say trunk is using accord:a12 where a12 is a SHA on its trunk. Other
> non-cassandra people using accord make commits, but our in-tree trunk isn't
> moved forward. Then someone in-tree does some dev that touches accord, they
> work away but late in the dev cycle find out that in-tree trunk isn't on
> the latest accord trunk and there's a conflict rebasing their work onto the
> latest accord. Is this an accurate description?
>
> Hope that all makes sense.
>
>
>
> Totally makes sense.  I think there are a few things to decompose
>
> 1) if you need to change a dependency, what branch do you modify?
> 2) what if the branch has changes we don’t want to take in?  Such as for
> hot fixes?
>
> Its been a long time since I used git submodules so before commenting I
> created a project to test this out and get a better feel for it.
>
>
> $ cat .gitmodules
> [submodule "apache-cassandra"]
>         path = apache-cassandra
>         url = https://github.com/apache/cassandra.git
> $ cat .git/modules/apache-cassandra/HEAD
> b07c3127cce7e1a8b9d4c34e1359539e12f1b4a7
>
> The git repo contains the SHA, which is stable until someone commits a
> change to it.  If the submodule has external updates, that doesn’t get
> pulled in
>
> If you roll back to an older commit, need to update the submodules
>
> git submodule update
>
> This updates the HEAD and puts you back at the SHA of the submodule for
> that specific SHA
>
> So which branch can be defined in the git config, but which SHA is done
> internal to the git db so you just checkout in the subproject then commit
> in the main project to fix the SHA
>
>
> Now, if you find that the branch (trunk in this case) has drifted and you
> can’t make your change, that is true for all dependencies (got to love
> guava!), so you wouldn’t be able to update until you resolve
>
> Now, in the hot fix case, we would need to make sure sub projects follow
> same branching rules (we own them so can do) so we don’t pull in big
> changes we don’t want.
>
>
>
>
>
>

Reply via email to