Re: Intra-project dependencies

David Capwell Thu, 19 Jan 2023 16:54:25 -0800

Thanks for the reply, my replies are inline to your inline replies =D

> On Jan 19, 2023, at 2:39 PM, Mick Semb Wever <m...@apache.org> wrote:
> 
> 
> Thanks David for the detailed write up. Replies inline…
> 
>  
> We tried in-tree for in-jvm dtest and found that this broke every other 
> commit… maintaining the APIs across all our supported branches was too hard 
> to do and moving it outside of the tree helped make the upgrade tests more 
> stable (there were breakage but less frequent)…. 
> 
> 
> The in-jvm dtest-api library is unique in this way. I would not use it as 
> reasoning that other libraries should not be in-tree.


Fair, it's the only API we have that is required to be bytecode compatible 
across versions/builds; this unique property may argue for different solutions 
than the others.

> 
> 
>  
> We tried to do snapshot builds where the version contained the SHA, but this 
> has the issue that snapshot builds “may” go away over time and made older 
> SHAs no longer building… 
>  
> 
> Only keeping the last snapshot in repository.a.o is INFRA's policy (i've 
> found out).
> We can ask INFRA to set up a separate snapshots repository just for us, with 
> a longer expiry policy. I'd rather not create extra work for infra if there's 
> other ways we can do this, and this approach would always require some 
> fallback approach to rebuilding the dependency's SHA from scratch.

If they will allow this and allow the snapshots to never be purged, then I am 
ok with this as a solution.

> 
> 
>  
> We break python-dtest when cross-cutting changes are added as CI is hard to 
> do correctly or not supported (testing downstream users (our 4 supported 
> branches) is rarely done).  
> 
> 
> python dtests' is also in a different category, (the context and consumption 
> in a different direction, i.e. it's not a library used within the in-tree). 

I disagree.  The point I was making is that we have several dependencies and we 
should think about how we maintain them.  My point is still valid: python 
dtests are involved in cross-cutting changes to Cassandra, and the lack of 
downstream testing has broken us several times.  The solution to this problem 
may be different than Accord (as C* doesn’t depend on python dtest as you point 
out), but that does not mean we shouldn’t think about it in this conversation….

One thing that comes to mind is that dependencies may benefit from running a 
limited C* CI as part of their merge process.  At the moment people are 
expected to create a tmp CI branch for all 4 supported C* versions, point it to 
the python dtest change, then submit to the JIRA as proof that CI was run… 
normally when I find python dtest broke in branch X, I find this had not 
happened… 

I believe this holds true for JVM dtest as well: we should be validating that 
the 4 target C* branches still work if you are touching jvm dtest…

Now, with all that, Accord being external will have similar issues: a change 
there may break Cassandra, so we should include a subset of Cassandra tests in 
Accord’s CI.

> 
>  
> * [nice to have] be able to work with all subprojects in one IDE and not have 
> to switch between windows while making cross-cutting changes
> 
> 
> Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate 
> it, just asking…)

I have not used Eclipse or NetBeans for around 10 years so no clue!  

> 
>   
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
> 
> Correct, if you use submodules/script you have a text file saying what we 
> “should” use, but this does not enforce actually using them… again we could 
> make sure build.xml does the right thing,
> 
> 
> If we try this approach out, I'm definitely in favour of any build.xml 
> command immediately failing if `git submodule status` != `git submodule 
> status --cached`

+1
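
A minimal sketch of that fail-fast check, assuming it lives in a small shell 
script that build.xml targets would invoke before doing any work.  The helper 
name check_submodules_in_sync and its exit codes are hypothetical; only the two 
`git submodule status` invocations come from the proposal above.

```shell
#!/bin/sh
# Sketch: fail when the submodule commits that are checked out differ from the
# commits the superproject has recorded.
# check_submodules_in_sync is a hypothetical helper name; its optional argument
# is the superproject directory (defaults to the current directory).
check_submodules_in_sync() {
    repo_dir="${1:-.}"
    # Commit currently checked out in each submodule working tree.
    actual="$(git -C "$repo_dir" submodule status)" || return 2
    # Commit the superproject has recorded for each submodule.
    recorded="$(git -C "$repo_dir" submodule status --cached)" || return 2
    if [ "$actual" != "$recorded" ]; then
        echo "submodules out of sync; try: git submodule update --init --recursive" >&2
        return 1
    fi
    return 0
}
```

A build step could call `check_submodules_in_sync .` and abort on a non-zero 
exit.  A submodule that was never initialized may be reported identically by 
both commands, so this comparison alone may not catch that case.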

> 
>  
> but this can be confusing for people who mainly build in IDE and don’t depend 
> on build.xml until later in development… this is something we should think 
> about…
> 
> 
> Again, isn't this only IntelliJ?

Not sure; the only other IDE we support is NetBeans, and I'm not sure what we 
do there. 

> 
>  
> A project I am familiar with has their build auto-inject git hooks to make 
> sure things “just work”, we may be able to solve this in a similar way?
> 
> 
> I'd like to hear/see more!

The project wants to make sure commit messages are structured “correctly”, so 
it enforces this via git hooks.  Gradle (the build tool they use) makes tasks 
depend on “installGitHooks”, which copies 2 hooks to .git/hooks (commit-msg and 
pre-push).


We could always do the same in build.xml: add a step that copies hooks we 
define into .git/hooks to make sure the behaviors we expect are enforced.  See 
https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
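
As a rough sketch of such an install step, assuming the hooks are kept in a 
directory inside the source tree and that build.xml would shell out to a script 
like this one.  The helper name install_git_hooks and the idea of a dedicated 
hooks source directory are hypothetical, not something the project has today.

```shell
#!/bin/sh
# Sketch: copy project-defined hooks into .git/hooks and make them executable.
# install_git_hooks is a hypothetical helper.
#   $1 = directory holding the hook scripts (assumed layout)
#   $2 = checkout root (defaults to the current directory)
install_git_hooks() {
    hooks_src="$1"
    repo_dir="${2:-.}"
    hooks_dst="$repo_dir/.git/hooks"
    # A source tarball has no .git directory, so skip quietly instead of
    # failing the build.
    if [ ! -d "$repo_dir/.git" ]; then
        echo "no .git directory in $repo_dir; skipping hook install" >&2
        return 0
    fi
    mkdir -p "$hooks_dst" || return 1
    for hook in "$hooks_src"/*; do
        [ -f "$hook" ] || continue
        cp "$hook" "$hooks_dst/" || return 1
        chmod +x "$hooks_dst/$(basename "$hook")" || return 1
    done
    return 0
}
```

This sketch only handles a plain checkout where .git is a directory; linked 
worktrees and submodule checkouts, where .git is a file, would need extra 
handling.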


There is a “post-checkout” hook we could leverage to detect that the 
dependencies' SHAs are no longer the same and recursively check out the 
“correct” dependencies.
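
A minimal sketch of what such a post-checkout hook could look like.  Git passes 
the hook three arguments (previous HEAD, new HEAD, and a flag that is 1 for a 
branch checkout and 0 for a file checkout); the function name 
post_checkout_hook and the choice to always run a recursive update are 
assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a .git/hooks/post-checkout script.
# Git invokes the hook as: post-checkout <previous HEAD> <new HEAD> <flag>
# where <flag> is 1 for a branch checkout and 0 for a file checkout.
# post_checkout_hook is a hypothetical function name.
post_checkout_hook() {
    branch_checkout="${3:-0}"
    # Only react to branch checkouts; file checkouts are left alone here.
    if [ "$branch_checkout" != "1" ]; then
        return 0
    fi
    # Move every submodule to the commit recorded by the new HEAD.
    git submodule update --init --recursive
}

post_checkout_hook "$@"
```

Running the update unconditionally on every branch checkout is the simplest 
choice; comparing the recorded SHAs first would avoid needless work but is left 
out of this sketch.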

> 
>>  - permanence from a git SHA no longer exists
> 
> Why is this?  The SHA points to other SHAs, so it is still immutable.  If we 
> claim that pointing to other SHAs doesn’t count then why do library versions? 
>  Both are immutable snapshots of code at a specific point in time?
> 
> 
> This, and a number of the other points, is already resolved (that submodule's 
> are on fixed SHAs, not floating HEAD).
> 
>>  - our releases get more complicated (our source tarballs are the asf 
>> releases)
> 
> We don’t include our dependencies do we?  If so, then does it really?  If 
> Accord is a library we use, why would we include it’s source in the build?  
> Isn’t it just another library from this point of view?
> 
> 
> The build of the source tarball must work. If the source tarball release 
> switches how it does things, from building the submodule to including a 
> dependency then we're back to having to make releases (and introducing a 
> risk, and we don't ourselves work frequently with the source tarballs).

That’s fair.  We wouldn’t have a .git dir anymore so couldn’t rely on git to 
“do the right thing” in this context, so bundling the dependencies may be the 
simplest in the git submodules case.

Good point that we must think about before implementing anything!

> 
>  
>   
>> switching between branches,
> 
> This is a pain point that I feel should be handled by git hooks.  We have 
> this issue when you try to reuse the same directory for different release 
> branches, and its super annoying when you go back in time when jars were 
> in-tree as you need to cleanup after switching back…. I do agree that we 
> should flesh this out as its going to be the common case, so how do we “do 
> the right thing” should be handled
> 
> 
> +1
> 
>   
>> Rather that you need to know in advance when the SHA is not HEAD.
> 
> Do you?  Or do you really need to know which “branch” it is following?  For 
> example, lets say we release 5.0 then 5.1 then 5.2, and there are accord 
> versions for each: 1.0, 1.2, 2.0… do we not need to really know which branch 
> it is following, and only when you are trying to do a cross-cutting change?
> 
> I'm still a little confused here. If a submodule is following a branch, is 
> that floating? Then a parent SHA isn't fixed to a submodule SHA?
> 
> Say trunk is using accord:a12 where a12 is a SHA on its trunk. Other 
> non-cassandra people using accord make commits, but our in-tree trunk isn't 
> moved forward. Then someone in-tree does some dev that touches accord, they 
> work away but late in the dev cycle find out that in-tree trunk isn't on the 
> latest accord trunk and there's a conflict rebasing their work onto the 
> latest accord. Is this an accurate description?
> 
> Hope that all makes sense.


Totally makes sense.  I think there are a few things to decompose

1) if you need to change a dependency, what branch do you modify?
2) what if the branch has changes we don’t want to take in?  Such as for hot 
fixes?

It's been a long time since I used git submodules, so before commenting I 
created a project to test this out and get a better feel for it.


$ cat .gitmodules
[submodule "apache-cassandra"]
        path = apache-cassandra
        url = https://github.com/apache/cassandra.git
$ cat .git/modules/apache-cassandra/HEAD
b07c3127cce7e1a8b9d4c34e1359539e12f1b4a7

The parent git repo records the submodule's SHA, which is stable until someone 
commits a change to it.  If the submodule has external updates, they don’t get 
pulled in.

If you roll back to an older commit, you need to update the submodules:

git submodule update

This updates the submodule's HEAD and puts you back at the submodule SHA 
recorded for that specific parent SHA.

So which branch to follow can be defined in the git config, but which SHA to 
use is tracked internally in the git db, so you just check out the SHA you want 
in the subproject, then commit in the main project to fix the SHA.
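
The two workflows above can be sketched as follows.  The helper names 
pin_submodule and checkout_with_submodules, the commit message, and the example 
path are hypothetical; the git commands themselves are standard.

```shell
#!/bin/sh
# Sketch: move a submodule to a chosen commit and record it in the parent.
# pin_submodule is a hypothetical helper.
#   $1 = submodule path inside the parent repo, $2 = commit to pin
pin_submodule() {
    submodule_path="$1"
    target_sha="$2"
    git -C "$submodule_path" fetch || return 1
    git -C "$submodule_path" checkout "$target_sha" || return 1
    # Staging the submodule path records its new commit in the parent index.
    git add "$submodule_path" || return 1
    # Note: this also commits anything else that is already staged.
    git commit -m "Pin $submodule_path to $target_sha"
}

# Sketch: roll the parent back to an older revision and bring the submodules
# along to the commits that revision recorded.
checkout_with_submodules() {
    git checkout "$1" || return 1
    git submodule update --init --recursive
}
```

For example, `pin_submodule apache-cassandra <sha>` would fix the parent to 
that commit, and nothing moves again until someone repeats the step.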


Now, if you find that the branch (trunk in this case) has drifted and you can’t 
make your change, that is true for all dependencies (got to love guava!), so 
you wouldn’t be able to update until you resolve the conflict.

Now, in the hot fix case, we would need to make sure sub projects follow the 
same branching rules (we own them, so we can do that) so we don’t pull in big 
changes we don’t want.
