Re: Git migration - handling commits and PRs across multiple modules

Robert Munteanu Mon, 09 Oct 2017 11:49:38 -0700

Hi Andrei,

On Mon, 2017-10-09 at 17:18 +0000, Andrei Dulvac wrote:
> On Mon, Oct 9, 2017 at 5:03 PM Robert Munteanu <romb...@apache.org>
> wrote:
> > 
> > > 
> > > With the git extreme repo split, how will commits that touch two
> > > repos be
> > > handled? Or pull-requests? Would I have to open two PRs for a
> > > functionally-atomic change?
> > 
> > Well, a functionally atomic change should not IMO touch multiple
> > modules.
> > 
> 
> What about coding in a feature that requires new API in a new module?
> Technically, yes, a "functionally-atomic" change can be broken down
> into
> two commits that justify themselves as self-standing, but it smells a
> bit
> like over-engineering. Would you create two PRs for such a change? Or
> add
> the API first, merge the PR, release the API module and then open a
> PR for
> the impl? Anyways, I really think what you mean makes sense, but
> sometimes
> it's not the reality, especially in modules that are not in
> maintenance
> mode.


You make one commit for the API module, one commit for the impl module.
You need to change the impl → api dependency to SNAPSHOT in either
scenario.

I may be biased by the fact that I have never encountered such a
situation while working on Sling, although I know they do exist :-) I
just don't see it as a scenario to optimize for.

> 
> > 
> > If you need to touch multiple modules then you perform multiple
> > commits
> > or open multiple pull requests.
> > 
> 
> But the PRs would share the jira issue description, would not be
> reviewable
> or testable separately and would have to be merged at the same
> time...

Yes, they would share the Jira description. But they would still be
reviewable separately: API changes can be reviewed without looking at
the impl. One might argue that it's even healthier to do so to prevent
bikeshedding about implementation details but that's another issue :-)

And implementations + corresponding tests can be reviewed separately
from the API change, I don't see why not.

And since we only deploy releases to the launchpad this does not break
anything.

> > 
> > > How are projects that have reactors and SNAPSHOT dependencies
> > > going
> > > to work?
> > 
> > Just like they do before - the CI always builds and deploys
> > snapshots
> > so each project will be independently buildable. It's been this way
> > since we switched to per-module Jenkins jobs.
> > 
> 
> Got it. But wouldn't that make it very hard for local development?

Why would being able to build a single module be harder than having to
build multiple projects? Of course you can build multiple projects if
you want to, but that's not required by any means.

We plan to generate checkouts using google repo, as mentioned before.
There is a "groups" functionality which should allow you to only check
out some modules.

In the same way we generate this repo file, we can generate reactor
POMs. These POMs can be inferred from naming conventions or from
predefined rules. However, it's best to keep the repos consistent, with
one module per repository, if we want to have such global approaches
rather than special-casing here and there.
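
To make the reactor POM idea concrete, here is a rough sketch of what
such a generator could look like. This is not existing Sling tooling:
the class name, the groupId/artifactId passed in and the 1-SNAPSHOT
version are all made up for illustration, and it assumes the checkout
directories are plain names that need no XML escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds an aggregator (reactor) POM that lists each
// locally checked-out repository directory as a Maven module.
final class ReactorPomGenerator {
    private ReactorPomGenerator() {}

    static String generate(String groupId, String artifactId, List<String> moduleDirs) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<project xmlns=\"http://maven.apache.org/POM/4.0.0\">\n");
        xml.append("  <modelVersion>4.0.0</modelVersion>\n");
        xml.append("  <groupId>").append(groupId).append("</groupId>\n");
        xml.append("  <artifactId>").append(artifactId).append("</artifactId>\n");
        // Assumed placeholder version; the reactor itself is never released.
        xml.append("  <version>1-SNAPSHOT</version>\n");
        // An aggregator POM must use "pom" packaging.
        xml.append("  <packaging>pom</packaging>\n");
        xml.append("  <modules>\n");
        for (String dir : moduleDirs) {
            xml.append("    <module>").append(dir).append("</module>\n");
        }
        xml.append("  </modules>\n");
        xml.append("</project>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Example directory names only; a real list would come from the checkout.
        List<String> dirs = Arrays.asList(
                "sling-org-apache-sling-api",
                "sling-org-apache-sling-engine");
        System.out.print(generate("org.example", "local-reactor", dirs));
    }
}
```

With one module per repository the list of directories is the only
input, which is why special cases would complicate this kind of
generation.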

> 
> > 
> > > It feels to me that this strict one maven module per repo would
> > > introduce
> > > some artificial problems and besides git tags, I see no tangible
> > > advantage
> > > to doing so.
> > 
> > When working with 280 modules, simplicity is a great advantage :-)
> > With
> > the (IMO rare) situation of needing to touch multiple modules in
> > one
> > go, what other problems do you see with this approach?
> > 
> 
> Local development (keeping track of logs, N number of the same
> operations
> to e.g. checkout branches, revert changes, etc.), github events. But
> they're all sort of related to the need to build/ deploy/ test more
> than
> one maven module at a time. Is it really that rare?

We have that scenario for Sling-wide changes, e.g. parent pom updates.
The good part i

> 
> 
> > Note that we will introduce a way of locally checking out all
> > repositories in a single directory so you will be able to work on a
> > single filesystem view.
> > 
> 
> I didn't expect anything less :) But that's extra tooling that would
> have
> to be maintained. But I guess you use something like gitslave or
> submodules, in which case it comes for free.

google repo is what we prototyped so far. Since we autogenerate the
repo file we can autogenerate everything else :-)

> > > If it's a question of two-way referencing, the <scm> tag in the
> > > pom
> > > file
> > > should be enough for anybody or any tooling.
> > 
> > Not sure what you mean by two-way referencing, can you elaborate on
> > that?
> > 
> 
> A way to infer the repo from an artifact (the reference the other way
> is
> obvious; i.e. build code in repo and get artifact)

As documented at
https://cwiki.apache.org/confluence/display/SLING/Move+from+Subversion+to+Git ,
the rule is sling-${artifactId}. After the latest updates from infra,
it's probably going to be sling-${artifactId}.replace('.','-') (a
literal replace of dots with dashes), but you get the point.
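
As a small illustration of that naming rule, something like the
following would map an artifactId to a repository name. The class and
method names are invented for this example, and the final rule may
still change depending on what infra settles on.

```java
import java.util.Objects;

// Hypothetical helper, not part of any Sling tooling.
final class RepoNames {
    private RepoNames() {}

    // Maps a Maven artifactId to a repository name following the rule
    // described above: "sling-" + artifactId, with dots turned into dashes.
    static String repoNameFor(String artifactId) {
        Objects.requireNonNull(artifactId, "artifactId");
        // String.replace is a literal replacement. replaceAll(".", "-") would
        // treat "." as a regex wildcard and turn every character into a dash.
        return "sling-" + artifactId.replace('.', '-');
    }

    public static void main(String[] args) {
        System.out.println(repoNameFor("org.apache.sling.api"));
    }
}
```

Going the other way (repository to artifact) is just a build of the
checked-out code, so this one-directional mapping is all the tooling
needs.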

> 
> Still, what are the advantages besides a (false?) sense of
> consistency?

I maintain that consistency is important when dealing with a large
number of modules. Consistency is important for tooling, as to be
honest we don't really have someone working on our tooling part - it's
a couple of us in our spare time. Consistency is important for
developers and contributors, as we should get minimal friction when
looking at a Sling module once we've looked at another one. 

Yes, there can be some differences but if we start having special cases
the barrier to contribution goes up. And one of the big selling points
of Git and GitHub is lowering the barrier to contribution.

------

In addition to what we've discussed so far, I want to point out that
it's always possible to revert this if in the future we decide it was a
bad idea to have repos per project. We merge 3 repositories into 1,
drop the old 3 and start working on the new one. 

Given how long we've taken to get here and all the moving pieces that
are in play, I suggest we move ahead with the migration now, since we
have a backup plan.

Thanks,

Robert
