First off, and this is addressed at drive-by readers, most everyone else
knows me well enough to know this anyway. I may be the PMC chair, but
99.99% of the things I say are not said as the PMC chair; instead they are
said as a committer to the project who is interested in its current and
future health. I do not have any extra special influence. (If I *were*
stating something as PMC chair it would be sent from my apache account and
I would put on the hat with a phrase such as "as PMC chair"... and it would
probably only be to resolve a deadlock in a technical choice that had
stalemated both the committers and the PMC and was threatening the
reputation of the ASF... i.e. the board would likely be watching closely...
)

On 25 November 2013 07:46, Kristian Rosenvold
<kristian.rosenv...@gmail.com> wrote:

> IMO publishing to central/archiva would involve publishing the "richest"
> format available. Based on user-agent identification (or the lack of a given
> request param indicating an old-style client) the repository should be able to
> down-transform a v5 pom to a v4 pom "on the fly" ?


How would that handle GPG signatures of the pom file? If you allow for
on-the-fly transformation then you lose the GPG signature immediately. The
dependency information of a pom is critical to maintaining the integrity of
the actual file. For example, if you can modify the pom file you can replace
one dependency with another that you control, giving you a hook with
which to inject malicious code.

I see that as a credible risk. For sure we have been less than awesome at
providing users with the tools to verify the GPG signatures of resolved
artifacts and pom files... but that does not mean we should mandate that
the repository adopt technologies that render such tooling impossible to
achieve.
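To make the risk concrete, here is a minimal sketch of why any on-the-fly transformation defeats signature verification. It uses a SHA-256 digest as a stand-in for a real GPG signature (GPG signs with a private key, but the tamper-detection property being illustrated is the same):

```python
import hashlib

# Stand-in for a GPG signature: a value bound to the exact bytes signed.
def sign(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, signature: str) -> bool:
    return hashlib.sha256(data).hexdigest() == signature

original_pom = b"<project><dependency>org.example:safe:1.0</dependency></project>"
signature = sign(original_pom)

# Any transformation of the bytes -- a malicious dependency swap, or even a
# well-intentioned v5 -> v4 down-map -- produces a file the original
# signature no longer covers.
transformed_pom = original_pom.replace(b"safe", b"evil")

assert verify(original_pom, signature)
assert not verify(transformed_pom, signature)
```

The artifact names and XML here are invented for illustration; the point is only that the signature covers bytes, not meaning, so the repository must serve the exact bytes that were signed.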


> We're not going to be
> losing semantic
> backward compatibility on any of the changes I've seen suggested yet ?
>

I think every change has a mapping back to modelVersion 4.0.0... but not
every change will have a complete mapping back to modelVersion 4.0.0... or
to put it another way, we can transform 5.0.0 to 4.0.0... and similarly, as
the 5.0.0 model would be an extension of 4.0.0, we can extend it up... but
5.0.0 -> 4.0.0 -> 5.0.0 would lose information on the down transform...
5.0.0 is a double and 4.0.0 is a float (3.0.0 is an unsigned long ;-) )
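The lossy round-trip above can be sketched like this. The field names (including `provides`) are hypothetical stand-ins for v5-only information, not a real schema:

```python
# Hypothetical models: v5 carries fields that v4 has no slot for.
def down_transform(v5_pom: dict) -> dict:
    """5.0.0 -> 4.0.0: drop anything the old schema cannot express."""
    v4_fields = {"groupId", "artifactId", "version", "dependencies"}
    return {k: v for k, v in v5_pom.items() if k in v4_fields}

def up_transform(v4_pom: dict) -> dict:
    """4.0.0 -> 5.0.0: every v4 pom is a valid (if sparse) v5 pom."""
    return dict(v4_pom)

v5 = {"groupId": "org.machu.foo", "artifactId": "foobar", "version": "1.0",
      "dependencies": ["junit:junit:4.11"],
      "provides": ["javax.servlet-api"]}   # v5-only information

round_tripped = up_transform(down_transform(v5))
assert round_tripped != v5                                   # 'provides' is gone
assert down_transform(round_tripped) == down_transform(v5)   # but the v4 view agrees
```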


>
> Also, did I miss the bit where someone explained why the whole "how to
> build" section cannot be stripped away upon publication ? I don't
> understand why that means we need multiple files.
>

Let's say, for the sake of argument, that we decide to go with the build vs
dependency-consumer split.

When you check out your source tree you will see something like

pom.xml (or pom.json or whatever format we decide for *building*)

So what do we deploy to the repo... well for org.machu.foo:foobar:1.0

* We have to publish a file that is parsable by 4.0.0 readers... otherwise
we limit consumers of our artifact to those using a client that understands
the new format... that would be bad... so foobar-1.0.pom gets published...
it's a modelVersion 4.0.0 pom. *Because* we are publishing this for
*consumers of the artifact* it does not need any build information. We can
strip all that cruft out and just provide the <dependencies>. Further we
can resolve *all* versions to pinned explicit versions and strip out scopes
that do not make sense for consumers, e.g. `test`. We can even add
exclusions based on the complete tree (which would allow removing
`provided` scope dependencies as well as being necessary for the `provides`
extra semantic information I want to see added). None of these changes will
break existing clients. It does mean that a modelVersion 5.0.0 pom will not
be able to generate a pom for 4.0.0 clients that contains some of the
bugs/features that some people seem to rely on, e.g. ${} expansion in
<dependencies>... but we don't need to maintain such guarantees when we
have a new schema.
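A sketch of that generation step, operating on an in-memory pom. The dict layout is invented for illustration; the real input would be the v5 build pom and the output a serialized modelVersion 4.0.0 pom:

```python
# Hypothetical build-time pom; field names are illustrative only.
build_pom = {
    "modelVersion": "5.0.0",
    "coordinates": "org.machu.foo:foobar:1.0",
    "build": {"plugins": ["..."]},   # build-only cruft, stripped on publish
    "dependencies": [
        {"artifact": "org.example:lib:[1.0,2.0)", "scope": "compile"},
        {"artifact": "junit:junit:4.11", "scope": "test"},
    ],
}

def consumer_pom(pom: dict, resolved: dict) -> dict:
    """Generate the modelVersion 4.0.0 pom deployed for consumers:
    no build section, versions pinned, consumer-irrelevant scopes dropped."""
    deps = [
        {"artifact": resolved[d["artifact"]], "scope": d["scope"]}
        for d in pom["dependencies"]
        if d["scope"] not in ("test", "provided")   # meaningless to consumers
    ]
    return {"modelVersion": "4.0.0",
            "coordinates": pom["coordinates"],
            "dependencies": deps}

# Ranges resolve to the pinned version the build actually used.
pinned = consumer_pom(build_pom,
                      {"org.example:lib:[1.0,2.0)": "org.example:lib:1.7"})
assert "build" not in pinned
assert pinned["dependencies"] == [
    {"artifact": "org.example:lib:1.7", "scope": "compile"}]
```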

* We want to be able to expose the new dependency model information to
consumers that can understand such information... so let's publish that
information as a (briefly searches for relatively unused file extensions...
Dependency ModeL... ok) .dml file. This file will contain the dependencies,
extra semantic information such as provides, etc. Again it would be fully
resolved and not contain a reference to a parent .dml file. In other words,
once you get that file you have everything needed to parse that file (well
except perhaps for a transformation mapping to down-map the file into a
format you *can* parse... my XSLT idea)
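As a rough illustration of what such a fully resolved file might look like (the `.dml` name is from the proposal above; the element names here are invented):

```python
import xml.etree.ElementTree as ET

# Build a minimal hypothetical .dml document in memory.
dml = ET.Element("dml")
ET.SubElement(dml, "modelVersion").text = "1.0.0"   # cheap for parsers to read
dep = ET.SubElement(dml, "dependency")
dep.set("artifact", "org.example:lib:1.7")          # pinned, no ranges
ET.SubElement(dml, "provides").text = "javax.servlet-api:3.0"

# Fully resolved: no <parent> element anywhere -- the file stands alone.
assert dml.find("parent") is None
assert list(dml)[0].tag == "modelVersion"
```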

* Finally we have the artifact itself, foobar-1.0.jar, and all the gpg
signatures and the .md5 and .sha1 hashes... (we could argue only the
gpg signatures are needed... but older clients rely on the hashes, so we
cannot really break them... though a good repository manager could
certainly generate them if the gpg signatures verify)

So the complete list in this scheme is:

foobar-1.0.pom
foobar-1.0.pom.gpg
foobar-1.0.pom.md5
foobar-1.0.pom.sha1
foobar-1.0.pom.gpg.sha1
foobar-1.0.pom.gpg.md5
foobar-1.0.dml
foobar-1.0.dml.gpg
foobar-1.0.dml.md5 (could perhaps omit as only new clients will read this)
foobar-1.0.dml.sha1 (could perhaps omit as only new clients will read this)
foobar-1.0.dml.gpg.sha1 (could perhaps omit as only new clients will read
this)
foobar-1.0.dml.gpg.md5 (could perhaps omit as only new clients will read
this)
foobar-1.0.jar
foobar-1.0.jar.gpg
foobar-1.0.jar.md5
foobar-1.0.jar.sha1
foobar-1.0.jar.gpg.sha1
foobar-1.0.jar.gpg.md5

So that is all fine and dandy...

Now we refactor the build, introducing a common parent pom,
foobar-parent... what do we need to publish for that?

* Well my first "get out of jail" card is to mandate that when building,
you cannot use a parent pom that has a *newer* modelVersion than the child
pom. Thus we do not have to worry about people using Maven 3.2 and trying
to use foobar-parent:1.0 as their parent pom. We set the <prerequisites> in
the deployed modelVersion 4.0.0 pom to the required version of Maven and we
inject an enforcer rule bound to the `validate` phase that immediately
fails the build with a message stating that you cannot use it as a parent.

  Thus even if you use Maven 4.0.0 to build a modelVersion 4.0.0 pom, you
still will not be able to have a modelVersion 5.0.0 parent. I think this is
reasonable, as we cannot expect to fully down-map the dependency features
let alone the build features.

  So we are deploying a *generated* modelVersion 4.0.0 pom as
foobar-parent-1.0.pom which is stripped to just effective dependencies (etc
as for the jar) and has a <build> section that causes anyone who tries to
use it as a parent pom from a pre-maven 4.0.0 format pom to get an
immediate build failure.
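The "no newer parent" mandate boils down to a simple version comparison at build time. A sketch, with the error message invented for illustration:

```python
# Fail fast when a child pom inherits from a parent with a newer modelVersion.
def check_parent(child_model: str, parent_model: str) -> None:
    child = tuple(int(p) for p in child_model.split("."))
    parent = tuple(int(p) for p in parent_model.split("."))
    if parent > child:
        raise ValueError(
            f"parent modelVersion {parent_model} is newer than child "
            f"{child_model}: this pom cannot be used as a parent here")

check_parent("5.0.0", "4.0.0")   # fine: older parent, newer child
check_parent("5.0.0", "5.0.0")   # fine: same modelVersion
try:
    check_parent("4.0.0", "5.0.0")   # the case the enforcer rule blocks
    raise AssertionError("should have failed")
except ValueError:
    pass
```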

* There are valid cases where a parent pom can include a set of
dependencies that are common to all child projects. It may not be a style
that I like, but just as I am not going to give out if somebody writes
their *project* with the idiotic idea of using TABs to indent (I'll moan
if I have to make a contribution to their project though), I do not think we
should prevent such a use case. Additionally, and perhaps more importantly,
there can be side artifacts for a pom packaging. Thus we really should be
publishing a .dml file for the parent. Most likely it will be empty (we
don't need <dependencyManagement> because .dml files *never* include a
parent reference) but the file is needed for any side-artifacts.

* What about people using this project *as a parent*... we need to deploy
something for them... we can assume they will be able to understand our
modelVersion and format (because we have used that get out of jail card
already to prevent modelVersion 4.0.0 children), so let's just deploy
the pom with a classifier of `build`:

foobar-parent-1.0.pom
foobar-parent-1.0.pom.gpg
foobar-parent-1.0.pom.md5
foobar-parent-1.0.pom.sha1
foobar-parent-1.0.pom.gpg.sha1
foobar-parent-1.0.pom.gpg.md5
foobar-parent-1.0-build.pom
foobar-parent-1.0-build.pom.gpg
foobar-parent-1.0-build.pom.md5 (could perhaps omit as only new clients
will read this)
foobar-parent-1.0-build.pom.sha1 (could perhaps omit as only new clients
will read this)
foobar-parent-1.0-build.pom.gpg.sha1 (could perhaps omit as only new
clients will read this)
foobar-parent-1.0-build.pom.gpg.md5 (could perhaps omit as only new clients
will read this)
foobar-parent-1.0.dml
foobar-parent-1.0.dml.gpg
foobar-parent-1.0.dml.md5 (could perhaps omit as only new clients will read
this)
foobar-parent-1.0.dml.sha1 (could perhaps omit as only new clients will
read this)
foobar-parent-1.0.dml.gpg.sha1 (could perhaps omit as only new clients will
read this)
foobar-parent-1.0.dml.gpg.md5 (could perhaps omit as only new clients will
read this)
foobar-parent-1.0-src.tar.gz (illustrating the most common side-artifact
for pom projects)
foobar-parent-1.0-src.tar.gz.gpg
foobar-parent-1.0-src.tar.gz.md5
foobar-parent-1.0-src.tar.gz.sha1
foobar-parent-1.0-src.tar.gz.gpg.sha1
foobar-parent-1.0-src.tar.gz.gpg.md5

That is my view of *one way* to get to modelVersion 5.0.0. I think that
*technically* the above could work. There are issues:

* Newer clients will go looking for the .dml file and then fall back to the
.pom if the .dml is missing... that makes 5 requests (.dml, .pom, .pom.gpg,
.jar, .jar.gpg - or replace .gpg with whatever hash you want) to get a .jar
file rather than 4, in other words a 25% increase in requests for older
artifacts (or 50% if you skip the integrity checks)... we could do a bulk
generation of the .dml files... but then we would have to generate gpg
signatures for those files, which would break the trust that gpg is
supposed to provide.

  Given that older clients currently go hunting for two hashes I think we
can ignore this issue, e.g. it's actually better than .pom, .pom.md5,
.pom.sha1, .jar, .jar.md5, .jar.sha1

  This would therefore use the .gpg file as a download integrity check,
with an optional additional check, which users can choose to turn on, that
the key used to sign is trusted.

* I am not sure how down-model versioning would work in reality. So the
idea here is that we say that the .dml file is a machine-generated format.
It makes sense, to me, that this would be XML (because XSLT is cross
platform-ish). We would mandate that the first element be the
modelVersion... it could be via a namespace or an element... that does not
matter too much for this.

  A parser thus reads the modelVersion easily. If it is a known
modelVersion... fine, proceed with the parse. If it is a newer modelVersion
then you go download org.apache.maven:model-mapping:${modelVersion}:xsl and
run the .dml through that transformation... lather rinse repeat until you
have a modelVersion that you understand...
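The lather-rinse-repeat loop might look like this sketch, where an in-memory mapping table stands in for the published org.apache.maven:model-mapping:${modelVersion}:xsl artifacts (the version numbers are invented):

```python
# Versions this (hypothetical) client understands natively.
KNOWN_VERSIONS = {"1.0.0", "1.1.0"}

# version -> function mapping it one step down; each entry stands in for a
# downloaded XSLT transformation published by the project.
TRANSFORMS = {
    "2.0.0": lambda doc: {**doc, "modelVersion": "1.1.0"},
    "1.1.0": lambda doc: {**doc, "modelVersion": "1.0.0"},
}

def parse(doc: dict) -> dict:
    """Down-map a .dml document until its modelVersion is one we know."""
    while doc["modelVersion"] not in KNOWN_VERSIONS:
        version = doc["modelVersion"]
        if version not in TRANSFORMS:
            raise ValueError(f"no mapping published for {version}")
        doc = TRANSFORMS[version](doc)   # lather, rinse, repeat
    return doc

assert parse({"modelVersion": "2.0.0"})["modelVersion"] == "1.1.0"
assert parse({"modelVersion": "1.0.0"})["modelVersion"] == "1.0.0"
```

Note the loop stops at the *first* version the client understands, so a client never applies more transformations than it needs.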

  We would need to EITHER be very careful when publishing the XSLT files OR
relax the rules on re-downloading non-SNAPSHOTs for
org.apache.maven:model-mapping only (the latter could produce irreproducible
builds though).

  In any case I think that is how we can allow for future evolution of the
.dml modelVersion (NOTE: this need not be the pom modelVersion...)... but
where we have the greater need for schema change is on the build side not
on the dependency list side... so I think it should not be too much of a
concern... we just have to be very careful with .dml schema changes.

What does this get us?

* It lets us change the build schema

* It lets us change the build format... the pom need not be XML any more

In short, it frees us up to change.

Is this the only way? Nope... it is the best way I can think of... I hope
that somebody has a better suggestion, and I fear that this is the best...
but there certainly are a lot worse ways of evolving our schema.

-Stephen




>
> I'm exposed to "the competition" at @dayjob these days, and I must say I
> think reducing verbosity and duplication is /the/ most important feature of
> a v5 pom for me.
>
> Kristian
>
>
