[ 
https://issues.apache.org/jira/browse/MNG-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Bruun-Hansen updated MNG-8184:
-----------------------------------
    Description: 
h2. The problem

At the moment, adding an additional repository to a project's POM has 
implications few Maven users realize.

Maven places the new repository _ahead of_, say, Maven Central in the lookup 
order.

This means:
 * The new repository can now effectively "impersonate" anything in Maven 
Central. This is obviously a huge security risk.

 * Builds now take longer because Maven has to check the newly added 
repository too, and it does so for every one of your project's dependencies. 
You had better hope that the added repository responds quickly.

In many, many projects an additional repository must be added, either because 
the organization has an internal Maven Repository Server or because the 
project needs artifacts from some third-party Maven Repository Server on the 
internet, like {{maven.oracle.com}} or whatever.
h2. Proposed solution

What I propose is somewhat inspired by a mechanism in NuGet known as [Package 
Source 
Mapping|https://learn.microsoft.com/en-us/nuget/consume-packages/package-source-mapping].

Specifically, I propose to add a groupId matching mechanism to the 
{{<repository>}} element. Like this:
{code:xml}
<repository>
    <name>Dubious Third Party</name>
    <id>roque</id>
    <url>https://maven.wild-rover-roque.com</url>
    <groupIdMatchers>
       <groupIdMatcher>com.wild-rover-roque.*</groupIdMatcher>
    </groupIdMatchers>
</repository>
{code}
 
As the name suggests, the repository will only be considered in the artifact 
discovery process if the {{groupId}} matches any of the configured filters. 
The absence of any {{<groupIdMatcher>}} would mean the repository is 
considered for any groupId, as today.
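
To make the intended semantics concrete, here is a minimal Java sketch of how the candidate repositories for one {{groupId}} could be selected. It is an illustration under assumptions, not Maven code: the class {{PinnedRepository}} and its methods are hypothetical names, and the pattern syntax assumed is a literal groupId or a prefix followed by a single trailing '*'.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model of a <repository> carrying the proposed <groupIdMatchers>. */
final class PinnedRepository {
    final String id;
    final List<String> groupIdMatchers; // empty list = no matchers configured

    PinnedRepository(String id, List<String> groupIdMatchers) {
        this.id = id;
        this.groupIdMatchers = groupIdMatchers;
    }

    /** True if this repository should be consulted for the given groupId. */
    boolean accepts(String groupId) {
        if (groupIdMatchers.isEmpty()) {
            return true; // no matchers: the repository behaves as it does today
        }
        return groupIdMatchers.stream().anyMatch(p -> matches(p, groupId));
    }

    /** Assumed syntax: a literal groupId, or a prefix followed by one trailing '*'. */
    static boolean matches(String pattern, String groupId) {
        if (pattern.endsWith("*")) {
            return groupId.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(groupId);
    }

    /** Keeps declaration order and drops repositories whose matchers exclude the groupId. */
    static List<String> candidateIds(List<PinnedRepository> repos, String groupId) {
        return repos.stream()
                .filter(r -> r.accepts(groupId))
                .map(r -> r.id)
                .collect(Collectors.toList());
    }
}
```

With {{roque}} declared first (matcher {{com.wild-rover-roque.*}}) and {{central}} declared without matchers, a lookup for {{org.apache.commons}} yields only {{central}}, so the third-party repository is never asked about artifacts it has no business serving.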

The proposal is a simple yet effective way to massively strengthen the supply 
chain security of a Maven build. Indeed, in the future the recommendation 
should be to _always_ add {{<groupIdMatcher>}} when you add a 
{{<repository>}}. It is difficult for me to come up with a scenario where you 
wouldn't be able to find well-fitting filter values. Even for your internal 
corporate Maven Repository Server you can most likely say without hesitation 
what the filter value should be. It could look something like this:
{code:xml}
<repository>
    <!-- The organization's internal Maven Repo Server -->
    <name>Internal Nexus</name>
    <id>nexus</id>
    <url>https://maven.carlsberg.internal/repository/releases</url>
    <groupIdMatchers>
       <groupIdMatcher>com.carlsberg.*</groupIdMatcher>
       <groupIdMatcher>dk.jacobsen.*</groupIdMatcher>
    </groupIdMatchers>
</repository>
{code}
Imagine: IDEs could check for this and flag a {{<repository>}} that does not 
specify {{<groupIdMatchers>}}. If you want the IDE to stop obsessing, you can 
add:
{code:xml}
    <groupIdMatchers>
       <groupIdMatcher>*</groupIdMatcher>
    </groupIdMatchers>
{code}
... which gives the same effect as not specifying any {{<groupIdMatchers>}} but 
which means you've actively given it consideration.

As an added bonus, the proposal will also make builds go faster.

Finally, the proposal is backwards compatible, unlike other suggestions that 
have been made over time, such as changing the [default lookup 
order|https://maven.apache.org/guides/mini/guide-multiple-repositories.html#repository-order].
 Prior proposals have suggested new features/switches that would prioritize 
Maven Central. This proposal does the opposite: it effectively de-prioritizes 
other repositories, making Maven Central the one that is likely to be queried 
first.
h2. The details

The proposal applies to both {{<repository>}} and {{<pluginRepository>}} and to 
the POM as well as the {{settings.xml}}, in short anywhere repositories for 
artifact discovery are configured.

The actual syntax of the filter value would have to be worked out. I propose 
something very simple: not a regexp, but a watered-down glob where the only 
supported wildcard is '*', and only as a suffix. But that's because I'm 
thinking it needs to be quick to evaluate. Perhaps not a concern?
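
As a rough sketch of how cheap such a watered-down glob would be to evaluate, the Java below parses a pattern once and then matches with a single prefix or equality comparison. The class name {{SuffixGlob}} and its methods are hypothetical and not part of Maven. Rejecting a '*' anywhere but the last position, and treating {{com.carlsberg.*}} as not matching the bare groupId {{com.carlsberg}}, are assumptions that the final syntax would have to settle.

```java
/** Sketch of the restricted glob: '*' is the only wildcard, allowed only as the last character. */
final class SuffixGlob {
    private final String prefix;    // the text before the trailing '*', or the whole literal
    private final boolean wildcard; // true if the pattern ended with '*'

    private SuffixGlob(String prefix, boolean wildcard) {
        this.prefix = prefix;
        this.wildcard = wildcard;
    }

    /** Parses a pattern, rejecting any '*' that is not the final character (assumed rule). */
    static SuffixGlob parse(String pattern) {
        if (pattern == null || pattern.isEmpty()) {
            throw new IllegalArgumentException("empty pattern");
        }
        int star = pattern.indexOf('*');
        if (star == -1) {
            return new SuffixGlob(pattern, false); // plain literal groupId
        }
        if (star != pattern.length() - 1) {
            throw new IllegalArgumentException(
                    "'*' is only allowed as the last character: " + pattern);
        }
        return new SuffixGlob(pattern.substring(0, star), true);
    }

    /** One prefix comparison for wildcard patterns, one equality check for literals. */
    boolean matches(String groupId) {
        return wildcard ? groupId.startsWith(prefix) : groupId.equals(prefix);
    }
}
```

Under these assumptions a lone {{*}} parses to an empty prefix and therefore matches every groupId, which lines up with the "I have considered it" escape hatch described earlier.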

 
h2. Existing workarounds

A known workaround is to re-state Maven Central in your POM _before_ your 
third-party repos, like this:
{code:xml}
<repository>
     <id>central</id>
     <name>Central Repository</name>
     <url>https://repo.maven.apache.org/maven2</url>
     ...
</repository>
<repository>
    <name>Dubious Third Party</name>
    <id>roque</id>
    <url>https://maven.wild-rover-roque.com</url>
    <groupIdMatchers>
       <groupIdMatcher>com.wild-rover-roque.*</groupIdMatcher>
    </groupIdMatchers>
    ...
</repository>
{code}
 
This changes the lookup order, so that lookup is performed in {{central}} 
before {{roque}}.

Obviously, this is less than ideal. Developers forget that (IMO) they really 
should always be doing this, and they tend to get the exact definition of 
Maven Central wrong.


> GroupId pinning (supply chain security)
> ---------------------------------------
>
>                 Key: MNG-8184
>                 URL: https://issues.apache.org/jira/browse/MNG-8184
>             Project: Maven
>          Issue Type: New Feature
>          Components: Artifacts and Repositories
>            Reporter: Lars Bruun-Hansen
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
