A few more use cases to worry about.

Use Case 1:
Request on "/repository/" url for maven-metadata.xml.

Problem: Assume Default Layout for all maven-metadata.xml requests?

Use Case 2:
Request on "/repository/" url using maven 1 legacy style for content that doesn't exist yet and needs to arrive via proxy connector from a remote repository.

Problem: Can't parse the pom (we don't have it yet) to get the ArtifactReference needed to make the request to the remote repository.

Use Case 3:
Request on "/repository/" url using maven 1 legacy style, but for content that is stored in default layout.

Problem: Catch-22: We need to parse the path into an Artifact Reference in order to load the pom, but we need to load the pom in order to know the Artifact Reference.

- Joakim

If a request arrives into the "/repository/" url tha

Joakim Erdfelt wrote:

This is a long email, read it all before commenting, and you'll likely
see a response to your earlier questions. :-)

I'm currently working on MRM-432 and MRM-519, and I'm in the middle of an
important change to how Archiva handles Layout detection, interaction,
and parsing.

:Background:

Layouts in Archiva have 2 main purposes.

 1. to convert a path to an artifact reference.
 2. to convert an artifact reference to a path.

Layouts are used by the following.

 1. The "/repository/${repoid}/" urls use layouts to determine the
    Artifact Reference that the client is requesting.
    The "/repository/" url is layout neutral, and can have maven 1
    clients ask for content in legacy format, or maven 2 clients ask
    for content in default layout.
 2. Proxy requests out to remote repositories utilize layouts to take
    an internal Artifact Reference, convert it to a path appropriate
    to the remote layout configuration and obtain the content that is
    desired.
 3. Simple Consumers utilize layouts to obtain File references, and
    Artifact References to the repository content for purposes of
    operating on the content in a way that they desire.
 4. Complex consumers (such as metadata updater, and snapshots purge)
    utilize layouts to obtain lists of versions and artifacts.

What Works.

 * Converting an Artifact Reference to a path.
 * Discovering Versions in a default layout.
   (needed by metadata update / snapshot purge)
 * Converting a default layout path to an Artifact Reference correctly.

What Doesn't Work.

 * Detecting the layout in use 100% of the time.
 * Converting a legacy layout path to an Artifact Reference 100% of
   the time.
 * Discovering versions in a legacy layout.
   (do we need metadata update / snapshot purge here?)
 * Reporting problems correctly.

:The Problem:

We cannot parse useful information in a consistent way for all
provided paths.
We need to glean the following information from the path.

 * Layout Type (default / legacy)
 * Group ID
 * Artifact ID
 * Version (Deployed version & Base version)
 * Classifier (Not applicable in legacy layout)
 * Type (Not the same as Extension)

Example Paths: (included in this email for discussion, actual list
               from test cases)

groupId/jars/-1.0.jar
org.apache.maven.test/jars/artifactId-1.0.war
ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
javax/jars/comm-3.0-u1.jar
javax.persistence/jars/ejb-3.0-public_review.jar
maven/jars/maven-test-plugin-1.8.2.jar
commons-lang/jars/commons-lang-2.1.jar
org.apache.derby/jars/derby-10.2.2.0.jar
com.foo/ejbs/foo-client-1.0.jar
com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar
com.foo.lib/java-sources/foo-lib-2.1-alpha-1-sources.jar
com.foo/jars/foo-tool-1.0.jar
org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
directory-clients/poms/ldap-clients-0.9.1-SNAPSHOT.pom
org.apache.archiva.test/jars/redonkulous-3.1-beta-1-20050831.101112-42.jar
invalid/invalid/1.0-20050611.123456-1/invalid-1.0-20050611.123456-1.jar
ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar
javax/comm/3.0-u1/comm-3.0-u1.jar
javax/persistence/ejb/3.0-public_review/ejb-3.0-public_review.jar
maven/maven-test-plugin/1.8.2/maven-test-plugin-1.8.2.pom
test/maven-arch/test-arch/2.0.3-SNAPSHOT/test-arch-2.0.3-SNAPSHOT.pom
com/company/department/com.company.department/0.2/com.company.department-0.2.pom
com/company/department/com.company.department.project/0.3/com.company.department.project-0.3.pom
com/foo/foo-tool/1.0/foo-tool-1.0.jar
commons-lang/commons-lang/2.1/commons-lang-2.1.jar
com/foo/foo-client/1.0/foo-client-1.0.jar
com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar
org/apache/archiva/test/redonkulous/3.1-beta-1-SNAPSHOT/redonkulous-3.1-beta-1-20050831.101112-42.jar

:Proposal:

The proposed logic for detecting layout.

 1. Split path by directory separators.
 2. If more than 3 parts ( dir/dir/dir/dir/filename ) == default layout.
 3. If fewer than 3 parts ( dir/filename ) == invalid path.
 4. If 3 parts ( dir/dir/filename ) then
    4.1. If part 2 name ends in "s" then test for potential legacy layout.
         4.1.1. Identify filename extension.
         4.1.2. Get potential list of artifact types for extension.
         4.1.3. If part 2 (minus the end "s") is in the list of
                artifact types == legacy layout
    4.2. Otherwise it can't be legacy, so hand off to default layout.

 The problem with this approach is maintaining the mapping of extensions
 to artifact types.  The artifact type is arbitrary, and can be expanded
 upon by the user to include types that we can't even imagine today.
 (See MRM-481: issue with extension .xml.zip)
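
 To make the detection steps above concrete, here is a minimal sketch.
 The class name, the enum, and especially the contents of the
 extension-to-type table are made up for illustration (they are not
 existing Archiva code), and the path is assumed to be repository
 relative with no leading slash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: path-based layout detection following steps 1 to 4.2. */
class LayoutDetector {
    enum Layout { DEFAULT, LEGACY, INVALID }

    // Hypothetical extension -> artifact types table. Keeping a table like
    // this complete is exactly the maintenance problem described above.
    private static final Map<String, List<String>> TYPES_BY_EXTENSION = new HashMap<>();
    static {
        TYPES_BY_EXTENSION.put("jar",
            Arrays.asList("jar", "ejb", "plugin", "javadoc.jar", "java-source"));
        TYPES_BY_EXTENSION.put("war", Arrays.asList("war"));
        TYPES_BY_EXTENSION.put("pom", Arrays.asList("pom"));
    }

    static Layout detect(String path) {
        String[] parts = path.split("/");            // step 1
        if (parts.length > 3) return Layout.DEFAULT; // step 2
        if (parts.length < 3) return Layout.INVALID; // step 3

        String typeDir = parts[1];                   // step 4
        String filename = parts[2];
        int dot = filename.lastIndexOf('.');
        if (typeDir.endsWith("s") && dot >= 0) {     // step 4.1
            String extension = filename.substring(dot + 1);
            String type = typeDir.substring(0, typeDir.length() - 1);
            List<String> types = TYPES_BY_EXTENSION.get(extension);
            if (types != null && types.contains(type)) {
                return Layout.LEGACY;                // step 4.1.3
            }
        }
        return Layout.DEFAULT;                       // step 4.2
    }

    public static void main(String[] args) {
        System.out.println(detect("commons-lang/jars/commons-lang-2.1.jar")); // prints LEGACY
        System.out.println(detect("javax/comm/3.0-u1/comm-3.0-u1.jar"));      // prints DEFAULT
        System.out.println(detect("dir/filename"));                           // prints INVALID
    }
}
```

 Note that a path such as org.apache.maven.test/jars/artifactId-1.0.war
 falls through to step 4.2 with this table, because "jar" is not a
 known type for the "war" extension.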

The proposed logic for parsing default layout paths.

 This one is easy.  Paths are in the following format ...

${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}-${classifier}.${type}

 Once we separate out the directories from the filename, we get the
 following order.

 dirs[dirs.length-1] = base version.
 dirs[dirs.length-2] = artifact Id.
 dirs[0] thru dirs[dirs.length-3] = groupId.

 That gives us the crucial pieces in the filename
 ${artifactId}-${version}, which makes detecting the classifier and
 type easy enough.
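
 A rough sketch of that parsing follows.  The class name and the
 String[] return shape are placeholders of mine, not part of the
 proposal, and the timestamp pattern for deployed snapshot versions
 (yyyyMMdd.HHmmss-buildNumber) is an assumption based on the example
 paths earlier in this email.  It returns the file extension, which is
 not necessarily the artifact type.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: parse a default layout path into its pieces. */
class DefaultLayoutPathParser {
    // Assumed shape of a deployed snapshot suffix, e.g. 20050831.101112-42
    private static final Pattern TIMESTAMP = Pattern.compile("^(\\d{8}\\.\\d{6}-\\d+)");

    /** Returns {groupId, artifactId, baseVersion, version, classifier, extension}. */
    static String[] parse(String path) {
        String[] parts = path.split("/");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Not a default layout path: " + path);
        }
        String filename = parts[parts.length - 1];
        String[] dirs = Arrays.copyOf(parts, parts.length - 1);
        String baseVersion = dirs[dirs.length - 1];
        String artifactId = dirs[dirs.length - 2];
        String groupId = String.join(".", Arrays.copyOf(dirs, dirs.length - 2));

        String prefix = artifactId + "-";
        if (!filename.startsWith(prefix)) {
            throw new IllegalArgumentException("Filename does not match artifactId: " + path);
        }
        String rest = filename.substring(prefix.length());

        String version;
        if (rest.startsWith(baseVersion)) {
            version = baseVersion;
        } else if (baseVersion.endsWith("-SNAPSHOT")) {
            // "3.1-beta-1-SNAPSHOT" -> "3.1-beta-1-", then expect a timestamp
            String stem = baseVersion.substring(0, baseVersion.length() - "SNAPSHOT".length());
            Matcher m = rest.startsWith(stem) ? TIMESTAMP.matcher(rest.substring(stem.length())) : null;
            if (m == null || !m.find()) {
                throw new IllegalArgumentException("Version does not match base version: " + path);
            }
            version = stem + m.group(1);
        } else {
            throw new IllegalArgumentException("Version does not match base version: " + path);
        }
        rest = rest.substring(version.length()); // now "[-classifier].extension"

        String classifier = "";
        if (rest.startsWith("-")) {
            int dot = rest.indexOf('.');
            if (dot < 0) throw new IllegalArgumentException("No extension: " + path);
            classifier = rest.substring(1, dot);
            rest = rest.substring(dot);
        }
        if (!rest.startsWith(".") || rest.length() < 2) {
            throw new IllegalArgumentException("No extension: " + path);
        }
        return new String[] { groupId, artifactId, baseVersion, version, classifier, rest.substring(1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            parse("com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar")));
        // prints [com.foo.lib, foo-lib, 2.1-alpha-1, 2.1-alpha-1, sources, jar]
    }
}
```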

The proposed logic for parsing legacy layout paths.

 Legacy layouts are tricky.  It is nearly impossible to detect, using
 the path alone, the correct artifactId or version.  So the process
 will need to read the pom file associated with the artifact Id in order
 to determine the correct Artifact Reference pieces.

 The problem with this approach is that we now need 2 pieces of
 information, the repository root (location or url) and the path.
 Plus we incur a hit / read of the pom file.

 So, if we use the pseudo-pattern ...
 [:groupId:]/[:type:]s/[:filename:].[:ext:]
 as a starting point, swap out the [:type:] and [:ext:] for "pom" and
 load the pom from the actual repository to determine the groupId,
 artifactId, and version, we can then have a valid Artifact Reference.

 The problem with relying on the pom is that it is now required for
 legacy layout "from path" logic.  This changes the assumption that poms
 are optional, and it changes the interface to the layout objects,
 which now need a repository as well.
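
 For illustration, the swap described by the pseudo-pattern could look
 roughly like this.  The class and method names are invented for this
 email, and the sketch takes everything before the last "." as the
 [:filename:] part, so a classifier-like suffix (for example
 "-javadoc") is not stripped and would need extra handling.

```java
import java.io.File;

/** Sketch only: derive the pom location for a legacy layout path. */
class LegacyPomLocator {
    /** groupId/[type]s/filename.ext -> groupId/poms/filename.pom */
    static String toPomPath(String legacyPath) {
        String[] parts = legacyPath.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not a legacy layout path: " + legacyPath);
        }
        String filename = parts[2];
        int dot = filename.lastIndexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("No extension in: " + legacyPath);
        }
        // Swap the [:type:]s directory for "poms" and the [:ext:] for "pom".
        return parts[0] + "/poms/" + filename.substring(0, dot) + ".pom";
    }

    /** This is the second piece of information the layout now needs: the repository root. */
    static File toPomFile(File repositoryRoot, String legacyPath) {
        return new File(repositoryRoot, toPomPath(legacyPath));
    }

    public static void main(String[] args) {
        System.out.println(toPomPath("org.apache.derby/jars/derby-10.2.2.0.jar"));
        // prints org.apache.derby/poms/derby-10.2.2.0.pom
    }
}
```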

The proposed changes to the codebase.

 * Eliminate RepositoryLayoutUtils, roll layout specific filename
   parsing routines into their respective layouts.
 * Eliminate direct usage of BidirectionalRepositoryLayout by
   consumers.
 * Create RepositoryContentRequest that takes the freeform requests
   arriving in from the "/repository/" urls and puts it through
   the logic as outlined above.
 * Rename BidirectionalRepositoryLayout interface to RepositoryContent
   to simplify name and represent new role of accessing repository
   content that requires a repository reference.
 * Create DefaultRepositoryContent and LegacyRepositoryContent
   implementations, that utilize techniques described above, and
   logic already present in DefaultBidirectionalRepositoryLayout and
   LegacyBidirectionalRepositoryLayout.
 * Create AnonymousProjectReader that takes a File object pointing to
   a pom, reads the <pomVersion> or <modelVersion> element, and loads
   the pom information as appropriate.
 * Create RepositoryContentFactory that returns a RepositoryContent
   implementation for the provided repository id.
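
 As a rough idea of the first step of the AnonymousProjectReader, the
 sketch below only decides which kind of pom it is looking at.  The
 class name and the "maven1" / "maven2" return values are placeholders
 of mine.  It uses the JDK's DOM parser and reads from an InputStream
 so it can be tried without a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Sketch only: tell a maven 1 pom from a maven 2 pom by its version element. */
class PomFlavorSniffer {
    /** Returns "maven1" for <pomVersion>, "maven2" for <modelVersion>, else "unknown". */
    static String sniff(InputStream pom) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(pom).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to read pom", e);
        }
        // Only look at direct children of the root element.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            if ("pomVersion".equals(n.getNodeName())) return "maven1";
            if ("modelVersion".equals(n.getNodeName())) return "maven2";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String xml = "<project><modelVersion>4.0.0</modelVersion></project>";
        System.out.println(sniff(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
        // prints maven2
    }
}
```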

Example of new RepositoryContent interface.

--(snip)--
package org.apache.maven.archiva.repository;

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*
*  http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied.  See the License for the
* specific language governing permissions and limitations
* under the License.
*/

import org.apache.maven.archiva.model.ArtifactReference;
import org.apache.maven.archiva.model.ProjectReference;
import org.apache.maven.archiva.model.VersionedReference;
import org.apache.maven.archiva.repository.layout.LayoutException;

import java.io.File;
import java.net.URL;
import java.util.List;

/**
* RepositoryContent interface for interacting with a managed repository
* in an abstract way, without the need for processing based on
* filesystem paths, or working with the database.
*
* @author <a href="mailto:[EMAIL PROTECTED]">Joakim Erdfelt</a>
* @version $Id$
*/
public interface RepositoryContent
{
   /**
    * Determines if the project referenced exists in the repository.
    *
    * @param reference the project reference to check for.
    * @return true if the project referenced exists.
    */
   public boolean hasContent( ProjectReference reference );

   /**
    * Determines if the version reference exists in the repository.
    *
    * @param reference the version reference to check for.
    * @return true if the version referenced exists.
    */
   public boolean hasContent( VersionedReference reference );

   /**
    * Determines if the artifact referenced exists in the repository.
    *
    * @param reference the artifact reference to check for.
    * @return true if the artifact referenced exists.
    */
   public boolean hasContent( ArtifactReference reference );

   /**
    * Given a repository relative path to a filename, return the
    * {@link ArtifactReference} object suitable for the path.
    *
    * @param path the path relative to the repository base dir for
    *        the artifact.
    * @return the {@link ArtifactReference} representing the path.
    *        (or null if path cannot be converted to a
    *        {@link ArtifactReference})
    * @throws LayoutException if there was a problem converting the
    *         path to an artifact.
    */
   public ArtifactReference toArtifactReference( String path )
       throws LayoutException;

   /**
    * Given an ArtifactReference, return the relative path to the
    * artifact.
    *
    * @param reference the artifact reference to use.
    * @return the relative path to the artifact.
    */
   public String toPath( ArtifactReference reference );

   /**
    * Given an ArtifactReference, return the file reference to the
    * artifact.
    *
    * @param reference the artifact reference to use.
    * @return the file reference to the artifact.
    */
   public File toFile( ArtifactReference reference );

   /**
    * Given an ArtifactReference, return the url to the artifact.
    *
    * @param reference the artifact reference to use.
    * @return the url to the artifact.
    */
   public URL toURL( ArtifactReference reference );


   /**
    * Gather up the list of related artifacts to the ArtifactReference
    * provided. This typically includes the pom files, and those things
    * with classifiers (such as doc, source code, test libs, etc...)
    *
    * NOTE: Some layouts (such as maven 1 "legacy"), and remote
    * repositories are not compatible with this query.
    *
    * @param reference the reference to work off of.
    * @return the list of ArtifactReferences for related artifacts.
    * @throws ContentNotFoundException if the initial artifact reference
    *         does not exist within the repository.
    * @throws NotSupportedException if the layout or repository does not
    *         support this query.
    */
   public List<ArtifactReference> getRelatedArtifacts(
       ArtifactReference reference )
       throws ContentNotFoundException, NotSupportedException;

   /**
    * Given a specific VersionedReference, return the list of available
    * versions for that versioned reference.
    *
    * NOTE: This is really only useful when working with SNAPSHOTs.
    *       Not compatible with remote repositories.
    *
    * @param reference the versioned reference to work off of.
    * @return the list of versions found.
    * @throws ContentNotFoundException if the versioned reference does
    *         not exist within the repository.
    * @throws NotSupportedException if the layout or repository does not
    *         support this query.
    */
   public List<String> getVersions( VersionedReference reference )
       throws ContentNotFoundException, NotSupportedException;

   /**
    * Given a specific ProjectReference, return the list of available
    * versions for that project reference.
    *
    * @param reference the project reference to work off of.
    * @return the list of versions found for that project reference.
    * @throws ContentNotFoundException if the project reference does not
    *         exist within the repository.
    * @throws NotSupportedException if the layout or repository does not
    *         support this query.
    */
   public List<String> getVersions( ProjectReference reference )
       throws ContentNotFoundException, NotSupportedException;
}
--(snip)--

I feel this is a better long term solution for the persistent layout
parsing issues we have within Archiva.  However, not all of the problems
have been solved.  I've outlined the ones that need help above in this
email, but I'm sure there are ones that have been overlooked.

Disclaimer: Yes, this is in the form of a proposal, but I'm already
working on this, and will continue down this path unless
someone here has a strong objection about this approach.



--
- Joakim Erdfelt
 [EMAIL PROTECTED]
 Open Source Software (OSS) Developer
