Database scanning is basically just querying the artifacts from the database. The list of queried artifacts is iterated, and the database consumers perform specific actions on each artifact.

Some of these actions are:
1. Adding the project models of the artifact to the database - during repo scanning, only the basic information (groupId, artifactId, version, etc.) of an artifact is added to the db. The contents of the artifact's project model (pom) are added to the database later, during db scanning. I think Joakim designed it like this to make the repo scanning faster.

2. Cleaning up the database - artifacts that were deleted from the repo are discovered during database scanning. As the queried artifacts are iterated, the cleanup consumers verify whether each artifact still exists in the repository. If not, that artifact is deleted from the database.
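
To make the two actions above a bit more concrete, here is a rough sketch in Java. It is not Archiva's actual code: the names (DbScanSketch, Database, DatabaseConsumer, scanDatabase, demo) are made up for illustration, and the "database" and "repository" are just in-memory sets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DbScanSketch {
    // Hypothetical artifact row: only the basic coordinates.
    record Artifact(String groupId, String artifactId, String version) {}

    // Hypothetical in-memory stand-in for the database.
    static class Database {
        final Set<Artifact> artifacts = new LinkedHashSet<>();   // rows added by repo scanning
        final Set<Artifact> projectModels = new HashSet<>();     // artifacts whose pom was loaded
    }

    // Hypothetical consumer interface: one callback per queried artifact.
    interface DatabaseConsumer {
        void processArtifact(Artifact artifact, Database db);
    }

    static void scanDatabase(Database db, List<DatabaseConsumer> consumers) {
        // "Query" first (copy the rows), so consumers may delete rows while we iterate.
        List<Artifact> queried = new ArrayList<>(db.artifacts);
        for (Artifact artifact : queried) {
            for (DatabaseConsumer consumer : consumers) {
                consumer.processArtifact(artifact, db);
            }
        }
    }

    static Database demo() {
        Artifact kept = new Artifact("org.example", "kept", "1.0");
        Artifact gone = new Artifact("org.example", "gone", "1.0");

        Database db = new Database();
        db.artifacts.add(kept);
        db.artifacts.add(gone);

        // Pretend "gone" was deleted from the repository after the last repo scan.
        Set<Artifact> repository = Set.of(kept);

        // Action 1: record the project model (pom) for the artifact.
        DatabaseConsumer addProjectModel = (a, d) -> d.projectModels.add(a);

        // Action 2: delete artifacts that no longer exist in the repository.
        DatabaseConsumer cleanup = (a, d) -> {
            if (!repository.contains(a)) {
                d.artifacts.remove(a);
                d.projectModels.remove(a);
            }
        };

        scanDatabase(db, List.of(addProjectModel, cleanup));
        return db;
    }

    public static void main(String[] args) {
        Database db = demo();
        System.out.println("artifacts left in db: " + db.artifacts.size());
    }
}
```

The point of the sketch is only the shape of the loop: the scan queries once, then hands each artifact to every consumer, and each consumer decides on its own what to do with it.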

Hope this helps :-)

-Deng

Wendy Smoak wrote:
On 10/16/07, Joakim Erdfelt <[EMAIL PROTECTED]> wrote:
ArchivaArtifactConsumer is an abstract-dealing-with-artifacts consumer.
RepositoryContentConsumer is for files.

A file that isn't an artifact can be *.xml, *.sha1, *.md5,
maven-metadata.xml, bad content, poorly named content, etc.

Would it be better to state the phase/scan instead?

RepositoryContentConsumer becomes -> RepositoryScanConsumer
ArchivaArtifactConsumer becomes -> DatabaseScanConsumer

All artifacts _are_ repository content, are they not?  And even after
the renaming... it can't be in the database unless it's in the
repository.

I understand scanning the filesystem to update the database.  But when
and why do you "scan" the database?

