DB scanning is basically just querying the artifacts from the
database. The list of queried artifacts is iterated, and the database
consumers perform specific actions on each iteration.
Some of these actions are:
1. Adding the project models of the artifacts to the database - during
repo scanning, only the basic information (groupId, artifactId, version,
etc.) of an artifact is added to the db. It is during the db scanning
that the contents of an artifact's project model (pom) are added to the
database. I think Joakim designed it like this to make the repo scanning
faster.
2. Cleaning up the database - artifacts deleted from the repo are
discovered during database scanning. As the queried artifacts are
iterated, the cleanup consumers verify whether each artifact still exists
in the repository. If not, that artifact is deleted from the database.
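To make the two steps above a bit more concrete, here is a rough Java
sketch of the loop. This is not Archiva's actual code: ArtifactRecord,
DatabaseConsumer, scan() and the in-memory list and set standing in for
the database and the repository are all names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DatabaseScanSketch {

    /** Hypothetical stand-in for one artifact row in the database. */
    static final class ArtifactRecord {
        final String groupId;
        final String artifactId;
        final String version;
        boolean projectModelStored; // stays false until the db scan handles the pom

        ArtifactRecord(String groupId, String artifactId, String version) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
        }

        String key() {
            return groupId + ":" + artifactId + ":" + version;
        }
    }

    /** Hypothetical consumer: called once per queried artifact. */
    interface DatabaseConsumer {
        void process(ArtifactRecord artifact);
    }

    /** The db scan: query the artifacts, then hand each one to every consumer. */
    static void scan(List<ArtifactRecord> database, List<DatabaseConsumer> consumers) {
        // Copying the list plays the role of the query result, so consumers
        // can delete rows from the database while we iterate.
        List<ArtifactRecord> queried = new ArrayList<>(database);
        for (ArtifactRecord artifact : queried) {
            for (DatabaseConsumer consumer : consumers) {
                consumer.process(artifact);
            }
        }
    }

    static List<ArtifactRecord> demo() {
        // The repo scan has already added basic info for two artifacts.
        List<ArtifactRecord> database = new ArrayList<>();
        database.add(new ArtifactRecord("org.example", "app", "1.0"));
        database.add(new ArtifactRecord("org.example", "old-lib", "0.9"));

        // Only one of them still exists in the (assumed) repository.
        Set<String> repository = new HashSet<>();
        repository.add("org.example:app:1.0");

        // Step 1: add the project model for artifacts that are still present.
        DatabaseConsumer addProjectModel = artifact -> {
            if (repository.contains(artifact.key())) {
                artifact.projectModelStored = true;
            }
        };
        // Step 2: delete database rows whose artifact is gone from the repo.
        DatabaseConsumer cleanup = artifact -> {
            if (!repository.contains(artifact.key())) {
                database.remove(artifact);
            }
        };

        List<DatabaseConsumer> consumers = new ArrayList<>();
        consumers.add(addProjectModel);
        consumers.add(cleanup);
        scan(database, consumers);
        return database;
    }

    public static void main(String[] args) {
        for (ArtifactRecord a : demo()) {
            System.out.println(a.key() + " projectModelStored=" + a.projectModelStored);
        }
    }
}
```

In this toy run only org.example:app:1.0 is left in the database
afterwards, with its project model marked as stored; old-lib is removed
because it is no longer in the repository.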
Hope this helps :-)
-Deng
Wendy Smoak wrote:
On 10/16/07, Joakim Erdfelt <[EMAIL PROTECTED]> wrote:
ArchivaArtifactConsumer is an abstract-dealing-with-artifacts consumer.
RepositoryContentConsumer is for files.
A file that isn't an artifact can be *.xml, *.sha1, *.md5,
maven-metadata.xml, bad content, poorly named content, etc.
Would it be better to state the phase/scan instead?
RepositoryContentConsumer becomes -> RepositoryScanConsumer
ArchivaArtifactConsumer becomes -> DatabaseScanConsumer
All artifacts _are_ repository content, are they not? And even after
the renaming... it can't be in the database unless it's in the
repository.
I understand scanning the filesystem to update the database. But when
and why do you "scan" the database?