This is an automated email from the ASF dual-hosted git repository.
sbp pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tooling-trusted-releases.git
The following commit(s) were added to refs/heads/main by this push:
new d846152 Add database documentation about core models and other
features
d846152 is described below
commit d846152e30b4499819857bd288891b3e5b76782a
Author: Sean B. Palmer <[email protected]>
AuthorDate: Fri Oct 10 17:10:31 2025 +0100
Add database documentation about core models and other features
---
atr/docs/database.html | 25 +++++++++++++++++++++++++
atr/docs/database.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
atr/models/sql.py | 1 +
3 files changed, 74 insertions(+)
diff --git a/atr/docs/database.html b/atr/docs/database.html
index e274d6b..656fa73 100644
--- a/atr/docs/database.html
+++ b/atr/docs/database.html
@@ -5,6 +5,31 @@
<p><strong>Sections</strong>:</p>
<ul>
<li><a href="#introduction">Introduction</a></li>
+<li><a href="#core-models">Core models</a></li>
+<li><a href="#other-features">Other features</a></li>
</ul>
<h2 id="introduction">Introduction</h2>
<p>ATR stores all of its data in a SQLite database. The database schema is
defined in <a href="/ref/atr/models/sql.py"><code>models.sql</code></a> using
<a href="https://sqlmodel.tiangolo.com/">SQLModel</a>, which uses <a
href="https://docs.pydantic.dev/latest/">Pydantic</a> for data validation and
<a href="https://www.sqlalchemy.org/">SQLAlchemy</a> for database operations.
This page explains the main features of the database schema to help you
understand how data is structured in ATR.</p>
+<h2 id="core-models">Core models</h2>
+<p>The three most important models in ATR are <a
href="/ref/atr/models/sql.py:Committee"><code>Committee</code></a>, <a
href="/ref/atr/models/sql.py:Project"><code>Project</code></a>, and <a
href="/ref/atr/models/sql.py:Release"><code>Release</code></a>.</p>
+<p>A <a href="/ref/atr/models/sql.py:Committee"><code>Committee</code></a>
represents a PMC or PPMC at the ASF. Each committee has a name, which is the
primary key, and a full name for display purposes. Committees can have child
committees, which is used for the relationship between the Incubator PMC and
individual podling PPMCs. Committees also have lists of committee members and
committers stored as JSON arrays.</p>
+<p>A <a href="/ref/atr/models/sql.py:Project"><code>Project</code></a> belongs
to a committee and can have multiple releases. Projects have a name as the
primary key, along with metadata such as a description and category and
programming language tags. Each project can optionally have a <a
href="/ref/atr/models/sql.py:ReleasePolicy"><code>ReleasePolicy</code></a> that
defines how releases should be handled, including e.g. vote templates and
GitHub workflow configuration.</p>
+<p>A <a href="/ref/atr/models/sql.py:Release"><code>Release</code></a> belongs
to a project and represents a specific version of software which is voted on by
a committee. The primary key is a name derived from the project name and
version. Releases have a phase that indicates their current state in the
release process, from draft composition to final publication. Each release can
have multiple <a
href="/ref/atr/models/sql.py:Revision"><code>Revision</code></a> instances
before final pub [...]
+<h2 id="other-features">Other features</h2>
+<p>The models themselves are the most important components, but to support
those models we need other components such as enumerations, column types,
automatically populated fields, computed properties, and constraints.</p>
+<h3 id="enumerations">Enumerations</h3>
+<p>ATR uses Python enumerations to ensure that certain fields only contain
valid values. The most important enumeration is <a
href="/ref/atr/models/sql.py:ReleasePhase"><code>ReleasePhase</code></a>, which
defines the four phases of a release: <code>RELEASE_CANDIDATE_DRAFT</code> for
composing, <code>RELEASE_CANDIDATE</code> for voting,
<code>RELEASE_PREVIEW</code> for finishing, and <code>RELEASE</code> for
completed releases.</p>
+<p>The <a href="/ref/atr/models/sql.py:TaskStatus"><code>TaskStatus</code></a>
enumeration defines the states a task can be in: <code>QUEUED</code>,
<code>ACTIVE</code>, <code>COMPLETED</code>, or <code>FAILED</code>. The <a
href="/ref/atr/models/sql.py:TaskType"><code>TaskType</code></a> enumeration
lists all the different types of background tasks that ATR can execute, from
signature checks to SBOM generation.</p>
+<p>The <a
href="/ref/atr/models/sql.py:DistributionPlatform"><code>DistributionPlatform</code></a>
enumeration is more complex, as each value contains not just a name but a <a
href="/ref/atr/models/sql.py:DistributionPlatformValue"><code>DistributionPlatformValue</code></a>
with template URLs and configuration for different package distribution
platforms like PyPI, npm, and Maven Central.</p>
+<h3 id="special-column-types">Special column types</h3>
+<p>SQLite does not support all the data types we need, so we use SQLAlchemy
type decorators to handle conversions. The <a
href="/ref/atr/models/sql.py:UTCDateTime"><code>UTCDateTime</code></a> type
ensures that all datetime values are stored in UTC and returned as
timezone-aware datetime objects. When Python code provides a datetime with
timezone information, the type decorator converts it to UTC before storing.
When reading from the database, it adds back the UTC timezone information.</p>
+<p>The <a
href="/ref/atr/models/sql.py:ResultsJSON"><code>ResultsJSON</code></a> type
handles storing task results. It automatically serializes Pydantic models to
JSON when writing to the database, and deserializes them back to the
appropriate result model when reading.</p>
+<h3 id="automatic-field-population">Automatic field population</h3>
+<p>Some fields are populated automatically using SQLAlchemy event listeners.
When a new <a href="/ref/atr/models/sql.py:Revision"><code>Revision</code></a>
is created, the <a
href="/ref/atr/models/sql.py:populate_revision_sequence_and_name"><code>populate_revision_sequence_and_name</code></a>
function runs before the database insert. This function queries for the
highest existing sequence number for the release, increments it, and sets both
the <code>seq</code> and <code>number</code> fi [...]
+<p>The <a
href="/ref/atr/models/sql.py:check_release_name"><code>check_release_name</code></a>
function runs before inserting a release. If the release name is empty, it
automatically generates it from the project name and version using the <a
href="/ref/atr/models/sql.py:release_name"><code>release_name</code></a> helper
function.</p>
+<h3 id="computed-properties">Computed properties</h3>
+<p>Some properties are computed dynamically rather than stored in the
database. The <code>Release.latest_revision_number</code> property is
implemented as a SQLAlchemy column property using a correlated subquery. This
means that when you access <code>release.latest_revision_number</code>,
SQLAlchemy automatically executes a query to find the highest revision number
for that release. The query is defined once in <a
href="/ref/atr/models/sql.py:RELEASE_LATEST_REVISION_NUMBER"><code>RELEASE [...]
+<p>Projects have many computed properties that provide access to release
policy settings with appropriate defaults. For example,
<code>Project.policy_start_vote_template</code> returns the custom vote
template if one is configured, or falls back to
<code>Project.policy_start_vote_default</code> if not. This pattern allows
projects to customize their release process while providing sensible
defaults.</p>
+<h3 id="constraints-and-validation">Constraints and validation</h3>
+<p>Database constraints ensure data integrity. The <a
href="/ref/atr/models/sql.py:Task"><code>Task</code></a> model includes a check
constraint that validates the status transitions. A task must start in
<code>QUEUED</code> state, can only transition to <code>ACTIVE</code> when
<code>started</code> and <code>pid</code> are set, and can only reach
<code>COMPLETED</code> or <code>FAILED</code> when the <code>completed</code>
timestamp is set. These constraints prevent invalid state transi [...]
+<p>Unique constraints ensure that certain combinations of fields are unique.
The <code>Release</code> model has a unique constraint on <code>(project_name,
version)</code> to prevent creating duplicate releases for the same project
version. The <code>Revision</code> model has two unique constraints: one on
<code>(release_name, seq)</code> and another on <code>(release_name,
number)</code>, ensuring that revision numbers are unique within a release.</p>
diff --git a/atr/docs/database.md b/atr/docs/database.md
index 4e49028..cd5db57 100644
--- a/atr/docs/database.md
+++ b/atr/docs/database.md
@@ -9,7 +9,55 @@
**Sections**:
* [Introduction](#introduction)
+* [Core models](#core-models)
+* [Other features](#other-features)
## Introduction
ATR stores all of its data in a SQLite database. The database schema is
defined in [`models.sql`](/ref/atr/models/sql.py) using
[SQLModel](https://sqlmodel.tiangolo.com/), which uses
[Pydantic](https://docs.pydantic.dev/latest/) for data validation and
[SQLAlchemy](https://www.sqlalchemy.org/) for database operations. This page
explains the main features of the database schema to help you understand how
data is structured in ATR.
+
+## Core models
+
+The three most important models in ATR are
[`Committee`](/ref/atr/models/sql.py:Committee),
[`Project`](/ref/atr/models/sql.py:Project), and
[`Release`](/ref/atr/models/sql.py:Release).
+
+A [`Committee`](/ref/atr/models/sql.py:Committee) represents a PMC or PPMC at
the ASF. Each committee has a name, which is the primary key, and a full name
for display purposes. Committees can have child committees, which is used for
the relationship between the Incubator PMC and individual podling PPMCs.
Committees also have lists of committee members and committers stored as JSON
arrays.
+
+A [`Project`](/ref/atr/models/sql.py:Project) belongs to a committee and can
have multiple releases. Projects have a name as the primary key, along with
metadata such as a description and category and programming language tags. Each
project can optionally have a
[`ReleasePolicy`](/ref/atr/models/sql.py:ReleasePolicy) that defines how
releases should be handled, including e.g. vote templates and GitHub workflow
configuration.
+
+A [`Release`](/ref/atr/models/sql.py:Release) belongs to a project and
represents a specific version of software which is voted on by a committee. The
primary key is a name derived from the project name and version. Releases have
a phase that indicates their current state in the release process, from draft
composition to final publication. Each release can have multiple
[`Revision`](/ref/atr/models/sql.py:Revision) instances before final
publication, representing iterations of the underl [...]
+
+## Other features
+
+The models themselves are the most important components, but to support those
models we need other components such as enumerations, column types,
automatically populated fields, computed properties, and constraints.
+
+### Enumerations
+
+ATR uses Python enumerations to ensure that certain fields only contain valid
values. The most important enumeration is
[`ReleasePhase`](/ref/atr/models/sql.py:ReleasePhase), which defines the four
phases of a release: `RELEASE_CANDIDATE_DRAFT` for composing,
`RELEASE_CANDIDATE` for voting, `RELEASE_PREVIEW` for finishing, and `RELEASE`
for completed releases.
+
+The [`TaskStatus`](/ref/atr/models/sql.py:TaskStatus) enumeration defines the
states a task can be in: `QUEUED`, `ACTIVE`, `COMPLETED`, or `FAILED`. The
[`TaskType`](/ref/atr/models/sql.py:TaskType) enumeration lists all the
different types of background tasks that ATR can execute, from signature checks
to SBOM generation.
+
+The [`DistributionPlatform`](/ref/atr/models/sql.py:DistributionPlatform)
enumeration is more complex, as each value contains not just a name but a
[`DistributionPlatformValue`](/ref/atr/models/sql.py:DistributionPlatformValue)
with template URLs and configuration for different package distribution
platforms like PyPI, npm, and Maven Central.
+
+### Special column types
+
+SQLite does not support all the data types we need, so we use SQLAlchemy type
decorators to handle conversions. The
[`UTCDateTime`](/ref/atr/models/sql.py:UTCDateTime) type ensures that all
datetime values are stored in UTC and returned as timezone-aware datetime
objects. When Python code provides a datetime with timezone information, the
type decorator converts it to UTC before storing. When reading from the
database, it adds back the UTC timezone information.
+
+The [`ResultsJSON`](/ref/atr/models/sql.py:ResultsJSON) type handles storing
task results. It automatically serializes Pydantic models to JSON when writing
to the database, and deserializes them back to the appropriate result model
when reading.
+
+### Automatic field population
+
+Some fields are populated automatically using SQLAlchemy event listeners. When
a new [`Revision`](/ref/atr/models/sql.py:Revision) is created, the
[`populate_revision_sequence_and_name`](/ref/atr/models/sql.py:populate_revision_sequence_and_name)
function runs before the database insert. This function queries for the
highest existing sequence number for the release, increments it, and sets both
the `seq` and `number` fields. It also constructs the revision name by
combining the release n [...]
+
+The [`check_release_name`](/ref/atr/models/sql.py:check_release_name) function
runs before inserting a release. If the release name is empty, it automatically
generates it from the project name and version using the
[`release_name`](/ref/atr/models/sql.py:release_name) helper function.
+
+### Computed properties
+
+Some properties are computed dynamically rather than stored in the database.
The `Release.latest_revision_number` property is implemented as a SQLAlchemy
column property using a correlated subquery. This means that when you access
`release.latest_revision_number`, SQLAlchemy automatically executes a query to
find the highest revision number for that release. The query is defined once in
[`RELEASE_LATEST_REVISION_NUMBER`](/ref/atr/models/sql.py:RELEASE_LATEST_REVISION_NUMBER)
and attached [...]
+
+Projects have many computed properties that provide access to release policy
settings with appropriate defaults. For example,
`Project.policy_start_vote_template` returns the custom vote template if one is
configured, or falls back to `Project.policy_start_vote_default` if not. This
pattern allows projects to customize their release process while providing
sensible defaults.
+
+### Constraints and validation
+
+Database constraints ensure data integrity. The
[`Task`](/ref/atr/models/sql.py:Task) model includes a check constraint that
validates the status transitions. A task must start in `QUEUED` state, can only
transition to `ACTIVE` when `started` and `pid` are set, and can only reach
`COMPLETED` or `FAILED` when the `completed` timestamp is set. These
constraints prevent invalid state transitions at the database level.
+
+Unique constraints ensure that certain combinations of fields are unique. The
`Release` model has a unique constraint on `(project_name, version)` to prevent
creating duplicate releases for the same project version. The `Revision` model
has two unique constraints: one on `(release_name, seq)` and another on
`(release_name, number)`, ensuring that revision numbers are unique within a
release.
diff --git a/atr/models/sql.py b/atr/models/sql.py
index 01e7581..d3f25c3 100644
--- a/atr/models/sql.py
+++ b/atr/models/sql.py
@@ -280,6 +280,7 @@ class ResultsJSON(sqlalchemy.types.TypeDecorator):
try:
return results.ResultsAdapter.validate_python(value)
except pydantic.ValidationError:
+ # TODO: Should we make this more strict?
return None
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]